| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu
 On 05.02.2025 20:00, Oleksii Kurochko wrote:
> On 2/4/25 12:47 PM, Jan Beulich wrote:
>> On 04.02.2025 08:20, Oleksii Kurochko wrote:
>>> +static int __init riscv_isa_parse_string(const char *isa,
>>> +                                         unsigned long *out_bitmap)
>>> +{
>>> +    if ( (isa[0] != 'r') && (isa[1] != 'v') )
>>> +        return -EINVAL;
>>> +
>>> +#if defined(CONFIG_RISCV_32)
>>> +    if ( isa[2] != '3' && isa[3] != '2' )
>>> +        return -EINVAL;
>>> +#elif defined(CONFIG_RISCV_64)
>>> +    if ( isa[2] != '6' && isa[3] != '4' )
>>> +        return -EINVAL;
>>> +#else
>>> +# error "unsupported RISC-V bitness"
>>> +#endif
>>> +
>>> +    isa += 4;
>>> +
>>> +    while ( *isa )
>>> +    {
>>> +        const char *ext = isa++;
>>> +        const char *ext_end = isa;
>>> +        bool ext_err = false;
>>> +
>>> +        switch ( *ext )
>>> +        {
>>> +        case 'x':
>>> +            printk_once("Vendor extensions are ignored in riscv,isa\n");
>>> +            /*
>>> +             * To skip an extension, we find its end.
>>> +             * As multi-letter extensions must be split from other 
>>> multi-letter
>>> +             * extensions with an "_", the end of a multi-letter extension 
>>> will
>>> +             * either be the null character or the "_" at the start of the 
>>> next
>>> +             * multi-letter extension.
>>> +             */
>>> +            for ( ; *isa && *isa != '_'; ++isa )
>>> +                ;
>>> +            ext_err = true;
>>> +            break;
>> I was wondering why ext_end isn't updated here. Looks like that's because
>> ext_err is set to true, and the checking below the switch looks at ext_err
>> first. That's technically fine, but - to me at least - a little confusing.
>> Could setting ext_end to NULL take the role of ext_err? For the code here
>> this would then immediately clarify why ext_end isn't otherwise maintained.
>> (The "err" in the name is also somewhat odd: The log message above says
>> "ignored", and that's what the code below also does. There's nothing
>> error-ish in here; in fact there may be nothing wrong about the specific
>> vendor extension we're ignoring.)
> 
> IIUC, your suggestion is to use instead of "ext_err = true" -> "ext_end = 
> NULL".
Yes.
>>> +        case 's':
>>> +            /*
>>> +             * Workaround for invalid single-letter 's' & 'u' (QEMU):
>>> +             *   Before QEMU 7.1 it was an issue with misa to ISA string
>>> +             *   conversion:
>>> +             
>>> *https://patchwork.kernel.org/project/qemu-devel/patch/dee09d708405075420b29115c1e9e87910b8da55.1648270894.git.research_trasio@xxxxxxxxxxxx/#24792587
>>> +             *   Additional details of the workaround on Linux kernel side:
>>> +             
>>> *https://lore.kernel.org/linux-riscv/ae93358e-e117-b43d-faad-772c529f846c@xxxxxxxxxxxx/#t
>>> +             *
>>> +             * No need to set the bit in riscv_isa as 's' & 'u' are
>>> +             * not valid ISA extensions. It works unless the first
>>> +             * multi-letter extension in the ISA string begins with
>>> +             * "Su" and is not prefixed with an underscore.
>>> +             */
>>> +            if ( ext[-1] != '_' && ext[1] == 'u' )
>>> +            {
>>> +                ++isa;
>>> +                ext_err = true;
>>> +                break;
>>> +            }
>>> +            fallthrough;
>>> +        case 'z':
>>> +            /*
>>> +             * Before attempting to parse the extension itself, we find 
>>> its end.
>>> +             * As multi-letter extensions must be split from other 
>>> multi-letter
>>> +             * extensions with an "_", the end of a multi-letter extension 
>>> will
>>> +             * either be the null character or the "_" at the start of the 
>>> next
>>> +             * multi-letter extension.
>>> +             *
>>> +             * Next, as the extensions version is currently ignored, we
>>> +             * eliminate that portion. This is done by parsing backwards 
>>> from
>>> +             * the end of the extension, removing any numbers. This may be 
>>> a
>>> +             * major or minor number however, so the process is repeated 
>>> if a
>>> +             * minor number was found.
>>> +             *
>>> +             * ext_end is intended to represent the first character 
>>> *after* the
>>> +             * name portion of an extension, but will be decremented to 
>>> the last
>>> +             * character itself while eliminating the extensions version 
>>> number.
>>> +             * A simple re-increment solves this problem.
>>> +             */
>>> +            for ( ; *isa && *isa != '_'; ++isa )
>>> +                if ( unlikely(!isalnum(*isa)) )
>>> +                    ext_err = true;
>>> +
>>> +            ext_end = isa;
>>> +            if ( unlikely(ext_err) )
>>> +                break;
>> This, otoh, is an error. Which probably shouldn't go silently.
> 
> It is actually not an error, what it means that if version is mentioned or 
> not, so
> (1) rv32..._zbb_zbc
> (2) rv32..._zbb1p2_zbc
> 
> For (1), ext_err = true and it means that version isn't mentioned for _zbb 
> and after it
> immediately another extension name is going, so there is no any sense in 
> ignoring (or parsing) version
> numbers.
> For (2), ext_err = false (because it ends on number ) so we have to ignore 
> (or parse) version nubmers.
I don't follow. Why would ext_err be true for (1)? That's a well-formed
specifier, isn't it? rv32..._zbb.zbc (with the last dot meaning a literal
one, unlike the earlier ...) is an example of what would cause ext_err to
be true.
>>> +        default:
>>> +            /*
>>> +             * Things are a little easier for single-letter extensions, as 
>>> they
>>> +             * are parsed forwards.
>>> +             *
>>> +             * After checking that our starting position is valid, we need 
>>> to
>>> +             * ensure that, when isa was incremented at the start of the 
>>> loop,
>>> +             * that it arrived at the start of the next extension.
>>> +             *
>>> +             * If we are already on a non-digit, there is nothing to do. 
>>> Either
>>> +             * we have a multi-letter extension's _, or the start of an
>>> +             * extension.
>>> +             *
>>> +             * Otherwise we have found the current extension's major 
>>> version
>>> +             * number. Parse past it, and a subsequent p/minor version 
>>> number
>>> +             * if present. The `p` extension must not appear immediately 
>>> after
>>> +             * a number, so there is no fear of missing it.
>>> +             */
>>> +            if ( unlikely(!isalpha(*ext)) )
>>> +            {
>>> +                ext_err = true;
>>> +                break;
>>> +            }
>> Like above this also looks to be a situation that maybe better would be
>> left go entirely silently. I even wonder whether you can safely continue
>> the parsing in that case: How do you know whether what follows is indeed
>> the start of an extension name?
> 
> Lets consider examples:
> (1) riscv,isa=im
> (2) riscv,isa=i1p2m
> (3) riscv,isa=i1m
> 
> (4) riscv,isa=i1_m1
> 
> Note: Underscores "_" may be used to separate ISA extensions to improve 
> readability
> and to provide disambiguation, e.g., "RV32I2_M2_A2".
> 
> (1) isa="i" so ext = "m" => no minor and major version between "i" and "m" => 
> `ext` contains the name
>      of the next extension name, thereby we will break a loop and go for 
> parsing of the next extension
>      which is "m" in this example
> (2) isa="i" => ext="1" => algorithm will go further and will just skip minor 
> and major version for
>      this extension (1p2 => 1.2 version of extension "i")
> (3) just the same as (2) but will stop earlier as minor version isn't set.
> 
> Note: having "_" is legal from specification point of view, but it is illegal 
> to use it between single letter
>        extension from device tree binding point of view.
> (4) just the same as (2) and (3) but using "_" separator, so the if above 
> will just skip "_" and the next
>      after "_" is an extension name again.
> 
> Considering that "_" is illegal from device tree point of view between single 
> letter extensions we should have
> ASSERT(*ext != "_") and then only cases (1) - (3) will be possible to have.
> Probably it also will make a sense to have an array with legal single letter 
> extensions and check that *ext is
> in this array.
> 
> Interesting that device tree binding doesn't cover this case:
> ```
> Because the "P" extension for Packed SIMD can be confused for the decimal 
> point in a version number,
> it must be preceded by an underscore if it follows a number. For example, 
> "rv32i2p2" means version
> 2.2 of RV32I, whereas "rv32i2_p2" means version 2.0 of RV32I with version 2.0 
> of the P extension.
> ```
> if I correctly interpreted the 
> pattern:pattern:^rv(?:64|32)imaf?d?q?c?b?k?j?p?v?h?(?:[hsxz](?:[0-9a-z])+)?(?:_[hsxz](?:[0-9a-z])+)*$
> And it also doesn't support versions for single letter extensions.
> So "rv32i2_m2_a2" or with using "p" is impossible?
> Then riscv_isa_parse_string() could be simplified ( probably when it was 
> implemented in Linux kernel they don't have the binding for riscv,isa and they
> just tried to follow RISC-V specification ) for the case of single letter 
> extensions (or just keep it as is to be in sync with Linux kernel).
All fine, but what about e.g.
(5) riscv,isa=i?m1
>>> +static void __init riscv_fill_hwcap_from_isa_string(void)
>>> +{
>>> +    const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
>>> +    const struct dt_device_node *cpu;
>>> +
>>> +    if ( !cpus )
>>> +    {
>>> +        printk("Missing /cpus node in the device tree?\n");
>>> +        return;
>>> +    }
>>> +
>>> +    dt_for_each_child_node(cpus, cpu)
>>> +    {
>>> +        DECLARE_BITMAP(this_isa, RISCV_ISA_EXT_MAX);
>>> +        const char *isa;
>>> +        unsigned long cpuid;
>>> +
>>> +        if ( !dt_device_type_is_equal(cpu, "cpu") )
>>> +            continue;
>>> +
>>> +        if ( dt_get_cpuid_from_node(cpu, &cpuid) < 0 )
>>> +            continue;
>>> +
>>> +        if ( dt_property_read_string(cpu, "riscv,isa", &isa) )
>>> +        {
>>> +            printk("Unable to find \"riscv,isa\" devicetree entry "
>>> +                   "for DT's cpu%ld node\n", cpuid);
>>> +            continue;
>>> +        }
>>> +
>>> +        for ( unsigned int i = 0; (isa[i] != '\0'); i++ )
>>> +            if ( !isdigit(isa[i]) && (isa[i] != '_') && !islower(isa[i]) )
>>> +                panic("According to DT binding riscv,isa must be 
>>> lowercase\n");
>>> +
>>> +        riscv_isa_parse_string(isa, this_isa);
>>> +
>>> +        /*
>>> +         * In the unpriv. spec is mentioned:
>>> +         *   A RISC-V ISA is defined as a base integer ISA, which must be
>>> +         *   present in any implementation, plus optional extensions to
>>> +         *   the base ISA.
>>> +         * What means that isa should contain, at least, I or E thereby
>>> +         * this_isa can't be empty too.
>>> +         *
>>> +         * Ensure that this_isa is not empty, so riscv_isa won't be empty
>>> +         * during initialization. This prevents ending up with extensions
>>> +         * not supported by one of the CPUs.
>>> +         */
>>> +        ASSERT(!bitmap_empty(this_isa, RISCV_ISA_EXT_MAX));
>> This is again an assertion on input we consume. Afaict it would actually
>> trigger not only on all kinds of invalid inputs, but on something valid
>> like "rv32e".
> 
> In the case of "rv32e" I think it is fine that it will be asserted as in 
> riscv_isa_ext[] we don't
> have 'E' extension and thereby riscv_isa_ext[] should be updated properly.
Of course, but that still doesn't want to be by way of ASSERT()ing.
> Could you please explain me again what is wrong with having ASSERT() itself 
> for input we consume? If to change that
> to 'if ()' would it be better? Or it should be just moved to 
> riscv_isa_parse_string() where this bitmap is filled?
It's very simple: For internal state we maintain, use assertions to check
assumptions you make. For input we get, use other error checking (which may
be panic(), where no better option exists).
Jan
 
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |