[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu

To: Jan Beulich <jbeulich@xxxxxxxx>
From: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
Date: Wed, 5 Feb 2025 20:00:58 +0100
Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Wed, 05 Feb 2025 19:01:19 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 2/4/25 12:47 PM, Jan Beulich wrote:

On 04.02.2025 08:20, Oleksii Kurochko wrote:

+/*
+ * The canonical order of ISA extension names in the ISA string is defined in
+ * chapter 27 of the unprivileged specification.
+ *
+ * The specification uses vague wording, such as should, when it comes to
+ * ordering, so for our purposes the following rules apply:
+ *
+ * 1. All multi-letter extensions must be separated from other extensions by an
+ *    underscore.
+ *
+ * 2. Additional standard extensions (starting with 'Z') must be sorted after
+ *    single-letter extensions and before any higher-privileged extensions.
+ *
+ * 3. The first letter following the 'Z' conventionally indicates the most
+ *    closely related alphabetical extension category, IMAFDQLCBKJTPVH.
+ *    If multiple 'Z' extensions are named, they must be ordered first by
+ *    category, then alphabetically within a category.
+ *
+ * 4. Standard supervisor-level extensions (starting with 'S') must be listed
+ *    after standard unprivileged extensions.  If multiple supervisor-level
+ *    extensions are listed, they must be ordered alphabetically.
+ *
+ * 5. Standard machine-level extensions (starting with 'Zxm') must be listed
+ *    after any lower-privileged, standard extensions.  If multiple
+ *    machine-level extensions are listed, they must be ordered
+ *    alphabetically.
+ *
+ * 6. Non-standard extensions (starting with 'X') must be listed after all
+ *    standard extensions. If multiple non-standard extensions are listed, they
+ *    must be ordered alphabetically.
+ *
+ * An example string following the order is:
+ *    rv64imadc_zifoo_zigoo_zafoo_sbar_scar_zxmbaz_xqux_xrux
+ *
+ * New entries to this struct should follow the ordering rules described above.
+ *
+ * Extension name must be all lowercase (according to device-tree binding)
+ * and strncmp() is used in match_isa_ext() to compare extension names instead
+ * of strncasecmp().
+ */
+const struct riscv_isa_ext_data __initconst riscv_isa_ext[] = {
+    RISCV_ISA_EXT_DATA(i, RISCV_ISA_EXT_i),
+    RISCV_ISA_EXT_DATA(m, RISCV_ISA_EXT_m),
+    RISCV_ISA_EXT_DATA(a, RISCV_ISA_EXT_a),
+    RISCV_ISA_EXT_DATA(f, RISCV_ISA_EXT_f),
+    RISCV_ISA_EXT_DATA(d, RISCV_ISA_EXT_d),
+    RISCV_ISA_EXT_DATA(q, RISCV_ISA_EXT_q),
+    RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
+    RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
+    RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
+    RISCV_ISA_EXT_DATA(zifencei, RISCV_ISA_EXT_ZIFENCEI),
+    RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
+    RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
+    RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
+    RISCV_ISA_EXT_DATA(smaia, RISCV_ISA_EXT_SMAIA),
+    RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
+};
+
+static const struct riscv_isa_ext_data __initconst required_extensions[] = {
+    RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
+    RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
+    RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
+};

Coming back to my earlier question regarding the B (pseudo-)extension:
Since riscv_isa_ext[] only contains Zbb, is it precluded anywhere in
the spec that DT may mention just B when all of its constituents are
supported?

According to the device tree binding there is no such option just to mention B for riscv,extensions property:
  https://elixir.bootlin.com/linux/v6.13.1/source/Documentation/devicetree/bindings/riscv/extensions.yaml#L246

but if look at pattern for riscv,isa property:
  pattern: ^rv(?:64|32)imaf?d?q?c?b?k?j?p?v?h?(?:[hsxz](?:[0-9a-z])+)?(?:_[hsxz](?:[0-9a-z])+)*$
"B" could be set.
But it doesn't mentioned in device tree binding what is included when "B" is mentioned in riscv,isa property, so I
assume that it is following B Standard Extension doc [ https://github.com/riscv/riscv-b/blob/main/b.adoc#11-privileged-architecture-implications ]
and will do the same as for misa register:
  Misa.B=1 then it means that Zba, Zbb, Zbc are all supported.
Regarding "G" extensions it isn't used in device tree binding and is explicitly unfolded in "IMAFDZicsr_Zifencei". I will add them to required_extensions[] for
consistency.

+        {
+            __set_bit(ext->id, bitmap);
+            break;
+        }
+    }
+}
+
+static int __init riscv_isa_parse_string(const char *isa,
+                                         unsigned long *out_bitmap)
+{
+    if ( (isa[0] != 'r') && (isa[1] != 'v') )
+        return -EINVAL;
+
+#if defined(CONFIG_RISCV_32)
+    if ( isa[2] != '3' && isa[3] != '2' )
+        return -EINVAL;
+#elif defined(CONFIG_RISCV_64)
+    if ( isa[2] != '6' && isa[3] != '4' )
+        return -EINVAL;
+#else
+# error "unsupported RISC-V bitness"
+#endif
+
+    isa += 4;
+
+    while ( *isa )
+    {
+        const char *ext = isa++;
+        const char *ext_end = isa;
+        bool ext_err = false;
+
+        switch ( *ext )
+        {
+        case 'x':
+            printk_once("Vendor extensions are ignored in riscv,isa\n");
+            /*
+             * To skip an extension, we find its end.
+             * As multi-letter extensions must be split from other multi-letter
+             * extensions with an "_", the end of a multi-letter extension will
+             * either be the null character or the "_" at the start of the next
+             * multi-letter extension.
+             */
+            for ( ; *isa && *isa != '_'; ++isa )
+                ;
+            ext_err = true;
+            break;

I was wondering why ext_end isn't updated here. Looks like that's because
ext_err is set to true, and the checking below the switch looks at ext_err
first. That's technically fine, but - to me at least - a little confusing.
Could setting ext_end to NULL take the role of ext_err? For the code here
this would then immediately clarify why ext_end isn't otherwise maintained.
(The "err" in the name is also somewhat odd: The log message above says
"ignored", and that's what the code below also does. There's nothing
error-ish in here; in fact there may be nothing wrong about the specific
vendor extension we're ignoring.)

IIUC, your suggestion is to use instead of "ext_err = true" -> "ext_end = NULL". If my understanding
is correct then we can really do in this way (of course, with some updates as you mentioned below).

+        case 's':
+            /*
+             * Workaround for invalid single-letter 's' & 'u' (QEMU):
+             *   Before QEMU 7.1 it was an issue with misa to ISA string
+             *   conversion:
+             *     https://patchwork.kernel.org/project/qemu-devel/patch/dee09d708405075420b29115c1e9e87910b8da55.1648270894.git.research_trasio@xxxxxxxxxxxx/#24792587
+             *   Additional details of the workaround on Linux kernel side:
+             *     https://lore.kernel.org/linux-riscv/ae93358e-e117-b43d-faad-772c529f846c@xxxxxxxxxxxx/#t
+             *
+             * No need to set the bit in riscv_isa as 's' & 'u' are
+             * not valid ISA extensions. It works unless the first
+             * multi-letter extension in the ISA string begins with
+             * "Su" and is not prefixed with an underscore.
+             */
+            if ( ext[-1] != '_' && ext[1] == 'u' )
+            {
+                ++isa;
+                ext_err = true;
+                break;
+            }
+            fallthrough;
+        case 'z':
+            /*
+             * Before attempting to parse the extension itself, we find its end.
+             * As multi-letter extensions must be split from other multi-letter
+             * extensions with an "_", the end of a multi-letter extension will
+             * either be the null character or the "_" at the start of the next
+             * multi-letter extension.
+             *
+             * Next, as the extensions version is currently ignored, we
+             * eliminate that portion. This is done by parsing backwards from
+             * the end of the extension, removing any numbers. This may be a
+             * major or minor number however, so the process is repeated if a
+             * minor number was found.
+             *
+             * ext_end is intended to represent the first character *after* the
+             * name portion of an extension, but will be decremented to the last
+             * character itself while eliminating the extensions version number.
+             * A simple re-increment solves this problem.
+             */
+            for ( ; *isa && *isa != '_'; ++isa )
+                if ( unlikely(!isalnum(*isa)) )
+                    ext_err = true;
+
+            ext_end = isa;
+            if ( unlikely(ext_err) )
+                break;

This, otoh, is an error. Which probably shouldn't go silently.

It is actually not an error, what it means that if version is mentioned or not, so
(1) rv32..._zbb_zbc
(2) rv32..._zbb1p2_zbc

For (1), ext_err = true and it means that version isn't mentioned for _zbb and after it
immediately another extension name is going, so there is no any sense in ignoring (or parsing) version
numbers.
For (2), ext_err = false (because it ends on number ) so we have to ignore (or parse) version nubmers.


Considering the earlier remark, ext_end would then perhaps also be more
logical to update after the above if().

+            if ( !isdigit(ext_end[-1]) )
+                break;
+
+            while ( isdigit(*--ext_end) )
+                ;
+
+            if ( tolower(ext_end[0]) != 'p' || !isdigit(ext_end[-1]) )

Leftover tolower()?

Agree, it should be dropped as we have a check that riscv,isa comes all in lowercase letters.

+            {
+                ++ext_end;
+                break;
+            }
+
+            while ( isdigit(*--ext_end) )
+                ;
+
+            ++ext_end;
+            break;
+
+        default:
+            /*
+             * Things are a little easier for single-letter extensions, as they
+             * are parsed forwards.
+             *
+             * After checking that our starting position is valid, we need to
+             * ensure that, when isa was incremented at the start of the loop,
+             * that it arrived at the start of the next extension.
+             *
+             * If we are already on a non-digit, there is nothing to do. Either
+             * we have a multi-letter extension's _, or the start of an
+             * extension.
+             *
+             * Otherwise we have found the current extension's major version
+             * number. Parse past it, and a subsequent p/minor version number
+             * if present. The `p` extension must not appear immediately after
+             * a number, so there is no fear of missing it.
+             */
+            if ( unlikely(!isalpha(*ext)) )
+            {
+                ext_err = true;
+                break;
+            }

Like above this also looks to be a situation that maybe better would be
left go entirely silently. I even wonder whether you can safely continue
the parsing in that case: How do you know whether what follows is indeed
the start of an extension name?

Lets consider examples:
(1) riscv,isa=im
(2) riscv,isa=i1p2m
(3) riscv,isa=i1m

(4) riscv,isa=i1_m1

Note: Underscores "_" may be used to separate ISA extensions to improve readability
and to provide disambiguation, e.g., "RV32I2_M2_A2".

(1) isa="i" so ext = "m" => no minor and major version between "i" and "m" => `ext` contains the name
    of the next extension name, thereby we will break a loop and go for parsing of the next extension
    which is "m" in this example
(2) isa="i" => ext="1" => algorithm will go further and will just skip minor and major version for
    this extension (1p2 => 1.2 version of extension "i") 
(3) just the same as (2) but will stop earlier as minor version isn't set.

Note: having "_" is legal from specification point of view, but it is illegal to use it between single letter
      extension from device tree binding point of view.
(4) just the same as (2) and (3) but using "_" separator, so the if above will just skip "_" and the next
    after "_" is an extension name again.

Considering that "_" is illegal from device tree point of view between single letter extensions we should have
ASSERT(*ext != "_") and then only cases (1) - (3) will be possible to have.
Probably it also will make a sense to have an array with legal single letter extensions and check that *ext is
in this array.

Interesting that device tree binding doesn't cover this case:
```
Because the "P" extension for Packed SIMD can be confused for the decimal point in a version number,
it must be preceded by an underscore if it follows a number. For example, "rv32i2p2" means version
2.2 of RV32I, whereas "rv32i2_p2" means version 2.0 of RV32I with version 2.0 of the P extension.
```
if I correctly interpreted the pattern: pattern: ^rv(?:64|32)imaf?d?q?c?b?k?j?p?v?h?(?:[hsxz](?:[0-9a-z])+)?(?:_[hsxz](?:[0-9a-z])+)*$
And it also doesn't support versions for single letter extensions.
So "rv32i2_m2_a2" or with using "p" is impossible?
Then riscv_isa_parse_string() could be simplified ( probably when it was implemented in Linux kernel they don't have the binding for riscv,isa and they
just tried to follow RISC-V specification ) for the case of single letter extensions (or just keep it as is to be in sync with Linux kernel).

+static void __init riscv_fill_hwcap_from_isa_string(void)
+{
+    const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
+    const struct dt_device_node *cpu;
+
+    if ( !cpus )
+    {
+        printk("Missing /cpus node in the device tree?\n");
+        return;
+    }
+
+    dt_for_each_child_node(cpus, cpu)
+    {
+        DECLARE_BITMAP(this_isa, RISCV_ISA_EXT_MAX);
+        const char *isa;
+        unsigned long cpuid;
+
+        if ( !dt_device_type_is_equal(cpu, "cpu") )
+            continue;
+
+        if ( dt_get_cpuid_from_node(cpu, &cpuid) < 0 )
+            continue;
+
+        if ( dt_property_read_string(cpu, "riscv,isa", &isa) )
+        {
+            printk("Unable to find \"riscv,isa\" devicetree entry "
+                   "for DT's cpu%ld node\n", cpuid);
+            continue;
+        }
+
+        for ( unsigned int i = 0; (isa[i] != '\0'); i++ )
+            if ( !isdigit(isa[i]) && (isa[i] != '_') && !islower(isa[i]) )
+                panic("According to DT binding riscv,isa must be lowercase\n");
+
+        riscv_isa_parse_string(isa, this_isa);
+
+        /*
+         * In the unpriv. spec is mentioned:
+         *   A RISC-V ISA is defined as a base integer ISA, which must be
+         *   present in any implementation, plus optional extensions to
+         *   the base ISA.
+         * What means that isa should contain, at least, I or E thereby
+         * this_isa can't be empty too.
+         *
+         * Ensure that this_isa is not empty, so riscv_isa won't be empty
+         * during initialization. This prevents ending up with extensions
+         * not supported by one of the CPUs.
+         */
+        ASSERT(!bitmap_empty(this_isa, RISCV_ISA_EXT_MAX));

This is again an assertion on input we consume. Afaict it would actually
trigger not only on all kinds of invalid inputs, but on something valid
like "rv32e".

In the case of "rv32e" I think it is fine that it will be asserted as in riscv_isa_ext[] we don't
have 'E' extension and thereby riscv_isa_ext[] should be updated properly.

>From the unpriv. spec then "i" ( a base integer ISA ) must be present ( the comment above should be
updated a little bit ). And if I am reading riscv,isa's pattern ( mentioned above ) properly "ima" should be present
too.
Probably I have to update the code, at the start of riscv_isa_parsing() where "rv{32,64}" is checked.

Could you please explain me again what is wrong with having ASSERT() itself for input we consume? If to change that
to 'if ()' would it be better? Or it should be just moved to riscv_isa_parse_string() where this bitmap is filled?

Thanks.

~ Oleksii

Follow-Ups:
- Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu
  - From: Jan Beulich

References:
- [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu
  - From: Oleksii Kurochko
- Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu
  - From: Jan Beulich

Prev by Date: Re: [RFC PATCH 9/9] hw/xen: Have legacy Xen backend inherit from DYNAMIC_SYS_BUS_DEVICE
Next by Date: Re: [PATCH v3] xen/riscv: identify specific ISA supported by cpu
Previous by thread: Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu
Next by thread: Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.