
Re: [PATCH v5 11/23] xen/riscv: introduce cmpxchg.h



On Wed, 2024-03-06 at 15:56 +0100, Jan Beulich wrote:
> On 26.02.2024 18:38, Oleksii Kurochko wrote:
> > The header was taken from Linux kernel 6.4.0-rc1.
> > 
> > Additionally, the following were updated:
> > * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
> >   access.
> > * replace tabs with spaces
> > * replace __* variables with *__
> > * introduce generic version of xchg_* and cmpxchg_*.
> > 
> > The implementation of the 4- and 8-byte cases was left as it is done
> > in the Linux kernel, in accordance with the RISC-V spec:
> > ```
> > Table A.5 ( only part of the table was copied here )
> > 
> > Linux Construct       RVWMO Mapping
> > atomic <op> relaxed    amo<op>.{w|d}
> > atomic <op> acquire    amo<op>.{w|d}.aq
> > atomic <op> release    amo<op>.{w|d}.rl
> > atomic <op>            amo<op>.{w|d}.aqrl
> > 
> > Linux Construct       RVWMO LR/SC Mapping
> > atomic <op> relaxed    loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> > atomic <op> acquire    loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> > atomic <op> release    loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗ ; bnez loop
> >                        OR fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗ ; bnez loop
> > atomic <op>            loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> > 
> > The Linux mappings for release operations may seem stronger than
> > necessary, but these mappings are needed to cover some cases in which
> > Linux requires stronger orderings than the more intuitive mappings
> > would provide. In particular, as of the time this text is being
> > written, Linux is actively debating whether to require load-load,
> > load-store, and store-store orderings between accesses in one
> > critical section and accesses in a subsequent critical section in the
> > same hart and protected by the same synchronization object. Not all
> > combinations of FENCE RW,W/FENCE R,RW mappings with aq/rl mappings
> > combine to provide such orderings.
> > There are a few ways around this problem, including:
> > 1. Always use FENCE RW,W/FENCE R,RW, and never use aq/rl. This
> >    suffices but is undesirable, as it defeats the purpose of the
> >    aq/rl modifiers.
> > 2. Always use aq/rl, and never use FENCE RW,W/FENCE R,RW. This does
> >    not currently work due to the lack of load and store opcodes with
> >    aq and rl modifiers.
> 
> As before I don't understand this point. Can you give an example of
> what sort of opcode / instruction is missing?
If I understand the spec correctly, the l{b|h|w|d} and s{b|h|w|d}
instructions don't have aq or rl annotations. Here is the text from the
spec:
   ARM Operation                  RVWMO Mapping
   Load                           l{b|h|w|d}
   Load-Acquire                   fence rw, rw; l{b|h|w|d}; fence r,rw 
   Load-Exclusive                 lr.{w|d}
   Load-Acquire-Exclusive         lr.{w|d}.aqrl
   Store                          s{b|h|w|d}
   Store-Release                  fence rw,w; s{b|h|w|d}
   Store-Exclusive                sc.{w|d}
   Store-Release-Exclusive        sc.{w|d}.rl
   dmb                            fence rw,rw
   dmb.ld                         fence r,rw
   dmb.st                         fence w,w
   isb                            fence.i; fence r,r
     Table A.4: Mappings from ARM operations to RISC-V operations

   Table A.4 provides a mapping from ARM memory operations onto RISC-V
   memory instructions. Since RISC-V does not currently have plain load
   and store opcodes with aq or rl annotations, ARM load-acquire and
   store-release operations should be mapped using fences instead.
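
To make that concrete: with no .aq/.rl forms of plain loads/stores, the
only way to express acquire/release on them is via fences. A minimal
illustrative sketch following Table A.4 (the macro names here are made
up for the example; they are not part of the patch):

    #define load_w_acquire(ptr) \
    ({ \
        uint32_t val__; \
        asm volatile ( "fence rw, rw\n" \
                       "lw %0, %1\n" \
                       "fence r, rw" \
                       : "=r" (val__) : "m" (*(ptr)) : "memory" ); \
        val__; \
    })

    #define store_w_release(ptr, val) \
    ({ \
        asm volatile ( "fence rw, w\n" \
                       "sw %1, %0" \
                       : "=m" (*(ptr)) : "r" (val) : "memory" ); \
    })

LR/SC and AMO opcodes, by contrast, do have .aq/.rl bits, as the tables
above show.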

> 
> > 3. Strengthen the mappings of release operations such that they
> >    would enforce sufficient orderings in the presence of either type
> >    of acquire mapping. This is the currently-recommended solution,
> >    and the one shown in Table A.5.
> > ```
> > 
> > But in the Linux kernel, atomics were strengthened with fences:
> > ```
> > Atomics present the same issue with locking: release and acquire
> > variants need to be strengthened to meet the constraints defined
> > by the Linux-kernel memory consistency model [1].
> > 
> > Atomics present a further issue: implementations of atomics such
> > as atomic_cmpxchg() and atomic_add_unless() rely on LR/SC pairs,
> > which do not give full-ordering with .aqrl; for example, current
> > implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
> > below to end up with the state indicated in the "exists" clause.
> > 
> > In order to "synchronize" LKMM and RISC-V's implementation, this
> > commit strengthens the implementations of the atomics operations
> > by replacing .rl and .aq with the use of ("lightweight") fences,
> > and by replacing .aqrl LR/SC pairs in sequences such as:
> > 
> > 0:      lr.w.aqrl  %0, %addr
> >         bne        %0, %old, 1f
> >         ...
> >         sc.w.aqrl  %1, %new, %addr
> >         bnez       %1, 0b
> > 1:
> > 
> > with sequences of the form:
> > 
> > 0:      lr.w       %0, %addr
> >         bne        %0, %old, 1f
> >               ...
> >         sc.w.rl    %1, %new, %addr   /* SC-release   */
> >         bnez       %1, 0b
> >         fence      rw, rw            /* "full" fence */
> > 1:
> > 
> > following Daniel's suggestion.
> > 
> > These modifications were validated with simulation of the RISC-V
> > memory consistency model.
> > 
> > C lr-sc-aqrl-pair-vs-full-barrier
> > 
> > {}
> > 
> > P0(int *x, int *y, atomic_t *u)
> > {
> >         int r0;
> >         int r1;
> > 
> >         WRITE_ONCE(*x, 1);
> >         r0 = atomic_cmpxchg(u, 0, 1);
> >         r1 = READ_ONCE(*y);
> > }
> > 
> > P1(int *x, int *y, atomic_t *v)
> > {
> >         int r0;
> >         int r1;
> > 
> >         WRITE_ONCE(*y, 1);
> >         r0 = atomic_cmpxchg(v, 0, 1);
> >         r1 = READ_ONCE(*x);
> > }
> > 
> > exists (u=1 /\ v=1 /\ 0:r1=0 /\ 1:r1=0)
> 
> While I'm entirely willing to trust this can happen, I can't bring
> this in line with the A extension spec.
> 
> Additionally it's not clear to me in how far all of this applies when
> you don't really use LR/SC in the 4- and 8-byte cases (and going
> forward likely also not in the 1- and 2-byte case, utilizing Zabha
> when available).
It just explains which combinations of fences, lr/sc, amoswap and
.aq/.rl annotations can be used together, and why the combinations
introduced in this patch were chosen.

> 
> > ---
> > Changes in V5:
> >  - update the commit message.
> >  - drop ALIGN_DOWN().
> >  - update the definition of emulate_xchg_1_2(): 
> >    - lr.d -> lr.w, sc.d -> sc.w.
> >    - drop ret argument.
> >    - code style fixes around asm volatile.
> >    - update prototype.
> >    - use asm named operands.
> >    - rename local variables.
> >    - add comment above the macros
> >  - update the definition of __xchg_generic:
> >    - drop local ptr__ variable.
> >    - code style fixes around switch()
> >    - update prototype.
> >  - introduce RISCV_FULL_BARRIES.
> >  - redefine cmpxchg()
> >  - update emulate_cmpxchg_1_2():
> >    - update prototype
> >    - update local variables names and usage of them
> >    - use name asm operands.
> >    - add comment above the macros
> > ---
> > Changes in V4:
> >  - Code style fixes.
> >  - ensure that in __xchg_*() new and *ptr have the same type; also
> >    "\n" was removed at the end of the asm instruction.
> >  - dependency from
> > https://lore.kernel.org/xen-devel/cover.1706259490.git.federico.serafini@xxxxxxxxxxx/
> >  - switch from ASSERT_UNREACHABLE to STATIC_ASSERT_UNREACHABLE().
> >  - drop xchg32(ptr, x) and xchg64(ptr, x) as they aren't used.
> >  - drop cmpxcg{32,64}_{local} as they aren't used.
> >  - introduce generic version of xchg_* and cmpxchg_*.
> >  - update the commit message.
> > ---
> > Changes in V3:
> >  - update the commit message
> >  - add emulation of {cmp}xchg_... for 1 and 2 bytes types
> > ---
> > Changes in V2:
> >  - update the comment at the top of the header.
> >  - change xen/lib.h to xen/bug.h.
> >  - sort inclusion of headers properly.
> > ---
> >  xen/arch/riscv/include/asm/cmpxchg.h | 258 +++++++++++++++++++++++++++
> >  1 file changed, 258 insertions(+)
> >  create mode 100644 xen/arch/riscv/include/asm/cmpxchg.h
> > 
> > diff --git a/xen/arch/riscv/include/asm/cmpxchg.h b/xen/arch/riscv/include/asm/cmpxchg.h
> > new file mode 100644
> > index 0000000000..66cbe26737
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/cmpxchg.h
> > @@ -0,0 +1,258 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (C) 2014 Regents of the University of California */
> > +
> > +#ifndef _ASM_RISCV_CMPXCHG_H
> > +#define _ASM_RISCV_CMPXCHG_H
> > +
> > +#include <xen/compiler.h>
> > +#include <xen/lib.h>
> > +
> > +#include <asm/fence.h>
> > +#include <asm/io.h>
> > +#include <asm/system.h>
> > +
> > +#define __amoswap_generic(ptr, new, ret, sfx, pre, post) \
> > +({ \
> > +    asm volatile( \
> 
> Nit: In Xen style this is lacking a blank ahead of the opening
> parenthesis.
> 
> > +        pre \
> > +        " amoswap" sfx " %0, %2, %1\n" \
> > +        post \
> > +        : "=r" (ret), "+A" (*ptr) \
> > +        : "r" (new) \
> > +        : "memory" ); \
> > +})
> > +
> > +/*
> > + * For LR and SC, the A extension requires that the address held in rs1 be
> > + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> > + * for 64-bit words and four-byte aligned for 32-bit words).
> > + * If the address is not naturally aligned, an address-misaligned exception
> > + * or an access-fault exception will be generated.
> > + * 
> > + * Thereby:
> > + * - for 1-byte xchg access the containing word by clearing low two bits
> > + * - for 2-byte xchg access the containing word by clearing first bit.
> 
> "first bit" can still be ambiguous. Better say "bit 1".
> 
> > + * 
> 
> Here and apparently also elsewhere: Stray trailing blank. Git has a
> config setting to warn you about (maybe even to automatically strip?)
> such.
It would be useful for me. Thanks a lot for the recommendation.

> 
> > + * If resulting 4-byte access is still misaligned, it will fault just as
> > + * non-emulated 4-byte access would.
> > + */
> > +#define emulate_xchg_1_2(ptr, new, sc_sfx, pre, post) \
> > +({ \
> > +    uint32_t *aligned_ptr = (uint32_t *)((unsigned long)ptr & ~(0x4 - sizeof(*ptr))); \
> 
> Here and elsewhere: sizeof(*(ptr)) (i.e. the inner parentheses are
> needed also there).
> 
> > +    uint8_t new_val_pos = ((unsigned long)(ptr) & (0x4 - sizeof(*ptr))) * BITS_PER_BYTE; \
> 
> Why uint8_t?
It is enough to cover any possible start bit position of the value to
be updated, so I decided to use uint8_t.
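
(Its largest possible value is 24: a 1-byte access at byte offset 3
within the aligned word.) For illustration, a small standalone sketch
of the mask/position arithmetic, using a made-up example address; this
is not code from the patch:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uintptr_t ptr = 0x9006;   /* hypothetical 2-byte variable */
        size_t size = 2;          /* stands in for sizeof(*ptr) */

        uintptr_t aligned = ptr & ~(uintptr_t)(0x4 - size); /* 0x9004 */
        unsigned int pos = (ptr & (0x4 - size)) * 8;        /* 16 */
        uint32_t mask = (uint32_t)((1UL << (size * 8)) - 1) << pos;

        /* prints: aligned=0x9004 pos=16 mask=0xffff0000 */
        printf("aligned=%#lx pos=%u mask=%#x\n",
               (unsigned long)aligned, pos, mask);
        return 0;
    }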

> 
> > +    unsigned long mask = GENMASK(((sizeof(*ptr)) * BITS_PER_BYTE) - 1, 0) << new_val_pos; \
> > +    unsigned int new_ = new << new_val_pos; \
> > +    unsigned int old_val; \
> > +    unsigned int xchged_val; \
> > +    \
> > +    asm volatile ( \
> > +        pre \
> > +        "0: lr.w %[op_oldval], %[op_aligned_ptr]\n" \
> > +        "   and  %[op_xchged_val], %[op_oldval], %z[op_nmask]\n" \
> > +        "   or   %[op_xchged_val], %[op_xchged_val], %z[op_new]\n"
> > \
> > +        "   sc.w" sc_sfx " %[op_xchged_val], %[op_xchged_val],
> > %[op_aligned_ptr]\n" \
> > +        "   bnez %[op_xchged_val], 0b\n" \
> > +        post \
> > +        : [op_oldval] "=&r" (old_val), [op_xchged_val] "=&r"
> > (xchged_val), [op_aligned_ptr]"+A" (*aligned_ptr) \
> 
> Too long line. Partly because you have op_ prefixes here, whose
> purpose I can't recognize. The val / _val suffixes also don't appear
> to carry much useful information. And "xchged", being explicitly past
> tense, doesn't look to fit even up and until the SC, not to speak of
> afterwards. Anything wrong with calling this just tmp, aux, or
> scratch?
The op_ prefix can be dropped, and the named operands can simply match
the local variable names. I thought the prefix would make it clear that
something is a named operand, but after rethinking it looks like
unneeded overhead.

In the case of emulate_xchg_1_2() the val/_val suffixes serve no real
purpose, as the local variables don't clash with the macro's arguments;
they were added just to stay in sync with the emulate_cmpxchg_1_2()
macro. In emulate_cmpxchg_1_2(ptr, old, new, sc_sfx, pre, post),
however, the macro has an old argument, so _val was added to
distinguish the local variable from it. Probably it would be better to
rename it to read or read_old.


> 
> > +        : [op_new] "rJ" (new_), [op_nmask] "rJ" (~mask) \
> > +        : "memory" ); \
> > +    \
> > +    (__typeof__(*(ptr)))((old_val & mask) >> new_val_pos); \
> > +})
> > +
> > +#define __xchg_generic(ptr, new, size, sfx, pre, post) \
> > +({ \
> > +    __typeof__(*(ptr)) new__ = (new); \
> > +    __typeof__(*(ptr)) ret__; \
> > +    switch ( size ) \
> 
> Can't this use sizeof(*(ptr)), allowing for one less macro parameter?
> 
> > +    { \
> > +    case 1: \
> > +    case 2: \
> > +        ret__ = emulate_xchg_1_2(ptr, new__, sfx, pre, post); \
> > +        break; \
> > +    case 4: \
> > +        __amoswap_generic(ptr, new__, ret__,\
> > +                          ".w" sfx,  pre, post); \
> > +        break; \
> > +    case 8: \
> > +        __amoswap_generic(ptr, new__, ret__,\
> > +                          ".d" sfx,  pre, post); \
> > +        break; \
> 
> In io.h you make sure to avoid rv64-only insns. Here you don't. The
> build would fail either way, but this still looks inconsistent.
> 
> Also nit: Stray double blanks (twice) ahead of "pre". Plus with this
> style of line continuation you want to consistently have exactly one
> blank ahead of each backslash.
> 
> > +    default: \
> > +        STATIC_ASSERT_UNREACHABLE(); \
> > +    } \
> > +    ret__; \
> > +})
> > +
> > +#define xchg_relaxed(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) x_ = (x); \
> 
> What is the purpose of this, when __xchg_generic() already does this
> same type conversion?
> 
> > +    (__typeof__(*(ptr)))__xchg_generic(ptr, x_, sizeof(*(ptr)), "", "", ""); \
> > +})
> > +
> > +#define xchg_acquire(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) x_ = (x); \
> > +    (__typeof__(*(ptr)))__xchg_generic(ptr, x_, sizeof(*(ptr)), \
> > +                                       "", "", RISCV_ACQUIRE_BARRIER); \
> > +})
> > +
> > +#define xchg_release(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) x_ = (x); \
> > +    (__typeof__(*(ptr)))__xchg_generic(ptr, x_, sizeof(*(ptr)),\
> > +                                       "", RISCV_RELEASE_BARRIER, ""); \
> > +})
> 
> As asked before: Are there going to be any uses of these three? Common
> code doesn't require them. And not needing to provide them would
> simplify things quite a bit, it seems.
I checked my private branches, and it looks like I introduced them only
for the corresponding atomic operations (which were copied from the
Linux kernel), which are likewise unused.

So we could definitely drop these macros for now, but should
__xchg_generic() be updated as well? If we look at:
 #define xchg(ptr, x) __xchg_generic(ptr, (unsigned long)(x), sizeof(*(ptr)), \
                                     ".aqrl", "", "")
the last two arguments become unneeded, but I wanted to leave them in
case someone needs to bring back xchg_{release, acquire, ...}. Does
that make any sense? (A sketch of the simplified form is below.)
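
For illustration, this is all the call site would reduce to if the
unused variants stay dropped (hypothetical, assuming the pre/post
parameters are also removed from __xchg_generic(), leaving the suffix
as the only customization point):

    /* Hypothetical simplified form, not part of the patch: */
    #define xchg(ptr, x) \
        __xchg_generic(ptr, (unsigned long)(x), sizeof(*(ptr)), ".aqrl")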

> 
> > +#define xchg(ptr, x) __xchg_generic(ptr, (unsigned long)(x), sizeof(*(ptr)), \
> > +                                    ".aqrl", "", "")
> 
> According to the earlier comment (where I don't follow the example
> given), is .aqrl sufficient here? And even if it was for the 4- and
> 8-byte cases, is it sufficient in the 1- and 2-byte emulation case
> (where it then is appended to just the SC)?
If I understand your question correctly then, according to the spec,
.aqrl is enough for amo<op>.{w|d} instructions:
   Linux Construct        RVWMO AMO Mapping
   atomic <op> relaxed    amo<op>.{w|d}
   atomic <op> acquire    amo<op>.{w|d}.aq
   atomic <op> release    amo<op>.{w|d}.rl
   atomic <op>            amo<op>.{w|d}.aqrl
but in the lr/sc case you are right, the sc requires a suffix too:
   Linux Construct        RVWMO LR/SC Mapping
   atomic <op> relaxed    loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
   atomic <op> acquire    loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
   atomic <op> release    loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗ ; bnez loop
                          OR fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗ ; bnez loop
   atomic <op>            loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop

I will add sc_sfx to emulate_xchg_1_2(). The only question left is
whether __xchg_generic(ptr, new, size, sfx, pre, post) should be
changed to __xchg_generic(ptr, new, size, sfx1, sfx2, pre, post) to
cover both cases, amo<op>.{w|d}.sfx1 and lr.{w|d}.sfx1 ...
sc.{w|d}.sfx2; a sketch of that variant follows below.
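
For concreteness, a sketch of that two-suffix variant (illustrative
only; it assumes emulate_xchg_1_2() is extended to take the LR suffix
as a further parameter, which the current patch does not do):

    #define __xchg_generic(ptr, new, size, sfx1, sfx2, pre, post) \
    ({ \
        __typeof__(*(ptr)) new__ = (new); \
        __typeof__(*(ptr)) ret__; \
        switch ( size ) \
        { \
        case 1: \
        case 2: \
            /* hypothetical extra argument: sfx1 for LR, sfx2 for SC */ \
            ret__ = emulate_xchg_1_2(ptr, new__, sfx1, sfx2, pre, post); \
            break; \
        case 4: \
            __amoswap_generic(ptr, new__, ret__, ".w" sfx1, pre, post); \
            break; \
        case 8: \
            __amoswap_generic(ptr, new__, ret__, ".d" sfx1, pre, post); \
            break; \
        default: \
            STATIC_ASSERT_UNREACHABLE(); \
        } \
        ret__; \
    })

One wrinkle with a single sfx1 serving both paths: the fully ordered
mapping wants amo<op>.{w|d}.aqrl but only lr.{w|d}.aq, so e.g.

    #define xchg(ptr, x) \
        __xchg_generic(ptr, (unsigned long)(x), sizeof(*(ptr)), \
                       ".aqrl", ".aqrl", "", "")

would rely on lr.{w|d}.aqrl (a legal, if stronger, encoding, which also
appears in the Linux sequence quoted above) rather than the table's
lr.{w|d}.aq.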

~ Oleksii

> 
> > +#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx, pre, post)       \
> > + ({ \
> > +    register unsigned int rc; \
> > +    asm volatile( \
> > +        pre \
> > +        "0: lr" lr_sfx " %0, %2\n" \
> > +        "   bne  %0, %z3, 1f\n" \
> > +        "   sc" sc_sfx " %1, %z4, %2\n" \
> > +        "   bnez %1, 0b\n" \
> > +        post \
> > +        "1:\n" \
> > +        : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> > +        : "rJ" (old), "rJ" (new) \
> > +        : "memory"); \
> > + })
> > +
> > +/*
> > + * For LR and SC, the A extension requires that the address held in rs1 be
> > + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> > + * for 64-bit words and four-byte aligned for 32-bit words).
> > + * If the address is not naturally aligned, an address-misaligned exception
> > + * or an access-fault exception will be generated.
> > + * 
> > + * Thereby:
> > + * - for 1-byte xchg access the containing word by clearing low two bits
> > + * - for 2-byte xchg access the containing word by clearing first bit.
> > + * 
> > + * If resulting 4-byte access is still misaligned, it will fault just as
> > + * non-emulated 4-byte access would.
> > + *
> > + * old_val was cast to unsigned long at the end of the define because of
> > + * the following issue:
> > + * ./arch/riscv/include/asm/cmpxchg.h:166:5: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
> > + * 166 |     (__typeof__(*(ptr)))(old_val >> new_val_pos); \
> > + *     |     ^
> > + * ./arch/riscv/include/asm/cmpxchg.h:184:17: note: in expansion of macro 'emulate_cmpxchg_1_2'
> > + * 184 |         ret__ = emulate_cmpxchg_1_2(ptr, old, new, \
> > + *     |                 ^~~~~~~~~~~~~~~~~~~
> > + * ./arch/riscv/include/asm/cmpxchg.h:227:5: note: in expansion of macro '__cmpxchg_generic'
> > + * 227 |     __cmpxchg_generic(ptr, (unsigned long)(o), (unsigned long)(n), \
> > + *     |     ^~~~~~~~~~~~~~~~~
> > + * ./include/xen/lib.h:141:26: note: in expansion of macro '__cmpxchg'
> > + * 141 |     ((__typeof__(*(ptr)))__cmpxchg(ptr, (unsigned long)o_,              \
> > + *     |                          ^~~~~~~~~
> > + * common/event_channel.c:109:13: note: in expansion of macro 'cmpxchgptr'
> > + * 109 |             cmpxchgptr(&xen_consumers[i], NULL, fn);
> > + */
> 
> This is too much detail on the compile issue. Just mentioning that
> said cast is needed for cmpxchgptr() ought to be sufficient.
> 
> > +#define emulate_cmpxchg_1_2(ptr, old, new, sc_sfx, pre, post) \
> > +({ \
> > +    uint32_t *aligned_ptr = (uint32_t *)((unsigned long)ptr & ~(0x4 - sizeof(*ptr))); \
> > +    uint8_t new_val_pos = ((unsigned long)(ptr) & (0x4 - sizeof(*ptr))) * BITS_PER_BYTE; \
> > +    unsigned long mask = GENMASK(((sizeof(*ptr)) * BITS_PER_BYTE) - 1, 0) << new_val_pos; \
> > +    unsigned int old_ = old << new_val_pos; \
> > +    unsigned int new_ = new << new_val_pos; \
> > +    unsigned int old_val; \
> > +    unsigned int xchged_val; \
> > +    \
> > +    __asm__ __volatile__ ( \
> > +        pre \
> > +        "0: lr.w %[op_xchged_val], %[op_aligned_ptr]\n" \
> > +        "   and  %[op_oldval], %[op_xchged_val], %z[op_mask]\n" \
> > +        "   bne  %[op_oldval], %z[op_old], 1f\n" \
> > +        "   xor  %[op_xchged_val], %[op_oldval],
> > %[op_xchged_val]\n" \
> > +        "   or   %[op_xchged_val], %[op_xchged_val], %z[op_new]\n"
> > \
> > +        "   sc.w" sc_sfx " %[op_xchged_val], %[op_xchged_val],
> > %[op_aligned_ptr]\n" \
> > +        "   bnez %[op_xchged_val], 0b\n" \
> > +        post \
> > +        "1:\n" \
> > +        : [op_oldval] "=&r" (old_val), [op_xchged_val] "=&r" (xchged_val), [op_aligned_ptr] "+A" (*aligned_ptr) \
> > +        : [op_old] "rJ" (old_), [op_new] "rJ" (new_), \
> > +          [op_mask] "rJ" (mask) \
> > +        : "memory" ); \
> > +    \
> > +    (__typeof__(*(ptr)))((unsigned long)old_val >> new_val_pos); \
> > +})
> > +
> > +/*
> > + * Atomic compare and exchange.  Compare OLD with MEM, if identical,
> > + * store NEW in MEM.  Return the initial value in MEM.  Success is
> > + * indicated by comparing RETURN with OLD.
> > + */
> > +#define __cmpxchg_generic(ptr, old, new, size, sc_sfx, pre, post) \
> > +({ \
> > +    __typeof__(ptr) ptr__ = (ptr); \
> > +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> > +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
> > +    __typeof__(*(ptr)) ret__; \
> > +    switch ( size ) \
> > +    { \
> > +    case 1: \
> > +    case 2: \
> > +        ret__ = emulate_cmpxchg_1_2(ptr, old, new, \
> > +                            sc_sfx, pre, post); \
> > +        break; \
> > +    case 4: \
> > +        __generic_cmpxchg(ptr__, old__, new__, ret__, \
> > +                          ".w", ".w"sc_sfx, pre, post); \
> > +        break; \
> > +    case 8: \
> > +        __generic_cmpxchg(ptr__, old__, new__, ret__, \
> > +                          ".d", ".d"sc_sfx, pre, post); \
> > +        break; \
> > +    default: \
> > +        STATIC_ASSERT_UNREACHABLE(); \
> > +    } \
> > +    ret__; \
> > +})
> > +
> > +#define cmpxchg_relaxed(ptr, o, n) \
> > +({ \
> > +    __typeof__(*(ptr)) o_ = (o); \
> > +    __typeof__(*(ptr)) n_ = (n); \
> > +    (__typeof__(*(ptr)))__cmpxchg_generic(ptr, \
> > +                    o_, n_, sizeof(*(ptr)), "", "", ""); \
> > +})
> > +
> > +#define cmpxchg_acquire(ptr, o, n) \
> > +({ \
> > +    __typeof__(*(ptr)) o_ = (o); \
> > +    __typeof__(*(ptr)) n_ = (n); \
> > +    (__typeof__(*(ptr)))__cmpxchg_generic(ptr, o_, n_, sizeof(*(ptr)), \
> > +                                          "", "", RISCV_ACQUIRE_BARRIER); \
> > +})
> > +
> > +#define cmpxchg_release(ptr, o, n) \
> > +({ \
> > +    __typeof__(*(ptr)) o_ = (o); \
> > +    __typeof__(*(ptr)) n_ = (n); \
> > +    (__typeof__(*(ptr)))__cmpxchg_release(ptr, o_, n_, sizeof(*(ptr)), \
> 
> There's no __cmpxchg_release() afaics; dym __cmpxchg_generic()?
> 
> Jan




 

