
Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen



Hi Julien,

Thanks for taking a look at the patches and providing feedback. I've seen your
other comments and will reply to those separately when I get a chance (maybe at
the weekend or over the Christmas break).

RE the differences in ordering semantics between Xen's and Linux's atomics
helpers, please find my notes below.

Thoughts?

Cheers,
Ash.


The tables below use format AAA/BBB/CCC/DDD/EEE, where:

 - AAA is the memory barrier before the operation
 - BBB is the acquire semantics of the atomic operation
 - CCC is the release semantics of the atomic operation
 - DDD is whether the asm() block clobbers memory
 - EEE is the memory barrier after the operation

For example, ---/---/rel/mem/dmb would mean:

 - No memory barrier before the operation
 - The atomic does *not* have acquire semantics
 - The atomic *does* have release semantics
 - The asm() block clobbers memory
 - There is a DMB memory barrier after the atomic operation
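
To make the notation concrete, here is a rough C sketch of a decoder for these
five-field strings. It is purely illustrative and not part of the series; the
names struct ordering and decode_ordering() are made up for this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of one AAA/BBB/CCC/DDD/EEE entry. */
struct ordering {
    bool barrier_before;   /* AAA == "dmb" */
    bool acquire;          /* BBB == "acq" */
    bool release;          /* CCC == "rel" */
    bool clobbers_memory;  /* DDD == "mem" */
    bool barrier_after;    /* EEE == "dmb" */
};

/*
 * Decode a string such as "---/---/rel/mem/dmb".
 * Returns false if the string is not five 3-character fields separated by
 * '/', or if a field is neither "---" nor the token expected in that slot.
 */
static bool decode_ordering(const char *s, struct ordering *out)
{
    static const char *const set[5] = { "dmb", "acq", "rel", "mem", "dmb" };
    bool field[5];

    assert(s != NULL && out != NULL);

    /* 5 fields * 3 chars + 4 separators */
    if (strlen(s) != 19)
        return false;

    for (int i = 0; i < 5; i++) {
        const char *f = s + 4 * i;

        if (i < 4 && f[3] != '/')
            return false;

        if (strncmp(f, set[i], 3) == 0)
            field[i] = true;
        else if (strncmp(f, "---", 3) == 0)
            field[i] = false;
        else
            return false;
    }

    out->barrier_before  = field[0];
    out->acquire         = field[1];
    out->release         = field[2];
    out->clobbers_memory = field[3];
    out->barrier_after   = field[4];
    return true;
}
```

Decoding "---/---/rel/mem/dmb" sets only release, clobbers_memory and
barrier_after. The XXX/XXX/XXX/XXX/XXX placeholder used in the arm32 table is
rejected, since it means "no such function" rather than "no ordering".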


    arm64 LL/SC
    ===========

    Xen Function       Xen                  Linux                Inconsistent
    ============       ===                  =====                ============

    atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_add_return  ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
    atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_sub_return  ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
    atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_cmpxchg     dmb/---/---/---/dmb  ---/---/rel/mem/---  YES (2)
    atomic_xchg        ---/---/rel/mem/dmb  ---/acq/rel/mem/dmb  YES (3)

(1) It's actually interesting to me that Linux does it this way. As with the
    LSE atomics below, I'd have expected it to use acq/rel semantics and ditch
    the DMB, unless I'm missing some concern around taking an IRQ between the
    LDAXR and the STLXR, which can't happen in the LSE atomic case since it's
    a single instruction. But the exclusive monitor is cleared on exception
    return in AArch64, so I'm struggling to see what that potential issue may
    be. Regardless, Linux and Xen are consistent so we're OK ;-)

(2) The Linux version uses either STLXR with rel semantics if the comparison
    passes, or DMB if the comparison fails. This is weaker than Xen's version,
    which is quite blunt in always wrapping the operation between two DMBs. This
    may be a holdover from Xen's arm32 versions being ported to arm64, as we
    didn't support acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,
    this is quite a big discrepancy and I've not yet given it enough thought to
    determine whether it would actually cause an issue. My feeling is that the
    Linux LL/SC atomic_cmpxchg() should have acq semantics on the LL, but
    like you said these helpers are well tested so I'd be surprised if there
    is a bug. See (5) below though, where the Linux LSE atomic_cmpxchg() *does*
    have acq semantics.

(3) The Linux version just adds acq semantics to the LL, so we're OK here.


    arm64 LSE (comparison to Xen's LL/SC)
    =====================================

    Xen Function       Xen                  Linux                Inconsistent
    ============       ===                  =====                ============

    atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_add_return  ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
    atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_sub_return  ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
    atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_cmpxchg     dmb/---/---/---/dmb  ---/acq/rel/mem/---  YES (5)
    atomic_xchg        ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)

(4) As noted in (1), this is how I would have expected Linux's LL/SC atomics to
    work too. I don't think this discrepancy will cause any issues.

(5) As with (2) above, this is quite a big discrepancy from Xen. However, at
    least this version has acq semantics, unlike the LL/SC version in (2), so
    I'm more confident that there won't be regressions going from the Xen LL/SC
    version of atomic_cmpxchg() to the Linux LSE version.


    arm32 LL/SC
    ===========

    Xen Function       Xen                  Linux                Inconsistent
    ============       ===                  =====                ============

    atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_add_return  dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
    atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_sub_return  dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
    atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
    atomic_cmpxchg     dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
    atomic_xchg        dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)

(6) Linux only provides relaxed variants of these functions, such as
    atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13 and #14
    in the series add the stricter versions expected by Xen, wrapping calls to
    Linux's relaxed variants between two calls to smp_mb(). This makes them
    consistent with Xen's existing helpers, though it is quite blunt. It is worth
    noting that Armv8-A AArch32 does support acq/rel semantics on exclusive
    accesses, with LDAEX and STLEX, so I could imagine us introducing a new
    arm32 hwcap to detect whether we're on actual Armv7-A hardware or Armv8-A
    AArch32, then swap to lighter-weight STLEX versions of these helpers rather
    than the heavyweight double DMB versions. Whether that would actually give
    measurable performance improvements is another story!



 

