Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen
Hi Julien,

Thanks for taking a look at the patches and providing feedback. I've seen your
other comments and will reply to those separately when I get a chance (maybe at
the weekend or over the Christmas break).

RE the differences in ordering semantics between Xen's and Linux's atomics
helpers, please find my notes below.

Thoughts?

Cheers,
Ash.


The tables below use format AAA/BBB/CCC/DDD/EEE, where:

 - AAA is the memory barrier before the operation
 - BBB is the acquire semantics of the atomic operation
 - CCC is the release semantics of the atomic operation
 - DDD is whether the asm() block clobbers memory
 - EEE is the memory barrier after the operation

For example, ---/---/rel/mem/dmb would mean:

 - No memory barrier before the operation
 - The atomic does *not* have acquire semantics
 - The atomic *does* have release semantics
 - The asm() block clobbers memory
 - There is a DMB memory barrier after the atomic operation


arm64 LL/SC
===========

    Xen Function            Xen                     Linux                   Inconsistent
    ============            ===                     =====                   ============

    atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_add_return       ---/---/rel/mem/dmb     ---/---/rel/mem/dmb     --- (1)
    atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_sub_return       ---/---/rel/mem/dmb     ---/---/rel/mem/dmb     --- (1)
    atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_cmpxchg          dmb/---/---/---/dmb     ---/---/rel/mem/---     YES (2)
    atomic_xchg             ---/---/rel/mem/dmb     ---/acq/rel/mem/dmb     YES (3)

(1) It's actually interesting to me that Linux does it this way. As with the
    LSE atomics below, I'd have expected acq/rel semantics and ditch the DMB.
    Unless I'm missing something where there is a concern around taking an IRQ
    between the LDAXR and the STLXR, which can't happen in the LSE atomic case
    since it's a single instruction. But the exclusive monitor is cleared on
    exception return in AArch64 so I'm struggling to see what that potential
    issue may be.
    Regardless, Linux and Xen are consistent so we're OK ;-)

(2) The Linux version uses either STLXR with rel semantics if the comparison
    passes, or DMB if the comparison fails. This is weaker than Xen's version,
    which is quite blunt in always wrapping the operation between two DMBs.
    This may be a holdover from Xen's arm32 versions being ported to arm64, as
    we didn't support acq/rel semantics on LDREX and STREX in Armv7-A?
    Regardless, this is quite a big discrepancy and I've not yet given it
    enough thought to determine whether it would actually cause an issue. My
    feeling is that the Linux LL/SC atomic_cmpxchg() should have acq semantics
    on the LL, but like you said these helpers are well tested so I'd be
    surprised if there is a bug. See (5) below though, where the Linux LSE
    atomic_cmpxchg() *does* have acq semantics.

(3) The Linux version just adds acq semantics to the LL, so we're OK here.


arm64 LSE (comparison to Xen's LL/SC)
=====================================

    Xen Function            Xen                     Linux                   Inconsistent
    ============            ===                     =====                   ============

    atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_add_return       ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
    atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_sub_return       ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
    atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_cmpxchg          dmb/---/---/---/dmb     ---/acq/rel/mem/---     YES (5)
    atomic_xchg             ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)

(4) As noted in (1), this is how I would have expected Linux's LL/SC atomics
    to work too. I don't think this discrepancy will cause any issues.

(5) As with (2) above, this is quite a big discrepancy from Xen. However, at
    least this version has acq semantics, unlike the LL/SC version in (2), so
    I'm more confident that there won't be regressions going from the Xen
    LL/SC version to the Linux LSE version of atomic_cmpxchg().
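
To make the difference between the two atomic_cmpxchg() columns concrete,
here is a rough sketch in portable C11 rather than Arm assembly. It is not the
Xen or Linux code: the sketch_* names and the sketch_atomic_t type are made up
for illustration, a seq_cst fence only approximates a DMB, and C11 acq_rel
ordering only approximates the acq/rel forms of the Arm instructions.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for atomic_t; not the real Xen definition. */
typedef struct { atomic_int counter; } sketch_atomic_t;

/*
 * Roughly dmb/---/---/---/dmb: a relaxed compare-and-swap bracketed by two
 * full fences, issued whether or not the comparison succeeds.
 */
static inline int sketch_cmpxchg_fenced(sketch_atomic_t *v, int old, int new_val)
{
    int expected = old;

    atomic_thread_fence(memory_order_seq_cst);   /* approximates the leading DMB */
    atomic_compare_exchange_strong_explicit(&v->counter, &expected, new_val,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* approximates the trailing DMB */

    /* expected now holds the value seen in memory before the operation. */
    return expected;
}

/*
 * Roughly ---/acq/rel/mem/---: the ordering is carried by the operation
 * itself, with no separate fences around it.
 */
static inline int sketch_cmpxchg_acqrel(sketch_atomic_t *v, int old, int new_val)
{
    int expected = old;

    atomic_compare_exchange_strong_explicit(&v->counter, &expected, new_val,
                                            memory_order_acq_rel,
                                            memory_order_acquire);
    return expected;
}
```

Both variants return the value observed before the operation, so callers
cannot tell them apart functionally; only the ordering guarantees visible to
other CPUs differ.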


arm32 LL/SC
===========

    Xen Function            Xen                     Linux                   Inconsistent
    ============            ===                     =====                   ============

    atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_add_return       dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
    atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_sub_return       dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
    atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
    atomic_cmpxchg          dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
    atomic_xchg             dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)

(6) Linux only provides relaxed variants of these functions, such as
    atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13 and #14
    in the series add the stricter versions expected by Xen, wrapping calls to
    Linux's relaxed variants in between two calls to smp_mb(). This makes them
    consistent with Xen's existing helpers, though it is quite blunt. It is
    worth noting that Armv8-A AArch32 does support acq/rel semantics on
    exclusive accesses, with LDAEX and STLEX, so I could imagine us
    introducing a new arm32 hwcap to detect whether we're on actual Armv7-A
    hardware or Armv8-A AArch32, then swap to lighter-weight STLEX versions of
    these helpers rather than the heavyweight double-DMB versions. Whether
    that would actually give measurable performance improvements is another
    story!
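
The wrapping described in (6) has roughly the following shape. This is only a
sketch in portable C11, not the code from patches #13 and #14: the demo_*
names and the demo_atomic_t type are hypothetical, and demo_smp_mb() uses a
seq_cst fence as a stand-in for the real smp_mb().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for atomic_t; not the real Xen definition. */
typedef struct { atomic_int counter; } demo_atomic_t;

/* Stand-in for smp_mb(); a C11 seq_cst fence is assumed here for illustration. */
static inline void demo_smp_mb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Relaxed variants: atomic, but with no ordering guarantees of their own. */
static inline int demo_add_return_relaxed(int i, demo_atomic_t *v)
{
    return atomic_fetch_add_explicit(&v->counter, i, memory_order_relaxed) + i;
}

static inline int demo_xchg_relaxed(demo_atomic_t *v, int new_val)
{
    return atomic_exchange_explicit(&v->counter, new_val, memory_order_relaxed);
}

/* Fully ordered variants: the relaxed operation between two full barriers. */
static inline int demo_add_return(int i, demo_atomic_t *v)
{
    int ret;

    demo_smp_mb();
    ret = demo_add_return_relaxed(i, v);
    demo_smp_mb();

    return ret;
}

static inline int demo_xchg(demo_atomic_t *v, int new_val)
{
    int ret;

    demo_smp_mb();
    ret = demo_xchg_relaxed(v, new_val);
    demo_smp_mb();

    return ret;
}
```

The appeal of this pattern is that the relaxed helpers stay untouched and the
ordering is added in one obvious place; the cost is two full barriers on every
call, which is the bluntness noted in (6).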