Xen project Mailing List

Re: [Xen-devel] [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics

To: Ian Campbell <ian.campbell@xxxxxxxxxx>

From: Julien Grall <julien.grall@xxxxxxxxxx>

Date: Thu, 20 Mar 2014 17:43:59 +0000

Cc: stefano.stabellini@xxxxxxxxxxxxx, tim@xxxxxxx, xen-devel@xxxxxxxxxxxxx

Delivery-date: Thu, 20 Mar 2014 17:44:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 03/20/2014 03:45 PM, Ian Campbell wrote: > Xen, like Linux, expects full barrier semantics for bitops, atomics and > cmpxchgs. This issue was discovered on Linux and we get our implementation of > these from Linux so quoting Will Deacon in Linux commit 8e86f0b409a4 for the > gory details: > Linux requires a number of atomic operations to provide full barrier > semantics, that is no memory accesses after the operation can be > observed before any accesses up to and including the operation in > program order. > > On arm64, these operations have been incorrectly implemented as follows: > > // A, B, C are independent memory locations > > <Access [A]> > > // atomic_op (B) > 1: ldaxr x0, [B] // Exclusive load with acquire > <op(B)> > stlxr w1, x0, [B] // Exclusive store with release > cbnz w1, 1b > > <Access [C]> > > The assumption here being that two half barriers are equivalent to a > full barrier, so the only permitted ordering would be A -> B -> C > (where B is the atomic operation involving both a load and a store). > > Unfortunately, this is not the case by the letter of the architecture > and, in fact, the accesses to A and C are permitted to pass their > nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs > or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the > store-release on B). This is a clear violation of the full barrier > requirement. > > The simple way to fix this is to implement the same algorithm as ARMv7 > using explicit barriers: > > <Access [A]> > > // atomic_op (B) > dmb ish // Full barrier > 1: ldxr x0, [B] // Exclusive load > <op(B)> > stxr w1, x0, [B] // Exclusive store > cbnz w1, 1b > dmb ish // Full barrier > > <Access [C]> > > but this has the undesirable effect of introducing *two* full barrier > instructions. A better approach is actually the following, non-intuitive > sequence: > > <Access [A]> > > // atomic_op (B) > 1: ldxr x0, [B] // Exclusive load > <op(B)> > stlxr w1, x0, [B] // Exclusive store with release > cbnz w1, 1b > dmb ish // Full barrier > > <Access [C]> > > The simple observations here are: > > - The dmb ensures that no subsequent accesses (e.g. the access to C) > can enter or pass the atomic sequence. > > - The dmb also ensures that no prior accesses (e.g. the access to A) > can pass the atomic sequence. > > - Therefore, no prior access can pass a subsequent access, or > vice-versa (i.e. A is strictly ordered before C). > > - The stlxr ensures that no prior access can pass the store component > of the atomic operation. > > The only tricky part remaining is the ordering between the ldxr and the > access to A, since the absence of the first dmb means that we're now > permitting re-ordering between the ldxr and any prior accesses. > > From an (arbitrary) observer's point of view, there are two scenarios: > > 1. We have observed the ldxr. This means that if we perform a store to > [B], the ldxr will still return older data. If we can observe the > ldxr, then we can potentially observe the permitted re-ordering > with the access to A, which is clearly an issue when compared to > the dmb variant of the code. Thankfully, the exclusive monitor will > save us here since it will be cleared as a result of the store and > the ldxr will retry. Notice that any use of a later memory > observation to imply observation of the ldxr will also imply > observation of the access to A, since the stlxr/dmb ensure strict > ordering. > > 2. We have not observed the ldxr. This means we can perform a store > and influence the later ldxr. However, that doesn't actually tell > us anything about the access to [A], so we've not lost anything > here either when compared to the dmb variant. > > This patch implements this solution for our barriered atomic operations, > ensuring that we satisfy the full barrier requirements where they are > needed. > > Cc: <stable@xxxxxxxxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Signed-off-by: Will Deacon <will.deacon@xxxxxxx> > Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx> > > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx> Acked-by: Julien Grall <julien.grall@xxxxxxxxxx> -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.