Re: [Xen-devel] [PATCH 2/4] xen/x86: Drop erronious barriers
On 06/12/2016 20:32, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>> None of these barriers serve any purpose, as they are not
>>>>> synchronising with anything on remote CPUs.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>> ---
>>>>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>>>>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>>>> CC: Julien Grall <julien.grall@xxxxxxx>
>>>>>
>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide
>>>>> sweep.
>>>>>
>>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous
>>>>> barrier usage from x86, but I don't know whether further development
>>>>> has gained a dependence on them.
>>>> We turned them into smp_wmb already (kudos to IanC).
>>> Right, but the entire point I am trying to argue is that they are not
>>> needed in the first place.
> Just to be clear, on ARM the barriers are unneeded only if it is
> unimportant that "init stuff" (which corresponds to all the
> initialization done in start_secondary up to smp_wmb) below is completed
> before "write cpu_online_map".  But it looks like we do want to complete
> the mmu, irq and timer initializations and set the current vcpu before
> marking the cpu as online, right?

No.  I am sorry, but this question suggests that you still don't
understand barriers.

>
>> This is the current code:
>>
>>   CPU 1                     CPU 0
>>   -----                     -----
>>
>>   init stuff                read cpu_online_map
>>
>>   write barrier
>>
>>   write cpu_online_map      do more initialization
>>
>>   write barrier
>>
>>   init more stuff
>>
>>
>> I agree that it's wrong, because the second write barrier in
>> start_secondary is useless and in addition we are missing a read barrier
>> in __cpu_up, corresponding to the first write barrier in
>> start_secondary.
>>
>> I think it should look like:
>>
>>   CPU 1                     CPU 0
>>   -----                     -----
>>
>>   init stuff                read cpu_online_map
>>
>>   write barrier             read barrier
>>
>>   write cpu_online_map      do more initialization
>>
>>   init more stuff

The barriers here serve no purpose, because you have missed an important
blocking while loop on CPU 0.

Recall that the read/write barrier example is:

  foo = a;
  smp_rmb();
  bar = b;

and

  a = baz;
  smp_wmb();
  b = fromble;

This is specifically relevant *only* to the shared variables a and b,
where for correctness an update to a must be observed remotely before
the update to b.  If you do not have exactly the same a and b on either
side of the smp_rmb()/smp_wmb(), then your code is wrong (and most
likely, you shouldn't be using the barriers at all).

The init sequence is a different scenario.  Processor 0 spins, waiting
to observe an update to cpu_online_map.  Processor 1 performs its init
sequence and, midway through, sets its own bit in cpu_online_map.  It
then continues with further init actions.

It does not matter whether processor 0 observes the update to
cpu_online_map slightly early or late compared to the local-state
updates from the other parts of processor 1's init sequence (because
processor 0 had better not be looking at those local-state changes).
Once the bit is set in cpu_online_map, processor 1 is ready to be IPI'd
to give it work to do, etc.
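To make that shape concrete, here is a minimal user-space sketch.  It is
not the actual Xen smpboot code: the names (secondary_cpu, boot_cpu,
cpu1_local_state) are made up, and pthreads plus C11 atomics stand in
for real CPUs.  The only shared variable is cpu_online_map, processor 0
only ever spins reading it, and it never inspects processor 1's private
init state, so there is no a/b pair for an smp_rmb()/smp_wmb() pairing
to order:

  /* Hedged sketch, not Xen code.  Build: gcc -std=c11 -pthread sketch.c */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  static atomic_ulong cpu_online_map;   /* the only shared variable */
  static int cpu1_local_state;          /* private to "CPU 1"; never read by "CPU 0" */

  /* Models start_secondary() on the booting CPU. */
  static void *secondary_cpu(void *arg)
  {
      (void)arg;

      cpu1_local_state = 1;             /* "init stuff": mmu, irq, timer, ... */

      /* Publish "I am online".  Relaxed ordering suffices in this sketch
       * because the other thread only ever reads the map and never looks
       * at cpu1_local_state, so there is no ordering to enforce. */
      atomic_fetch_or_explicit(&cpu_online_map, 1UL << 1, memory_order_relaxed);

      cpu1_local_state = 2;             /* "init more stuff" */
      return NULL;
  }

  /* Models __cpu_up() on the boot CPU: the blocking while loop. */
  static void *boot_cpu(void *arg)
  {
      (void)arg;

      while ( !(atomic_load_explicit(&cpu_online_map,
                                     memory_order_relaxed) & (1UL << 1)) )
          ;                             /* cpu_relax() in real code */

      /* From here on, CPU 1 may be IPI'd and given work.  CPU 0 does not
       * (and must not) look at CPU 1's private init state. */
      printf("CPU 1 reported online\n");
      return NULL;
  }

  int main(void)
  {
      pthread_t t0, t1;

      pthread_create(&t1, NULL, secondary_cpu, NULL);
      pthread_create(&t0, NULL, boot_cpu, NULL);
      pthread_join(t0, NULL);
      pthread_join(t1, NULL);
      return 0;
  }

If CPU 0 did go on to read data which CPU 1 had written before setting
its bit, that would be the paired publish/consume pattern from the
smp_rmb()/smp_wmb() example above, and the barriers would then be
required.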
Even if processor 0 hasn't observed all of the local-state updates of
processor 1, once it has seen the cpu_online_map update it knows that
processor 1 is available to be IPI'd, and IPIing processor 1 will cause
execution with the expected program order being observed (from the point
of view of processor 1).

The only consideration is whether processor 1 can take an action which
would prioritise the update to cpu_online_map becoming visible to the
rest of the system.  If such an action were available, an argument could
be made that making the update visible more quickly would allow
processor 0 to continue earlier.  However, such an action would be
entirely local to processor 1 and not related to anything happening on
processor 0.

I am not aware of any such action on x86, and my gut feeling is that no
other architecture has one either.  The ability to fast-forward one
specific update to the shared cache-coherency layer would only
complicate an already complicated area of silicon, and would only be
useful to micro-optimise a corner case; I can't see any system designers
considering it a useful feature to add.

~Andrew