[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
On Thu, 25 Feb 2021, Julien Grall wrote: > On 23/02/2021 13:24, Ash Wilding wrote: > > Hi Julien, > > Hi Ash, > > > Thanks for looking at this, > > > > > vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the > > > CPU to read any information about local events before the flag > > > _VPF_blocked is set. > > > > Reviewed-by: Ash Wilding <ash.j.wilding@xxxxxxxxx> > > Thanks! > > > > > > > As an aside, > > > > > I couldn't convince myself whether the Arm implementation of > > > local_events_need_delivery() contains enough barrier to prevent the > > > re-ordering. However, I don't think we want to play with devil here > > > as the function may be optimized in the future. > > > > Agreed. > > > > The vgic_vcpu_pending_irq() and vgic_evtchn_irq_pending() in the call > > path of local_events_need_delivery() both call spin_lock_irqsave(), > > which has an arch_lock_acquire_barrier() in its call path. > > > > That just happens to map to a heavier smp_mb() on Arm right now, but > > relying on this behaviour would be shaky; I can imagine a future update > > to arch_lock_acquire_barrier() that relaxes it down to just acquire > > semantics like its name implies (for example an LSE-based lock_acquire() > > using LDUMAXA),in which case any code incorrectly relying on that full > > barrier behaviour may break. I'm guessing this is what you meant by the > > function may be optimized in future? > > That's one of the optimization I had in mind. The other one is we may find a > way to remove the spinlocks, so the barriers would disappear completely. > > > > > Do we know whether there is an expectation for previous loads/stores > > to have been observed before local_events_need_delivery()? I'm wondering > > whether it would make sense to have an smb_mb() at the start of the > > *_nomask() variant in local_events_need_delivery()'s call path. > > That's a good question :). For Arm, there are 4 users of > local_events_need_delivery(): > 1) do_poll() > 2) vcpu_block() > 3) hypercall_preempt_check() > 4) general_preempt_check() > > 3 and 4 are used for breaking down long running operations. I guess we would > want to have an accurate view of the pending events and therefore we would > need a memory barrier to prevent the loads happening too early. > > In this case, I think the smp_mb() would want to be part of the > hypercall_preempt_check() and general_preempt_check(). > > Therefore, I think we want to avoid the extra barrier in > local_events_need_delivery(). Instead, we should require the caller to take > care of the ordering if needed. > > This would have benefits any new architecture as the common code would already > contain the appropriate barriers. > > @Stefano, what do you think? I am thinking the same way as you. Also because it is cleaner if it is the one who writes that also takes care of any barriers/flushes needed. In this case it is vcpu_block that is writing _VPF_blocked and knows that it has to be seen before local_events_need_delivery(). It is easier to keep track of it if the barrier is in vcpu_block together with the set_bit call.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |