[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Memory ordering question in the shutdown deferral code

To: Julien Grall <julien@xxxxxxx>
From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Date: Wed, 23 Sep 2020 15:57:21 -0700 (PDT)
Cc: "Durrant, Paul" <pdurrant@xxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, "Xia, Hongyan" <hongyxia@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 23 Sep 2020 22:57:45 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, 21 Sep 2020, Julien Grall wrote:
> On 21/09/2020 13:55, Durrant, Paul wrote:
> > > (+ Xen-devel)
> > > 
> > > Sorry I forgot to CC xen-devel.
> > > 
> > > On 21/09/2020 12:38, Julien Grall wrote:
> > > > Hi all,
> > > > 
> > > > I have started to look at the deferral code (see
> > > > vcpu_start_shutdown_deferral()) because we need it for LiveUpdate and
> > > > Arm will soon use it.
> > > > 
> > > > The current implementation is using an smp_mb() to ensure ordering
> > > > between a write then a read. The code looks roughly (I have slightly
> > > > adapted it to make my question more obvious):
> > > > 
> > > > domain_shutdown()
> > > >       d->is_shutting_down = 1;
> > > >       smp_mb();
> > > >       if ( !vcpu0->defer_shutdown )
> > > >       {
> > > >         vcpu_pause_nosync(v);
> > > >         v->paused_for_shutdown = 1;
> > > >       }
> > > > 
> > > > vcpu_start_shutdown_deferral()
> > > >       vcpu0->defer_shutdown = 1;
> > > >       smp_mb();
> > > >       if ( unlikely(d->is_shutting_down) )
> > > >         vcpu_check_shutdown(v);
> > > > 
> > > >       return vcpu0->defer_shutdown;
> > > > 
> > > > smp_mb() should only guarantee ordering (this may be stronger on some
> > > > arch), so I think there is a race between the two functions.
> > > > 
> > > > It would be possible to pause the vCPU in domain_shutdown() because
> > > > vcpu0->defer_shutdown wasn't yet seen.
> > > > 
> > > > Equally, vcpu_start_shutdown_deferral() may not see d->is_shutting_down
> > > > and therefore Xen may continue to send the I/O. Yet the vCPU will be
> > > > paused so the I/O will never complete.
> > > > 
> > 
> > The barrier enforces global order, right?
> 
> It is not clear to me what you mean by "global ordering". This seems to
> suggest a very expensive synchronization barrier between all the processors.
> 
> From an arch-agnostic PoV, smp_mb() will enforce an ordering between
> loads/stores but it doesn't guarantee *when* they will be observed.
> 
> > So, if domain_shutdown() pauses the vcpu then is_shutting_down must
> > necessarily be visible all cpus.
> 
> That's not the guarantee provided by smp_mb() (see above).

I simplified the code further to help us reason about it:

   thread#1 |  thread#2
            |
1) WRITE A  |  WRITE B
2) BARRIER  |  BARRIER
3) READ B   |  READ A

I think it is (theoretically) possible for thread#1 to be at 1) and
about to do 2), while thread#2 goes ahead and does 1) 2) 3). By the time
thread#1 does 2), thread#2 has already completed the entire sequence.

If thread#2 has already done 2), and thread#1 is about to do 3), then I
think we are guaranteed that thread#1 will see the new value of B. Or
is this the core of the issue we are discussing?

If it works the way I wrote, then it would confirm Paul's view.

For your information I went to check what the Linux memory model has to
say about this. It says:

"smp_mb() guarantees to restore sequential consistency among accesses that use 
READ_ONCE, WRITE_ONCE(), or stronger. For example, the following Linux-kernel 
code would forbid non-SC outcomes"

It is interesting that they chose the words "restore sequential
consistency". It would be difficult to come up with an example that has
"sequential consistency" but doesn't work the way described earlier.

Follow-Ups:
- Re: Memory ordering question in the shutdown deferral code
  - From: Julien Grall

References:
- Re: Memory ordering question in the shutdown deferral code
  - From: Julien Grall
- RE: Memory ordering question in the shutdown deferral code
  - From: Durrant, Paul
- Re: Memory ordering question in the shutdown deferral code
  - From: Julien Grall

Prev by Date: Re: [PATCH v4 3/4] xen: Remove mfn_to_gmfn macro
Next by Date: [xen-4.13-testing test] 154625: regressions - FAIL
Previous by thread: Re: Memory ordering question in the shutdown deferral code
Next by thread: Re: Memory ordering question in the shutdown deferral code
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.