
Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce



At 12:07 +0100 on 30 Jun (1404126466), Ian Campbell wrote:
> On Mon, 2014-06-30 at 12:52 +0200, Tim Deegan wrote:
> > At 12:14 +0200 on 30 Jun (1404126862), Tim Deegan wrote:
> > > At 10:47 +0100 on 30 Jun (1404121679), David Vrabel wrote:
> > > > Shared ring updates are strictly ordered with respect to the writes to
> > > > data pages (either via grant map or grant copy).  This means that if the
> > > > guest sees a response in the ring, it is guaranteed that all writes to
> > > > the associated pages are also present.
> > > 
> > > Is the ring update also strictly ordered wrt the grant unmap operation?
> > > 
> > > > The write of the response and the write of the producer index are
> > > > strictly ordered.  If the backend is in the process of writing a
> > > > response and the page is saved then the partial (corrupt) response is
> > > > not visible to the guest.  The write of the producer index is atomic so
> > > > the saver cannot see a partial producer index write.
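
For reference, a minimal self-contained sketch of the completion path being
described (all types and names are illustrative; the barrier-then-store at
the end mirrors RING_PUSH_RESPONSES() in xen/include/public/io/ring.h):

#include <stdint.h>

#define RING_SIZE 32

struct rsp { uint64_t id; int16_t status; };

/* Illustrative single-page shared ring; real rings come from the
 * DEFINE_RING_TYPES() machinery in xen/include/public/io/ring.h. */
struct sring {
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;   /* rsp_prod is what the frontend polls */
    struct rsp ring[RING_SIZE];
};

void push_response(volatile struct sring *sring, uint32_t *rsp_prod_pvt,
                   uint64_t id, int16_t status)
{
    volatile struct rsp *r = &sring->ring[*rsp_prod_pvt % RING_SIZE];

    /* The data pages have already been written (grant copy, or writes
     * through a grant mapping followed by the unmap). */

    /* Write the response body into the next free slot. */
    r->id = id;
    r->status = status;

    /* wmb(): the frontend must see the response (and the data) before it
     * can see the new producer index. */
    __atomic_thread_fence(__ATOMIC_RELEASE);

    /* Publish the index with a single naturally aligned 32-bit store, so
     * a concurrent reader can never observe a torn value. */
    sring->rsp_prod = ++(*rsp_prod_pvt);
}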
> > > 
> > > Yes.  The (suggested) problem is that live migration does not preserve
> > > that write ordering.  So we have to worry about something like this:
> > > 
> > > 1. Toolstack pauses the domain for the final pass.  Reads the final
> > >    LGD bitmap, which happens to include the shared ring but not the
> > >    data pages.
> > > 2. Backend writes the data.
> > > 3. Backend unmaps the data page, marking it dirty.
> > > 4. Backend writes the ring.
> > > 5. Toolstack sends the ring page across in the last pass.
> > > 6. Guest resumes, seeing the I/O marked as complete, but without the
> > >    data.
> > 
> > It occurs to me that the guest should be able to defend against this
> > by taking a local copy of the response producer before migration and
> > using _that_ for the replay logic afterwards.  That is guaranteed to
> > exclude any I/O that completed after the VM was paused, and as long as
> > the unmap is guaranteed to happen before the ring update, we're OK.
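
A hypothetical sketch of that defence (all names illustrative): snapshot
rsp_prod while the frontend is quiesced, and drive the post-migration replay
from the snapshot rather than from whatever index the migrated ring page
happens to contain:

#include <stdint.h>

/* Illustrative, cut-down view of the shared ring header and frontend
 * state; not the real ring.h / frontend structures. */
struct shared_ring { uint32_t req_prod, req_event, rsp_prod, rsp_event; /* ... */ };

struct frontend {
    struct shared_ring *sring;   /* mapped shared ring page */
    uint32_t rsp_cons;           /* frontend's private consumer index */
    uint32_t rsp_prod_snapshot;  /* taken while quiesced for the final pass */
};

/* Hypothetical helpers standing in for the real frontend logic. */
void consume_response(struct frontend *fe, uint32_t idx);
void replay_outstanding_requests(struct frontend *fe);

/* Called while the device is quiesced, before the final migration pass. */
void frontend_suspend(struct frontend *fe)
{
    fe->rsp_prod_snapshot = fe->sring->rsp_prod;
}

/* Called after resume on the target host. */
void frontend_recover(struct frontend *fe)
{
    /* Trust only completions published before the snapshot... */
    while (fe->rsp_cons != fe->rsp_prod_snapshot)
        consume_response(fe, fe->rsp_cons++);

    /* ...and re-issue everything else, instead of believing whatever
     * rsp_prod value the migrated ring page happens to contain. */
    replay_outstanding_requests(fe);
}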
> 
> AIUI blkfront at least maintains its own shadow copy of the ring at all
> times, and the recovery process doesn't use the migrated copy of the
> ring at all (at least not the responses). I might be misunderstanding
> the code there though.
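
Roughly the following scheme, heavily simplified and with illustrative names
rather than the actual blkfront code: every request handed to the backend is
mirrored in a local shadow array, and recovery re-issues the in-flight
entries without ever reading responses out of the migrated ring:

#include <stdbool.h>

#define SHADOW_RING_SIZE 32

struct request;                     /* upper-layer request, opaque here */

struct shadow_entry {
    struct request *request;        /* what was handed to the backend */
    bool in_flight;                 /* no response consumed for it yet */
};

struct blkfront_info {
    struct shadow_entry shadow[SHADOW_RING_SIZE];
};

/* Hypothetical helper: queue a request on the freshly re-initialised ring. */
void resubmit_request(struct blkfront_info *info, struct request *req);

/* Post-migration recovery: walk the local shadow and re-issue anything
 * still in flight; the migrated ring's responses are never read. */
void blkfront_recover(struct blkfront_info *info)
{
    unsigned int i;

    for (i = 0; i < SHADOW_RING_SIZE; i++)
        if (info->shadow[i].in_flight)
            resubmit_request(info, info->shadow[i].request);
}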

That sounds like it should be OK then, though we might want to document
exactly why. :)

> > (That still leaves the question that Andrew raised of memcpy()
> > breaking atomicity/ordering of updates.)
> 
> That's the memcpy in the migration code vs the definitely correctly
> ordered updates done by the b.e., right?

Yes.  That's the risk that
 - the memcpy might use non-atomic reads and so corrupt the rsp-prod; or
 - the memcpy might be ordered such that it copies the ring itself
   before it copies rsp-prod, which breaks the required ordering on
   the frontend.

I think that on x86 we're unlikely to have either of those problems
in practice, at least for single-page rings.
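
For completeness, a sketch of what a ring-page copy preserving both
properties could look like (illustrative layout and names, not the existing
saver code): sample rsp_prod once with an aligned read before copying the
body, then write that snapshot back into the copy so the index never runs
ahead of the responses it covers:

#include <stdint.h>
#include <string.h>

#define RING_PAGE_SIZE 4096

/* Illustrative layout of the start of a single-page shared ring. */
struct sring_hdr {
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
};

void copy_ring_page(void *dst, const volatile void *src)
{
    const volatile struct sring_hdr *hdr = src;
    uint32_t rsp_prod;

    /* One aligned 32-bit read: never torn, and ordered before the reads
     * done by the memcpy below. */
    rsp_prod = __atomic_load_n(&hdr->rsp_prod, __ATOMIC_ACQUIRE);

    memcpy(dst, (const void *)src, RING_PAGE_SIZE);

    /* Whatever value the memcpy happened to pick up, publish the earlier
     * snapshot, so the copied index never runs ahead of the copied
     * responses (the ordering the frontend relies on). */
    ((struct sring_hdr *)dst)->rsp_prod = rsp_prod;
}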

Tim.



 

