
Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce



On Mon, 2014-06-30 at 10:47 +0100, David Vrabel wrote:
> On 30/06/14 09:38, Ian Campbell wrote:
> > On Fri, 2014-06-27 at 17:51 +0100, Andrew Cooper wrote:
> >> Hello,
> >>
> >> After a long time fixing my own memory corruption bugs with migration
> >> v2, I have finally tracked down (what I really really hope is) the last
> >> of the corruption.
> >>
> >>
> >> There appears to be a systematic problem affecting all PV drivers,
> >> whereby a non-quiescent backend can cause memory corruption in the VM.
> >>
> >> Active grant mapped pages are only reflected in the dirty bitmap after
> >> the grant has been unmapped, as mapping the ring read-only would be
> >> catastrophic to performance, and remapping as read-only when logdirty is
> >> enabled is (as far as I understand) impossible, as Xen doesn't track the
> >> PTEs pointing at granted frames.
> >>
> >> PV backend drivers hold their mappings of the rings (and persistently
> >> granted frames) open until the domain is destroyed, which is after the
> >> memory image has been sent.  Therefore, any requests which are processed
> >> after the migration code has sent the ring frame on its first pass will
> >> not be reflected in the resumed domain, as this frame will never be
> >> marked as dirty in Xen.
> >>
> >> Furthermore, as the migration code uses memcpy() on the frames, it is
> >> possible that a backend update intersects with the copy, and a corrupt
> >> descriptor appears on the resumed side.
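
As an illustration of that race, here is a minimal, self-contained
sketch in plain C with pthreads; a two-word struct stands in for a ring
descriptor and a looping thread stands in for the backend.  It is not
the actual migration code:

    /* Sketch only: why memcpy() of a frame that a backend is still
     * updating can capture a torn descriptor. */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    struct fake_response {               /* stand-in for a descriptor */
        unsigned long id;
        unsigned long status;
    };

    static struct fake_response shared;  /* the "live" frame being saved */
    static volatile int stop;

    static void *backend_writer(void *arg)
    {
        (void)arg;
        for (unsigned long i = 1; !stop; i++) {
            shared.id = i;               /* two separate, unordered     */
            shared.status = i;           /* stores: no lock, no barrier */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        struct fake_response copy;

        pthread_create(&t, NULL, backend_writer, NULL);

        for (int i = 0; i < 10000000; i++) {
            memcpy(&copy, &shared, sizeof(copy));   /* the "save" */
            if (copy.id != copy.status) {           /* torn snapshot */
                printf("torn copy: id=%lu status=%lu\n",
                       copy.id, copy.status);
                break;
            }
        }
        stop = 1;
        pthread_join(t, NULL);
        return 0;
    }

The copy can land between the two stores, so the saved image can
contain a descriptor the backend never wrote in one piece.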
> >>
> >> In addition, after the domain has been paused, the backend might still
> >> process requests.  The migration code expects the guest to be completely
> >> quiesced after it has been suspended, so will only check the dirty
> >> bitmap once.  Any requests which get processed and completed might still
> >> be missed by the migration code.
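
For context, the argument above assumes a save loop of roughly this
shape (schematic only; the helper names are invented and this is not
the actual libxc/migration v2 code):

    /* hypothetical helpers */
    typedef struct bitmap bitmap_t;
    void send_all_frames(void);
    void send_frames(bitmap_t *dirty);
    bitmap_t *read_and_clear_dirty_bitmap(void);
    int many_pages_still_dirty(void);
    void suspend_guest(void);

    static void save_memory(void)
    {
        send_all_frames();                       /* first full pass */

        while (many_pages_still_dirty())         /* iterative pre-copy */
            send_frames(read_and_clear_dirty_bitmap());

        suspend_guest();                         /* guest paused here */

        /* Final pass: the dirty bitmap is read exactly once more.
         * Anything a still-running backend dirties after this read is
         * never resent. */
        send_frames(read_and_clear_dirty_bitmap());
    }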
> >>
> >> Using a heavily instrumented Xen and migration code, I am fairly sure I
> >> have confirmed that all pages corrupted on migration are a result of
> >> still-active grant maps, grant copies which complete after domain
> >> suspend, or the xenstore ring, which xenstored has a magic mapping of
> >> and which will never be reflected in the dirty bitmap.
> >>
> >>
> >> Overall, it would appear that there needs to be a hook for all PV
> >> drivers to force quiescence.  In particular, a backend must guarantee to
> >> unmap all active grant maps (so the frames get properly reflected in the
> >> dirty bitmap), and never process subsequent requests (so no new frames
> >> appear dirty in the bitmap after the guest has been paused).
> >>
> >> Thoughts/comments?
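
A very rough sketch of the shape such a hook might take (all of the
names below are hypothetical; nothing like this exists in the backend
interfaces today):

    /* Hypothetical sketch only; none of these names exist today. */
    struct xen_backend;                    /* opaque per-backend state */

    struct backend_quiesce_ops {
        /* Stop consuming new requests from the shared ring. */
        void (*stop_requests)(struct xen_backend *be);

        /* Finish or cancel in-flight work and unmap every active
         * grant, so the granted frames are marked dirty before the
         * final pass over the dirty bitmap. */
        void (*drain_and_unmap)(struct xen_backend *be);

        /* Undo the above on resume or on a failed migration. */
        void (*resume)(struct xen_backend *be);
    };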
> > 
> > I thought PV drivers were already (supposed to be) handling this in the
> > frontend.
> > 
> > For reasons of checkpoint performance I think Linux's netfront and
> > blkfront handle this on resume rather than on suspend, by tearing down
> > on resume and then requeueing any outstanding I/O after they reattach to
> > the new backend. In the blkfront case this is explicit, whereas IIRC
> > netfront just discards any active requests and relies on L3+
> > retransmission to get the job done (see netfront_resume and
> > blkfront_resume/blkif_recover).
> > 
> > Part of the tear down and reconnect should involve invalidating any
> > inflight descriptors, whether or not they were partially completed or
> > have corrupted replies in them etc. This ought to be happening before
> > the new backend sees the ring at all.
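
Schematically, the recover path looks something like this (it is not
the actual blkfront_resume/blkif_recover code and every name below is
invented):

    /* Rough sketch of the tear-down-and-requeue-on-resume pattern. */
    struct pending_req {
        struct pending_req *next;
        /* ... copy of the original request ... */
    };

    struct frontend {
        struct pending_req *inflight;   /* captured before the old ring
                                           is thrown away */
    };

    /* hypothetical helpers */
    void reinit_ring_and_reconnect(struct frontend *fe);
    void submit_to_new_ring(struct frontend *fe, struct pending_req *req);

    static void frontend_resume(struct frontend *fe)
    {
        /* Discard the old shared ring, whatever it contains (including
         * partially completed or torn replies), and attach to the new
         * backend with a fresh ring. */
        reinit_ring_and_reconnect(fe);

        /* Block-style recovery: re-issue everything that was still in
         * flight at suspend time.  A net frontend can instead drop
         * these and rely on retransmission higher up the stack. */
        for (struct pending_req *req = fe->inflight; req; req = req->next)
            submit_to_new_ring(fe, req);
    }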
> > 
> > Can you give an example of an instance of the corruption which you've
> > seen?
> > 
> > Do you have a theory why this wasn't seen with the migration v1 code?
> 
> I think it does not result in any problems (except if persistent grants
> are used, see below).
> 
> Shared ring updates are strictly ordered with respect to the writes to
> data pages (either via grant map or grant copy).  This means that if the
> guest sees a response in the ring it is guaranteed that all writes to
> the associated pages are also present.
> 
> The write of the response and the write of the producer index are
> strictly ordered.  If the backend is in the process of writing a
> response and the page is saved, then the partial (corrupt) response is
> not visible to the guest.  The write of the producer index is atomic so
> the saver cannot see a partial producer index write.

Exactly.
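
For reference, the producer-side convention being relied on is roughly
the following (a simplified sketch in the spirit of the public ring
macros, with a C11 release store standing in for Xen's write barrier;
it is not the real RING_PUSH_RESPONSES):

    /* Simplified sketch of the ordering described above. */
    #include <stdatomic.h>
    #include <stdint.h>

    #define RING_SIZE 32

    struct resp { uint64_t id; int16_t status; };

    struct sring {
        struct resp ring[RING_SIZE];
        _Atomic uint32_t rsp_prod;   /* one word: a reader can never
                                        observe a partial index */
    };

    static void push_response(struct sring *sr, uint32_t *rsp_prod_pvt,
                              const struct resp *r)
    {
        uint32_t idx = (*rsp_prod_pvt)++;

        sr->ring[idx % RING_SIZE] = *r;      /* write the response ... */

        /* ... and only then publish it: the release store orders the
         * response body before the producer index, so anyone who sees
         * the new index (the guest, or the saver copying the frame)
         * also sees a complete response. */
        atomic_store_explicit(&sr->rsp_prod, *rsp_prod_pvt,
                              memory_order_release);
    }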

> Using persistent grants means that backend writes may be lost, since the
> memcpy in the backend does not update the dirty bitmap and guest writes
> to the shared ring may make the backend's response writes visible.  I
> think we may need to disable persistent grant support in blkback until
> a mechanism for updating the dirty log is in place.

Yes, sadly I think you might be right here.

Ian.

