Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce
On Mon, 2014-06-30 at 12:57 +0100, David Vrabel wrote:
> On 30/06/14 12:07, Ian Campbell wrote:
> > On Mon, 2014-06-30 at 12:52 +0200, Tim Deegan wrote:
> >> At 12:14 +0200 on 30 Jun (1404126862), Tim Deegan wrote:
> >>> At 10:47 +0100 on 30 Jun (1404121679), David Vrabel wrote:
> >>>> Shared ring updates are strictly ordered with respect to the writes to
> >>>> data pages (either via grant map or grant copy). This means that if the
> >>>> guest sees a response in the ring it is guaranteed that all writes to
> >>>> the associated pages are also present.
> >>>
> >>> Is the ring update also strictly ordered wrt the grant unmap operation?
> >>>
> >>>> The write of the response and the write of the producer index are
> >>>> strictly ordered. If the backend is in the process of writing a
> >>>> response and the page is saved then the partial (corrupt) response is
> >>>> not visible to the guest. The write of the producer index is atomic so
> >>>> the saver cannot see a partial producer index write.
> >>>
> >>> Yes. The (suggested) problem is that live migration does not preserve
> >>> that write ordering. So we have to worry about something like this:
> >>>
> >>> 1. Toolstack pauses the domain for the final pass. Reads the final
> >>>    LGD bitmap, which happens to include the shared ring but not the
> >>>    data pages.
> >>> 2. Backend writes the data.
> >>> 3. Backend unmaps the data page, marking it dirty.
> >>> 4. Backend writes the ring.
> >>> 5. Toolstack sends the ring page across in the last pass.
> >>> 6. Guest resumes, seeing the I/O marked as complete, but without the
> >>>    data.
> >>
> >> It occurs to me that the guest should be able to defend against this
> >> by taking a local copy of the response producer before migration and
> >> using _that_ for the replay logic afterwards. That is guaranteed to
> >> exclude any I/O that completed after the VM was paused, and as long as
> >> the unmap is guaranteed to happen before the ring update, we're OK.
> >
> > AIUI blkfront at least maintains its own shadow copy of the ring at all
> > times, and the recovery process doesn't use the migrated copy of the
> > ring at all (at least not the responses). I might be misunderstanding
> > the code there though.
>
> Yes, this is what it looks like to me. netfront is also safe since it
> just discards everything and doesn't replay at all.
>
> This does rather feel like we're discovering problems that were
> identified (and fixed) a long time ago.
>
> I think there needs to be better documentation of front/backend drivers
> so people know how to write them correctly.

Yes :-/

> I'll try to carve out some time for this if no one else volunteers...

Thanks, that would be wonderful as always. (Maybe add a TODO to the doc
day page too?)

> Perhaps all we need is a clear statement of: The contents of the shared
> ring are undefined on resume. A frontend driver must not use the shared
> ring to replay any requests.

I *think* that should be sufficient.

> >> (That still leaves the question that Andrew raised of memcpy()
> >> breaking atomicity/ordering of updates.)
> >
> > That's the memcpy in the migration code vs the definitely correctly
> > ordered updates done by the backend, right?
>
> But if the frontend doesn't use the shared ring for replay, it doesn't
> matter if this memcpy is correctly ordered or not?

That's my understanding, yes.

> David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
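[Editor's sketch] The backend-side ordering David describes above amounts to a
write barrier between filling in the response slot (after the data pages are
written/unmapped) and publishing the new producer index. The following is a
minimal, self-contained model of that pattern in C; the struct layout, the
push_response() helper and the wmb() macro are illustrative stand-ins, not the
real io/ring.h definitions.

#include <stdint.h>

#define RING_SIZE 32

struct resp {
    uint64_t id;
    int16_t  status;
};

struct shared_ring {
    volatile uint32_t rsp_prod;   /* written by backend, read by frontend */
    volatile uint32_t req_cons;
    struct resp ring[RING_SIZE];
};

/* Stand-in write barrier: a compiler barrier here; the real code uses a
 * proper wmb() so the ordering also holds across CPUs. */
#define wmb() __asm__ __volatile__("" ::: "memory")

static void push_response(struct shared_ring *sr, uint32_t prod_pvt,
                          const struct resp *r)
{
    /* 1. Complete the work: in real code, finish writing (or unmapping)
     *    the granted data pages, then fill in the response slot. */
    sr->ring[prod_pvt % RING_SIZE] = *r;

    /* 2. Make the response body (and the data writes) visible first... */
    wmb();

    /* 3. ...and only then let the frontend observe the new producer index. */
    sr->rsp_prod = prod_pvt + 1;
}

int main(void)
{
    static struct shared_ring sr;
    struct resp r = { .id = 1, .status = 0 };

    push_response(&sr, sr.rsp_prod, &r);
    return 0;
}

The point of the thread, of course, is that live migration's memcpy of the
ring page does not replay these stores in order, which is why the frontend
must not rely on the migrated ring contents.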
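[Editor's sketch] The frontend-side defence Ian and David describe, a
blkfront-style private shadow of in-flight requests that is replayed on
resume instead of anything read back from the (undefined) migrated ring,
looks roughly like the following. This is a hedged sketch only: the
structures and the replay_after_resume() hook are hypothetical, not the
actual blkfront code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_INFLIGHT 64

struct request {
    uint64_t id;
    uint64_t sector;
    uint32_t nr_segments;
};

struct shadow_entry {
    struct request req;   /* private copy of what was put on the ring */
    bool in_flight;       /* still awaiting a response? */
};

static struct shadow_entry shadow[MAX_INFLIGHT];

/* Record a request at the moment it is placed on the shared ring. */
static void shadow_record(const struct request *req)
{
    shadow[req->id % MAX_INFLIGHT].req = *req;
    shadow[req->id % MAX_INFLIGHT].in_flight = true;
}

/* Mark a request complete when its response is consumed from the ring. */
static void shadow_complete(uint64_t id)
{
    shadow[id % MAX_INFLIGHT].in_flight = false;
}

/* Hypothetical resume hook: reissue everything still outstanding using
 * only the private shadow state; nothing is read back from the migrated
 * shared ring, whose contents are treated as undefined. */
static void replay_after_resume(void (*resubmit)(const struct request *))
{
    for (unsigned int i = 0; i < MAX_INFLIGHT; i++)
        if (shadow[i].in_flight)
            resubmit(&shadow[i].req);
}

static void resubmit_request(const struct request *req)
{
    printf("replaying request id=%llu\n", (unsigned long long)req->id);
}

int main(void)
{
    struct request r = { .id = 7, .sector = 128, .nr_segments = 1 };

    shadow_record(&r);
    /* ... migration happens before the response for id 7 arrives ... */
    replay_after_resume(resubmit_request);

    shadow_complete(7);   /* response eventually consumed after replay */
    return 0;
}

netfront, as noted above, avoids the problem differently: it simply discards
outstanding state on resume and does not replay at all.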