Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce
At 12:07 +0100 on 30 Jun (1404126466), Ian Campbell wrote:
> On Mon, 2014-06-30 at 12:52 +0200, Tim Deegan wrote:
> > At 12:14 +0200 on 30 Jun (1404126862), Tim Deegan wrote:
> > > At 10:47 +0100 on 30 Jun (1404121679), David Vrabel wrote:
> > > > Shared ring updates are strictly ordered with respect to the writes
> > > > to data pages (either via grant map or grant copy). This means that
> > > > if the guest sees a response in the ring it is guaranteed that all
> > > > writes to the associated pages are also present.
> > >
> > > Is the ring update also strictly ordered wrt the grant unmap operation?
> > >
> > > > The write of the response and the write of the producer index are
> > > > strictly ordered. If the backend is in the process of writing a
> > > > response and the page is saved then the partial (corrupt) response
> > > > is not visible to the guest. The write of the producer index is
> > > > atomic so the saver cannot see a partial producer index write.
> > >
> > > Yes. The (suggested) problem is that live migration does not preserve
> > > that write ordering. So we have to worry about something like this:
> > >
> > > 1. Toolstack pauses the domain for the final pass. Reads the final
> > >    log-dirty bitmap, which happens to include the shared ring but not
> > >    the data pages.
> > > 2. Backend writes the data.
> > > 3. Backend unmaps the data page, marking it dirty.
> > > 4. Backend writes the ring.
> > > 5. Toolstack sends the ring page across in the last pass.
> > > 6. Guest resumes, seeing the I/O marked as complete, but without the
> > >    data.
> >
> > It occurs to me that the guest should be able to defend against this
> > by taking a local copy of the response producer before migration and
> > using _that_ for the replay logic afterwards. That is guaranteed to
> > exclude any I/O that completed after the VM was paused, and as long as
> > the unmap is guaranteed to happen before the ring update, we're OK.
>
> AIUI blkfront at least maintains its own shadow copy of the ring at all
> times, and the recovery process doesn't use the migrated copy of the
> ring at all (at least not the responses). I might be misunderstanding
> the code there though.

That sounds like it should be OK then, though we might want to document
exactly why. :)

> > (That still leaves the question that Andrew raised of memcpy()
> > breaking atomicity/ordering of updates.)
>
> That's the memcpy in the migration code vs the definitely correctly
> ordered updates done by the b.e., right?

Yes. That's the risk that
 - the memcpy might use non-atomic reads and so corrupt the rsp-prod; or
 - the memcpy might be ordered s.t. it copies the ring itself before it
   copies rsp-prod, which breaks the required ordering on the frontend.

I think that on x86 we're unlikely to have either of those problems in
practice, at least for single-page rings.

Tim.
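
For readers following the ordering argument above, here is a minimal C
sketch of the backend's ordered ring update, using names of my own
(demo_sring, push_response, a wmb() built on a GCC builtin) rather than
the real public io/ring.h macros: the response body is written first,
then a write barrier, then the producer index, so any frontend (or the
saver) that observes the new rsp_prod also observes the complete
response.

    /* Minimal sketch of the backend's ordered ring update.  demo_sring,
     * demo_rsp and push_response() are illustrative names only; they
     * model the behaviour of the public io/ring.h macros, not their code. */
    #include <stdint.h>

    #define wmb() __sync_synchronize()    /* stand-in for the kernel's wmb() */

    struct demo_rsp { uint64_t id; int16_t status; };

    struct demo_sring {
        volatile uint32_t req_prod, req_event;
        volatile uint32_t rsp_prod, rsp_event;
        struct demo_rsp   ring[64];       /* power-of-two ring of responses */
    };

    /* Write the response body, barrier, then advance the producer index.
     * The barrier makes the response and producer-index writes strictly
     * ordered; the single aligned 32-bit store of rsp_prod means a saver
     * can never read a partial producer index. */
    static void push_response(struct demo_sring *s, uint32_t *rsp_prod_pvt,
                              uint64_t id, int16_t status)
    {
        struct demo_rsp *rsp = &s->ring[*rsp_prod_pvt & 63];

        rsp->id = id;
        rsp->status = status;
        wmb();                            /* response before producer index */
        s->rsp_prod = ++(*rsp_prod_pvt);
    }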
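
Similarly, a sketch of the frontend defence Tim suggests, assuming for
simplicity that requests complete in order and using made-up helpers
(shadow[], shadow_lookup(), resubmit()) rather than blkfront's real
recovery code: snapshot rsp_prod at suspend time, and after resume
re-issue everything the snapshot had not acknowledged instead of
trusting the migrated ring, whose rsp_prod may run ahead of the data
pages.

    /* Sketch of the frontend-side defence: replay from a pre-suspend
     * snapshot of rsp_prod rather than from the migrated ring.  The
     * shadow table and resubmit() are hypothetical stand-ins for the
     * frontend's per-request state; real blkfront is organised
     * differently. */
    #include <stdint.h>

    struct demo_req { uint64_t id; /* ... rest of the frontend's copy ... */ };

    static struct demo_req shadow[64];        /* frontend's private copies */
    static uint32_t rsp_prod_snapshot;        /* taken before the final pass */

    static struct demo_req *shadow_lookup(uint32_t idx) { return &shadow[idx & 63]; }
    static void resubmit(struct demo_req *req) { (void)req; /* queue on the new ring */ }

    static void frontend_suspend(volatile uint32_t *rsp_prod)
    {
        rsp_prod_snapshot = *rsp_prod;    /* everything below this index completed,
                                             data pages included, before the pause */
    }

    static void frontend_resume(uint32_t req_prod_pvt)
    {
        uint32_t idx;

        /* Requests not acknowledged by the snapshot may have lost their
         * completions or data across the final pass: re-issue them from
         * the shadow copies and ignore what the migrated rsp_prod says. */
        for (idx = rsp_prod_snapshot; idx != req_prod_pvt; idx++)
            resubmit(shadow_lookup(idx));
    }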