
Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce

On Mon, 2014-06-30 at 12:14 +0200, Tim Deegan wrote:
> At 10:47 +0100 on 30 Jun (1404121679), David Vrabel wrote:
> > Shared ring updates are strictly ordered with respect to the writes to
> > data pages (either via grant map or grant copy).  This means that if the
> > guest sees a response in the ring, it is guaranteed that all writes to
> > the associated pages are also present.
> Is the ring update also strictly ordered wrt the grant unmap operation?

Probably nothing actually enforces this, and we've likely never written
it down, but I think that in general a well-behaved backend should be
doing the unmap before indicating completion (modulo persistent grants).

I think doing otherwise would cause the guest issues when it tried to
reuse the grant slot, because the gref would potentially be marked busy.

> > The write of the response and the write of the producer index are
> > strictly ordered.  If the backend is in the process of writing a
> > response and the page is saved then the partial (corrupt) response is
> > not visible to the guest.  The write of the producer index is atomic so
> > the saver cannot see a partial producer index write.
> Yes.  The (suggested) problem is that live migration does not preserve
> that write ordering.  So we have to worry about something like this:

Thank you for the clear explanation of the issue.

> 1. Toolstack pauses the domain for the final pass.  Reads the final
>    LGD bitmap, which happens to include the shared ring but not the
>    data pages.
> 2. Backend writes the data.
> 3. Backend unmaps the data page, marking it dirty.
> 4. Backend writes the ring.
> 5. Toolstack sends the ring page across in the last pass.
> 6. Guest resumes, seeing the I/O marked as complete, but without the
>    data.
> ISTR working through this before and being convinced that the backends
> were correctly detaching before the final pass.  That was a long time
> ago, though.

It's certainly not impossible that the switch to xl, or the move of the
hotplug-script calls from udev to libxl, or something along those lines
has caused this sequencing to become incorrect.


Xen-devel mailing list


