Xen project Mailing List

Re: [Xen-devel] Migration filesystem coherency?

To: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>, "John Byrne" <john.l.byrne@xxxxxx>

From: "Charles Coffing" <ccoffing@xxxxxxxxxx>

Date: Wed, 28 Jun 2006 10:34:48 -0600

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 28 Jun 2006 09:35:21 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Tue, Jun 27, 2006 at 4:08 PM, in message <44A1AC41.3030600@xxxxxx>, John Byrne <john.l.byrne@xxxxxx> wrote:
>>> I thought I had a workaround for live migration crashing
>>> (I've been looking at the SLES 3.0.2 9742c code.), but I
>>> found that I was getting filesystem errors. I'm wondering if
>>> the problem is races in data being written to the backing storage.
>>>
>>> When migrating a domain, before the domain is started on the
>>> new host, you have to guarantee that all the domU vbd data is
>>> out of the block cache and written to the backing device. (In
>>> the case of a loopback device, whether this is sufficient
>>> depends on the cross-host coherency guarantees of the backing
>>> filesystem.) I cannot see that this takes place synchronously
>>> with the migration process. To me it looks like that the
>>> teardown/flush of the backing device depends on the action of
>>> the xenbus and the hotplug scripts and looks asynchronous to
>>> the migration process.

I'm seeing this too, but in a slightly different context.

> As to loopback, regardless of what kind of I/O it does, when the
> loopback device is torn down, all I/O should be committed to, at least,
> the VFS layer of the backing filesystem. If the backing filesystem makes
> the proper coherency guarantees, then this should be sufficient. My
> understanding is that both GFS and OCFS2 make these guarantees. So with
> these filesystems as the backing store, as long as Xen can guarantee the
> tear down before the domain starts executing on the new node, things
> should work, shouldn't they?

John,

I haven't looked at the migration case, but the problem you're describing does sound very similar to Novell's bugzilla #185557. In this case, try doing a "xm shutdown -w" (and once that returns, immediately start the VM on another physical node). The shutdown should wait until the domain is completely shut down (and flushed, one would hope) before returning. It doesn't... the udev event that tears down the loopback device hasn't necessarily happened before the command returns, and so we've been seeing filesystem corruption when the VM is brought back up on another node.

We're on OCFS2, so I, too, think that ensuring the loopback is torn down synchronously would be sufficient to fix this problem.

-Charles

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
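
The race described above (the shutdown command returning before the udev-driven loopback teardown has happened) could be papered over on the source host by polling until the image file no longer backs any loop device. What follows is only a minimal sketch of that idea, not anything from the Xen tools. The function names `attached_backing_files` and `wait_for_loop_teardown` are hypothetical, and the sketch assumes a kernel that exposes /sys/block/loopN/loop/backing_file, which modern Linux does but the 2006-era kernels discussed in this thread may not.

```python
import glob
import os
import time


def attached_backing_files(sysfs_root="/sys/block"):
    """Return the set of files currently backing a loop device.

    Assumption: the kernel exposes <sysfs_root>/loopN/loop/backing_file.
    """
    files = set()
    pattern = os.path.join(sysfs_root, "loop*", "loop", "backing_file")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                name = f.read().strip()
        except OSError:
            continue  # device was detached between glob() and open()
        if name:
            files.add(name)
    return files


def wait_for_loop_teardown(image, timeout=30.0, interval=0.5,
                           list_attached=attached_backing_files):
    """Poll until `image` no longer backs any loop device.

    Returns True once the image is detached, False if `timeout`
    seconds pass first. `list_attached` is injectable for testing.
    """
    image = os.path.realpath(image)
    deadline = time.monotonic() + timeout
    while True:
        if image not in list_attached():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper script might call `wait_for_loop_teardown()` on the domain's disk image after "xm shutdown -w" returns and refuse to start the VM on the other node if it returns False. This only confirms the teardown on the source host; as noted in the thread, it still relies on the backing filesystem (OCFS2, GFS) providing cross-host coherency.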
