
Re: [Xen-devel] [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.



Paolo,

--On 18 March 2013 17:19:14 +0100 Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:

> I remembered this incorrectly, sorry.  It's not from a previous run,
> it's from the beginning of this run.  See
> http://wiki.qemu.org/Migration/Storage for more information.
>
> A VM has a disk backed by NFS. It runs on node A, at which point pages
> are introduced to the page cache. It then migrates to node B, which
> entails starting the VM on node B while it is still running on node A.
> Closing has yet to happen on node A, but the file is already open on
> node B; anything that is cached on node B will never be invalidated.
>
> Thus, any changes done to the disk on node A during migration may not
> become visible on node B.

This might be a difference between Xen and KVM. On Xen, the domain is
migrated to the destination in a paused state, and it is only unpaused
once the migration to B is complete. There's a sort of extra handshake
at the end.

I believe what's happening is that libxl_domain_suspend, when called
with LIBXL_SUSPEND_LIVE, will do a final fsync()/fdatasync() at the
end, then wait for a migrate_receiver_ready message, and only when
that has been received will it send a migrate_permission_to_go
message, which unpauses the domain. Before that, I don't believe the
disk is read (I may be wrong about that). The sending code is in
migrate_domain() in xl_cmdimpl.c, and the receiving code is in
migrate_receive() (same file). On Xen at least, I don't think the VM
is ever started on node B whilst it is still running on node A.
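For clarity, the ordering I'm describing can be sketched roughly as
follows. The message names are the ones above; the actual sequence
lives in migrate_domain()/migrate_receive() in xl_cmdimpl.c, so treat
this as a sketch of my understanding rather than the code:

```
sender (node A)                            receiver (node B)
---------------                            -----------------
libxl_domain_suspend(LIBXL_SUSPEND_LIVE)   restore the domain, leave it paused
final fsync()/fdatasync() on the disks
        <----- migrate_receiver_ready -----
        ----- migrate_permission_to_go ---->
tear down the domain on A                  unpause the domain on B
```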

>> I've no problem if xl or libvirt or whatever error or warn. My usage
>> is API based, rather than xl / libvirt based.

> What makes libvirt not an API (just like libxl)?

Nothing; it's just that I'm using the QMP API and the libxl API. I'm
saying that whether libvirt or xl warns or errors makes no difference
to me.

> If libxl does migration without O_DIRECT, then that's a bug in libxl.
> What about blkback?  IIRC it uses bios, so it also bypasses the page
> cache.

>> Possibly a bug in xl rather than libxl, but as no emulated devices
>> use O_DIRECT, that bug is already there, and isn't in QEMU.

> blkback is the in-kernel PV device, it's not an emulated device.

I mean that an emulated device will already not use O_DIRECT. So if
you are right about live migration being unsafe without O_DIRECT, it
is already unsafe for emulated devices.

>> Stefano did ack the patch, and for a one line change it's been
>> through a pretty extensive discussion on xen-devel ...

> It may be a one-line change, but it completely changes the paths that
> I/O goes through.  Apparently the discussion was not enough.

>> What would you suggest?

> Nothing except fixing the bug in the kernel.

I have already posted patches for that, as Ian Campbell did in 2008,
but no one seems particularly interested. Be my guest in trying to
get them adopted. That's quite obviously the long-term solution.

In the meantime, however, there is a need to run Xen on kernels with
long-term support. Not being able to run Xen in a stable manner is
not an acceptable position.

> No one has yet explained
> why blkback is not susceptible to the same bug.

I would guess it will be if it uses O_DIRECT or whatever the in-kernel
equivalent is, unless it's doing a copy of the guest pages prior to
the write being marked as complete.

I can't claim to be familiar with blkback, but I presume this would
require a similar fix elsewhere.

--
Alex Bligh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
