
[Xen-users] Re: drbd 8 primary/primary and xen migration on RHEL 5



On 2008-07-31 21:30, Antibozo wrote:
> I've reviewed the list archives on this subject, particularly the posts from Zakk, and found results similar to his. drbd provides a block-drbd script, but with full virtualization, at least on RHEL 5, it does not work; by the time the block script is run, qemu-dm has already been started.

I've developed a workaround for all of this, in the form of a wrapper script for qemu-dm. This is trickier than it might seem at first blush because of the way xend uses signals to communicate with qemu-dm. The wrapper script can be used on the "device_model =" line of a vm definition, and takes care of ensuring consistency of the drbd resource(s) for a vm across reboots, migration, etc.
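For illustration, the vm definition just points its device model at the wrapper instead of the stock qemu-dm; the install path below is only an assumption, not necessarily where you'd put the script:

    # HVM guest config fragment; adjust the path to wherever the wrapper lives
    device_model = '/usr/local/sbin/qemu-dm.drbd'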

The script can be found here:

http://www.antibozo.net/xen/qemu-dm.drbd

The strategy is detailed in the script comments; please review those if you want the specifics. The principal objective is preventing split brain.

If you want to use Xen on top of drbd for high availability, this is a decent first cut, as far as I can tell. Feedback is welcome.

> Instead I've been simply musing on the possibility of keeping the drbd devices in primary/primary state at all times. I'm concerned about a race condition, however, and want to ask whether others have examined this alternative.

I've moved away from this strategy, and am keeping resources secondary when a vm isn't using them. This enables the remote node to tell if a vm is already running on a drbd resource by inspecting the peer primary/secondary status (the wrapper script does this). This makes it difficult, though not impossible, for you to accidentally fire up a vm using a resource that is already in use by a vm on the remote node.
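As a rough illustration of that check (the resource name and the output parsing are my own shorthand, not lifted from the script):

    # drbd 8's "drbdadm role" prints local/peer roles, e.g. "Secondary/Primary".
    res=vm1disk
    peer=$(drbdadm role "$res" | cut -d/ -f2)
    if [ "$peer" = "Primary" ]; then
        echo "$res is already Primary on the peer; refusing to start" >&2
        exit 1
    fi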

I've also discovered that primary/primary mode is not actually needed, at least for HVM vms using Xen 3.0.3 as shipped on RHEL 5. The conventional wisdom was that primary/primary was necessary during migration, but with the appropriate wrapper around qemu-dm, we can wait for the peer to go secondary before going primary on the local node.
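A minimal sketch of that wait-then-promote step, assuming a resource name and the stock qemu-dm path (the real wrapper also has to deal with xend's signals, as noted above):

    # Wait for the peer to drop to Secondary and for our copy to be UpToDate,
    # then promote the resource and hand off to the real device model.
    res=vm1disk
    until [ "$(drbdadm role "$res")" = "Secondary/Secondary" ] &&
          [ "$(drbdadm dstate "$res" | cut -d/ -f1)" = "UpToDate" ]; do
        sleep 1
    done
    drbdadm primary "$res"
    exec /usr/lib/xen/bin/qemu-dm "$@"    # path to the real qemu-dm varies by distro/arch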

One way you can still get yourself pretty hosed (if you're determined to do so) is the following:

- Start the vm on node A. The wrapper makes the drbd resource primary, and the vm starts running.
- Start the vm on node B. This creates the vm instance, but the wrapper blocks, waiting for the drbd resource on node A to go secondary.
- Start a migration from node A to B. This freaks xend out, since node B already has a vm with the same name (even though that one isn't actually running yet).
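In xm terms, that's roughly the following (domain and host names are made up):

    xm create vm1                  # on node A: wrapper promotes the resource, vm runs
    xm create vm1                  # on node B: vm instance created, wrapper blocks
    xm migrate --live vm1 nodeB    # on node A: xend on B already knows a "vm1"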

In this scenario, you may end up having to reboot node B because the xen store gets crufty. But you still should never end up with a split brain condition.

Obviously you could also get hosed if your nodes can't talk to one another, and you start the same vm on both nodes. This is classic split brain. In this case, drbd should refuse to resync when drbd connectivity is restored, and you'll have to kill one of the vm instances, invalidate the local drbd resource, and resync, after which things should be fine. I haven't tested this scenario yet, so YMMV.
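For the record, recovery on the losing node would look roughly like this. It follows the drbd 8 documentation's split brain procedure (discarding that node's data on reconnect) rather than anything in the wrapper, and the vm/resource names are placeholders:

    # On the node whose vm and data you are sacrificing:
    xm destroy vm1                                  # kill the duplicate vm instance
    drbdadm secondary vm1disk                       # demote so the data can be overwritten
    drbdadm -- --discard-my-data connect vm1disk    # reconnect, discarding local changes
    # On the surviving node, if it has also dropped the connection:
    drbdadm connect vm1disk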

> I am thinking of a scenario where the vm is running on node A and has a process that is writing to disk at full speed, so the drbd device on node B is lagging. If I perform a live migration from node A to B under this condition, the local device on node B might not be in sync at the time the vm is started on that node. Maybe.

I have done some testing of heavy disk i/o situations during live migration, and things appear to remain fully consistent. Note that the i/o stack of filesystem on top of LVM volume, on top of xen, on top of drbd, on top of LVM volume is not super fast. I see 10-20 MB/s with new block allocation on a 4-core PowerEdge 1950 using SAS disks (with one CPU allocated to the vm). So don't plan on that particular architecture for your heavily used RDBMS.

> If I use drbd protocol C, theoretically at least, a sync on the device on node A shouldn't return until node B is fully in sync. So I guess my main question is: during migration, does xend force a device sync on node A before the vm is started on node B?

By all appearances (empirically), yes. And since the qemu-dm wrapper also waits for secondary state on the peer and UpToDate state on the local copy before invoking the real qemu-dm, I believe we are covered.

--
Jefferson Ogata : Internetworker, Antibozo

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

