
[Xen-users] Re: drbd 8 primary/primary and xen migration on RHEL 5



On 2008-07-31 21:30, Antibozo wrote:
> I've reviewed the list archives on this subject, particularly the posts from Zakk, and found results similar to his. drbd provides a block-drbd script, but with full virtualization, at least on RHEL 5, it does not work; by the time the block script is run, qemu-dm has already been started.

I've developed a workaround for all of this, in the form of a wrapper script for qemu-dm. This is trickier than it might seem at first blush because of the way xend uses signals to communicate with qemu-dm. The wrapper script can be used on the "device_model =" line of a vm definition, and takes care of ensuring consistency of the drbd resource(s) for a vm across reboots, migration, etc.
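For illustration, the vm definition just points its device model at the wrapper instead of the stock qemu-dm; the install path below is only an assumption, not necessarily where you'd put the script:

    # HVM guest config fragment; adjust the path to wherever the wrapper lives
    device_model = '/usr/local/sbin/qemu-dm.drbd'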

The script can be found here:

http://www.antibozo.net/xen/qemu-dm.drbd

The strategy is detailed in the script comments; please review those if you want the specifics. The principal objective is preventing split brain.

If you want to use Xen on top of drbd for high availability, this is a decent first cut, as far as I can tell. Feedback is welcome.

> Instead I've been simply musing on the possibility of keeping the drbd devices in primary/primary state at all times. I'm concerned about a race condition, however, and want to ask whether others have examined this alternative.

I've moved away from this strategy, and am keeping resources secondary when a vm isn't using them. This enables the remote node to tell if a vm is already running on a drbd resource by inspecting the peer primary/secondary status (the wrapper script does this). This makes it difficult, though not impossible, for you to accidentally fire up a vm using a resource that is already in use by a vm on the remote node.
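As a rough illustration of that check (the resource name and the output parsing are my own shorthand, not lifted from the script):

    # drbd 8's "drbdadm role" prints local/peer roles, e.g. "Secondary/Primary".
    res=vm1disk
    peer=$(drbdadm role "$res" | cut -d/ -f2)
    if [ "$peer" = "Primary" ]; then
        echo "$res is already Primary on the peer; refusing to start" >&2
        exit 1
    fi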

I've also discovered that primary/primary mode is not actually needed, at least for HVM vms using Xen 3.0.3 as shipped on RHEL 5. The conventional wisdom was that primary/primary was necessary during migration, but with the appropriate wrapper around qemu-dm, we can wait for the peer to go secondary before going primary on the local node.
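A minimal sketch of that wait-then-promote step, assuming a resource name and the stock qemu-dm path (the real wrapper also has to deal with xend's signals, as noted above):

    # Wait for the peer to drop to Secondary and for our copy to be UpToDate,
    # then promote the resource and hand off to the real device model.
    res=vm1disk
    until [ "$(drbdadm role "$res")" = "Secondary/Secondary" ] &&
          [ "$(drbdadm dstate "$res" | cut -d/ -f1)" = "UpToDate" ]; do
        sleep 1
    done
    drbdadm primary "$res"
    exec /usr/lib/xen/bin/qemu-dm "$@"    # path to the real qemu-dm varies by distro/arch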

One way you can still get yourself pretty hosed (if you're determined to do so) is the following:

- Start the vm on node A. The wrapper makes the drbd resource primary, and the vm starts running.
- Start the vm on node B. This creates the vm instance, but the wrapper blocks, waiting for the drbd resource on node A to go secondary.
- Start a migration from node A to B. This freaks xend out, since node B already has a vm with the same name (even though that one isn't actually running yet).
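In xm terms, that's roughly the following (domain and host names are made up):

    xm create vm1                  # on node A: wrapper promotes the resource, vm runs
    xm create vm1                  # on node B: vm instance created, wrapper blocks
    xm migrate --live vm1 nodeB    # on node A: xend on B already knows a "vm1"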

In this scenario, you may end up having to reboot node B because the xen store gets crufty. But you still should never end up with a split brain condition.

Obviously you could also get hosed if your nodes can't talk to one another, and you start the same vm on both nodes. This is classic split brain. In this case, drbd should refuse to resync when drbd connectivity is restored, and you'll have to kill one of the vm instances, invalidate the local drbd resource, and resync, after which things should be fine. I haven't tested this scenario yet, so YMMV.
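For the record, recovery on the losing node would look roughly like this. It follows the drbd 8 documentation's split brain procedure (discarding that node's data on reconnect) rather than anything in the wrapper, and the vm/resource names are placeholders:

    # On the node whose vm and data you are sacrificing:
    xm destroy vm1                                  # kill the duplicate vm instance
    drbdadm secondary vm1disk                       # demote so the data can be overwritten
    drbdadm -- --discard-my-data connect vm1disk    # reconnect, discarding local changes
    # On the surviving node, if it has also dropped the connection:
    drbdadm connect vm1disk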

> I am thinking of a scenario where the vm is running on node A and has a process that is writing to disk at full speed, so the drbd device on node B is lagging. If I perform a live migration from node A to B under this condition, the local device on node B might not be in sync at the time the vm is started on that node. Maybe.

I have done some testing of heavy disk i/o situations during live migration, and things appear to remain fully consistent. Note that the i/o stack of filesystem on top of LVM volume, on top of xen, on top of drbd, on top of LVM volume is not super fast. I see 10-20 MB/s with new block allocation on a 4-core PowerEdge 1950 using SAS disks (with one CPU allocated to the vm). So don't plan on that particular architecture for your heavily used RDBMS.

> If I use drbd protocol C, theoretically at least, a sync on the device on node A shouldn't return until node B is fully in sync. So I guess my main question is: during migration, does xend force a device sync on node A before the vm is started on node B?

By all appearances (empirically), yes. And since the qemu-dm wrapper also waits for secondary state on the peer and UpToDate state on the local copy before invoking the real qemu-dm, I believe we are covered.

--
Jefferson Ogata : Internetworker, Antibozo

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

