
Re: [Xen-API] xcp SR and drbd



Hi Denis,

I'm wondering whether it would be possible to extend the LVM SR to integrate DRBD primary/secondary redundancy for a two-node installation.

Let me explain the context and a possible solution.

We've been using DRBD for some years on iSCSI SANs with Xen/XCP, with 2 redundant iSCSI SANs and 2 Xen/XCP nodes.

This kind of setup works great; however, for smaller setups it is kind of overkill.

We used to do exactly the same, using two switches and multipathing to deal with switch redundancy. Feeling it was overkill, we switched to a setup similar to yours.

So we started to integrate DRBD directly on the XCP nodes (there are some docs at Linbit about it). However, being a little bit paranoid about split-brain scenarios, we have always used primary/secondary setups (SR1 primary on XCP1 and SR2 primary on XCP2).
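
In practice that just means one resource per SR and a mirror-image promotion on each host, roughly like this (sr1/sr2 are placeholder resource names):

    # on XCP1: SR1's resource is promoted, SR2's stays secondary
    drbdadm primary sr1
    # on XCP2: the mirror image
    drbdadm primary sr2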

We are using a similar setup, but with dual primary. The two servers are connected directly via a cable, so there is little chance of the two being disconnected. We have only run into a split-brain problem once, when we were doing something rather silly: using vdi-copy to migrate virtual disks to a local software RAID 5 array while running VMs on the same array. In short, the whole machine stopped responding to the network for minutes at a time while the copy took place. Fortunately we were able to recover pretty easily. Lesson learned: stay away from software RAID 5 on the hypervisors.
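
For reference, dual primary only needs protocol C plus something like this in the resource's net section (DRBD 8.3-style syntax; the after-sb policies shown are purely illustrative, not a recommendation):

    net {
        allow-two-primaries;
        # what DRBD should do if it detects split-brain on reconnect
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }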

This kind of setup has a big drawback: VMs with VDIs on SR1 have to run on XCP1 and VMs with VDIs on SR2 have to run on XCP2. There is a loss of flexibility and a loss of transparency for XCP admins.

Generally, we have not had a problem with dual primary - we use live-migrate and run VMs on either node. When the hypervisors boot, we make sure everything is connected, switch both to primary and plug in the PBDs.
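
In script form the boot-time sequence is roughly this (resource name and UUID lookups are placeholders, and it assumes the host's name-label matches its hostname):

    # wait for the replication link before doing anything
    drbdadm up r0
    drbdadm wait-connect r0
    # both hosts promote, since allow-two-primaries is set
    drbdadm primary r0

    # then attach the SR on this host
    PBD_UUID=$(xe pbd-list sr-uuid=$SR_UUID \
        host-uuid=$(xe host-list name-label=$(hostname) --minimal) --minimal)
    xe pbd-plug uuid=$PBD_UUID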

Our plan for recovering from split-brain is to pick the host with the most changed VDIs (the most data) as the new primary, use DRBD with an external meta-disk to replicate the changed VDIs to the new primary, then invalidate the data on the junked host and run a full re-sync. Not fun, but recoverable as long as you know which VMs have been running on which host and you keep an eye out for split-brain (there are hooks you can use to get notified when things happen to DRBD).
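
For anyone wanting the mechanics: the notification hook is just the stock handler, and manual recovery is the usual discard-and-resync dance, roughly (8.3-style syntax, resource name r0 is a placeholder):

    # in the resource definition, to get mailed when it happens
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }

    # on the host whose data is being thrown away
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0
    # on the surviving primary, if it has dropped the connection
    drbdadm connect r0
    # alternatively, on the junked host, force a full resync of its data
    drbdadm invalidate r0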


So I'd like to extend the LVM SR to integrate DRBD primary/secondary, and I'd like to have some input on this kind of scenario:

* On each LV, create a DRBD resource; when a VBD is brought up, the DRBD resource is switched to primary (sketched below).
* When migrating a VM to the second node, switch the DRBD resource on the first node to secondary, switch the DRBD resource on the second node to primary, and get on with resuming the VM.
* When a VM is brought down, the PBD is brought down and the DRBD resource is switched to secondary.
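
Roughly, in terms of drbdadm calls, I imagine something like this (one hypothetical resource per VDI, named vdi-<vdi-uuid>):

    # VBD plug hook on the host starting the VM
    drbdadm primary vdi-<vdi-uuid>

    # migrating the VM from XCP1 to XCP2 (primary/secondary only):
    # on XCP1, once the VM is suspended there
    drbdadm secondary vdi-<vdi-uuid>
    # on XCP2, before the VM is resumed
    drbdadm primary vdi-<vdi-uuid>

    # VM shutdown / VBD unplug hook
    drbdadm secondary vdi-<vdi-uuid>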

My understanding is the following: an LVM SR is effectively an LVM volume group attached via a PBD. If you unplug the PBD when shutting down the VM, you are taking the whole SR offline. If you had one PBD per VM, you would need one SR per VDI, and therefore one VG per VDI, which wouldn't work... you wouldn't be able to do snapshots, resize VDIs, etc.
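
To illustrate: an LVM SR is created from a block device, becomes one volume group, and the PBD is the per-host attachment of that whole VG, so unplugging it detaches every VDI at once. Something like (device and UUIDs are placeholders):

    # creates a VG named VG_XenStorage-<sr-uuid> on /dev/sdb
    xe sr-create host-uuid=$HOST_UUID type=lvm content-type=user \
        name-label="Local LVM" device-config:device=/dev/sdb

    # unplugging the PBD detaches that whole VG from the host,
    # i.e. every VDI in the SR, not just one VM's disk
    xe pbd-unplug uuid=$PBD_UUID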

What you might do (which is what I think you're angling at) is the following:

Both hosts handle their own LVM VG and corresponding LVs. Every time you create a VDI, each host would create the LV locally, set up a DRBD resource and start syncing. Both VDIs could sit in "Secondary" mode while the VM was down, and would only switch to primary while the VM was running on that particular host.
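
Concretely, I'd imagine the SR backend generating one resource file per VDI, roughly along these lines (DRBD 8.3-style syntax; hostnames, IPs, ports, minor numbers and the LV naming are all placeholders, and internal vs. external metadata is an open question):

    resource vdi-<vdi-uuid> {
        protocol C;
        on xcp1 {
            device    /dev/drbd10;
            disk      /dev/VG_XenStorage-<sr-uuid>/LV-<vdi-uuid>;
            address   10.0.0.1:7790;
            meta-disk internal;
        }
        on xcp2 {
            device    /dev/drbd10;
            disk      /dev/VG_XenStorage-<sr-uuid>/LV-<vdi-uuid>;
            address   10.0.0.2:7790;
            meta-disk internal;
        }
    }

    # on VDI create, run on both hosts:
    lvcreate -n LV-<vdi-uuid> -L 10G VG_XenStorage-<sr-uuid>
    drbdadm create-md vdi-<vdi-uuid>
    drbdadm up vdi-<vdi-uuid>
    # then kick off the initial sync from one side only:
    drbdadm -- --overwrite-data-of-peer primary vdi-<vdi-uuid>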

Provisions would have to be made to keep the DRBD/LVM configuration consistent across the two hosts. Most of the DRBD config (sync rates, protocol, passwords, data integrity algorithm, etc.) could be stored in the SR config, but one would need to be able to rebuild a whole SR should a disk on one of the hosts fail.
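
For example, the shared settings could live as other-config keys on the SR (the key names below are made up, not an existing SM interface) and be rendered into each generated resource:

    # hypothetical keys stored once per SR
    xe sr-param-set uuid=$SR_UUID \
        other-config:drbd-protocol=C \
        other-config:drbd-sync-rate=40M \
        other-config:drbd-shared-secret=changeme \
        other-config:drbd-integrity-alg=sha1

    # ...rendered into each resource file (8.3-style syntax)
    syncer { rate 40M; }
    net {
        cram-hmac-alg sha1;
        shared-secret "changeme";
        data-integrity-alg sha1;
    }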

How would you deal with split-brain? If XCP2 appeared down, would XCP1 put a VDI into primary without knowing the state of the VDI on XCP2?

Please feel free to correct me should my understanding be at fault in any way.

That would mean a lot of DRBD resources once you account for snapshots and the like, but if it could be done, it would be a tremendous addition for smaller SMB setups.

So far we have only thought about a two-node setup, but I suppose you could go further. You would need a system for keeping track of which VDI is mirrored on which host, automatic re-distribution of VDIs to another host in the pool should one fail (or be removed from the pool), each VM would be limited to two hosts in the pool (so you may run into problems with multiple host failures), and each host would have to have a large amount of local storage. There are some limitations, but it might be workable...

My two cents!

I'd be glad to have some input from the devs if possible. By the way, kudos to the devs for XCP 1.6, it really rocks.

Good to hear - thanks to the devs from this corner too. I'm looking forward to playing with the new features soon.

Regards,

Tim




 

