[Xen-users] HA Xen on 2 servers!! No NFS, special hardware, DRBD or iSCSI...
I've been brainstorming... I want to create a 2-node HA active/active cluster (in other words, I want to run a handful of DomUs on one node and a handful on the other). In the event of a failure I want all DomUs to fail over to the other node and start working immediately. I want absolutely no single points of failure. I want to do it with free software and no special hardware. I want to do it with just 2 servers. I want it to be pretty simple to set up. And I would like to take advantage of Xen's live migration feature for maintenance. Not asking much, am I? ;-)

This topic was discussed earlier this year (read the "Xen and iSCSI" thread in the list archives) and the best solution I saw was to build a poor man's SAN using DRBD, Heartbeat, GFS and iSCSI on two nodes, and then run Xen on two MORE nodes. Four nodes, quite a bit of complication, and I wonder what the performance was like.

I think I know how to do it with 2 nodes, and (I think) it'll be less complicated and perform better (I think). I haven't tried it, I have no experience with AoE, LVM, etc., and only basic experience with Xen. But I think it should work. Check it out:

* Get 2 computers with 2 NICs each
* Install your favorite Linux distro on each
* Partition the drive into 4 partitions: two for the Dom0 OS and swap, two left unformatted
* Connect the intra-node NICs with a crossover cable/switch/hub (didja know that gigabit NICs are auto-MDIX? No crossover needed!)
* Configure the intra-node IPs to something like 10.0.0.1 and 10.0.0.2, or 192.168...
* Install Xen
* Install ATA over Ethernet (AoE) and vblade in Dom0
* Export each node's spare partition to the other node -- each node exports the unformatted partition it does NOT use locally:

    Node1: vblade 0 1 eth1 /dev/hda4 &   # one of the unformatted partitions
    Node2: vblade 0 2 eth1 /dev/hda3 &   # the other
    # Or use vbladed, but I don't know how yet

* modprobe aoe on each node
* Install LVM on both nodes (in Dom0)
* Create two volume groups on each node:

    Node1: vgcreate DomU1hda /dev/hda3
    Node1: vgcreate DomU1hdb /dev/etherd/e0.2   # the AoE-exported device from the other node
    Node2: vgcreate DomU2hda /dev/hda4
    Node2: vgcreate DomU2hdb /dev/etherd/e0.1

* Create logical volumes -- one per DomU partition, so there's no need to fdisk inside an LV and the /dev/VG/LV names line up with the Xen config below (sizes are just examples):

    Node1: lvcreate -L 10G -n hda1 DomU1hda   # DomU1's root
    Node1: lvcreate -L 1G  -n hda2 DomU1hda   # DomU1's swap
    Node1: lvcreate -L 10G -n hdb1 DomU1hdb   # mirror half of root
    Node1: lvcreate -L 1G  -n hdb2 DomU1hdb   # mirror half of swap
    Node2: (the same, with DomU2hda and DomU2hdb)

* mkfs.ext3 /dev/DomU1hda/hda1   # repeat for DomU1hdb and DomU2hdX
* mkswap /dev/DomU1hda/hda2      # repeat for DomU1hdb and DomU2hdX
* Create a Xen DomU on each node with this configuration (one disk list -- separate disk = lines would overwrite each other, and the mirror halves must show up in the guest as hdb1/hdb2, not as a second hda1/hda2):

    Node1 DomU1:
    disk = [ 'phy:DomU1hda/hda1,hda1,w',
             'phy:DomU1hda/hda2,hda2,w',
             'phy:DomU1hdb/hdb1,hdb1,w',
             'phy:DomU1hdb/hdb2,hdb2,w' ]

    Node2 DomU2:
    disk = [ 'phy:DomU2hda/hda1,hda1,w',
             'phy:DomU2hda/hda2,hda2,w',
             'phy:DomU2hdb/hdb1,hdb1,w',
             'phy:DomU2hdb/hdb2,hdb2,w' ]

* Install the DomU OSes
* (Important part) Mirror the OSes using software RAID inside each DomU, pairing hda1+hdb1 and hda2+hdb2 (see the mdadm sketch after this walkthrough)
* Install Heartbeat on both nodes in Dom0, and make sure the Xen resource script uses live migration when failing over gracefully (a rough sketch of such a script also follows below)
* Run DomU1 on Node1 and DomU2 on Node2

Result:

        [ DomU1 ]                       [ DomU2 ]
         /     \                         /     \
     [ hda ] [ hdb ]                 [ hda ] [ hdb ]
         \     /                         \     /
         [ LVM ]                         [ LVM ]
         /     \                         /     \
  [ Real HD ] [ AoE HD ]          [ AoE HD ] [ Real HD ]
    (hda3)    (e0.2, i.e.         (e0.1, i.e.  (hda4)
               Node2's hda3)       Node1's hda4)

        [ Node1 ]                       [ Node2 ]

After a failure, or during maintenance:

    [ DomU1 ]       [ DomU2 ]
        /               /
    [ hda ]         [ hda ]
        \               \
     [ LVM ]         [ LVM ]
        /               /
   [ Real HD ]     [ Real HD ]

           [ Node1 ]

(ASCII art shore is purdy, Sam...)
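Here's a minimal sketch of what that mdadm step might look like from inside DomU1, assuming the DomU kernel has md support and the device names match the disk lines above. (Building the arrays first and installing the OS onto them is the easy path; converting an already-installed DomU to RAID1 afterwards is possible but fiddlier.)

    # Inside DomU1: pair the local-backed and AoE-backed halves.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdb1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hda2 /dev/hdb2

    # Filesystem and swap go on the md devices, not the raw halves:
    mkfs.ext3 /dev/md0
    mkswap /dev/md1

    # Record the arrays so they assemble at boot
    # (path varies by distro; Debian uses /etc/mdadm/mdadm.conf):
    mdadm --detail --scan >> /etc/mdadm.conf

    cat /proc/mdstat   # watch the initial sync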
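And since the whole failover hinges on the Heartbeat resource script doing the right thing, here's a rough, hypothetical sketch of one -- the script name, the haresources wiring, and the peer IP are my assumptions, not a finished agent. With Heartbeat v1 you'd reference it from haresources as something like "node1 domu::DomU1", which invokes the script with the domain name as $1 and the operation as $2:

    #!/bin/sh
    # /etc/ha.d/resource.d/domu -- hypothetical Heartbeat v1 resource script.
    # $1 = domain name, $2 = start|stop|status
    DOMU=$1
    PEER=10.0.0.2   # intra-node IP of the other box (assumption)

    case $2 in
    start)
        # Start the domain here unless it's already running.
        xm list "$DOMU" >/dev/null 2>&1 || xm create "$DOMU"
        ;;
    stop)
        # Graceful failover: hand the domain over with a live migration.
        # If the peer is unreachable (it crashed), just shut the domain down;
        # the peer will cold-start it from its half of the mirror.
        xm migrate --live "$DOMU" "$PEER" || xm shutdown -w "$DOMU"
        ;;
    status)
        xm list "$DOMU" >/dev/null 2>&1 && echo running || echo stopped
        ;;
    esac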
LVM is not just a nice thing in this case, it is a necessity! In addition to letting you resize the DomU's partitions on the fly, it adds a layer of indirection so that Xen is presented with the same device name on both nodes. I understand that's critical during a live migration, and it appears to be the main reason people go with iSCSI or NFS.

The key is to use software mirroring within the DomU OSes. I thought about using DRBD alone, but that doesn't allow live migration: it works great for a regular suspend-to-disk migration, but not a live one, because when you change a DRBD device from secondary to primary you must first umount it (so you'd have to stop the DomU). Mirroring also lets the DomU OS restart if its host node crashes, because the on-disk data should be consistent. And it lets a DomU keep operating -- no downtime -- if the other node (the one it's not running on) crashes; once the dead node comes back, the mirror simply resyncs (see the sketch below). Finally, AoE is painless to set up, but I don't see why it couldn't be replaced with iSCSI if it's not working right.
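To make that recovery story concrete, here's roughly what re-attaching the mirror could look like from inside a DomU once the failed node (and its AoE export) is back -- assuming the /dev/md0 and /dev/md1 arrays sketched earlier:

    cat /proc/mdstat                 # md0/md1 running degraded, e.g. [U_]
    mdadm /dev/md0 --add /dev/hdb1   # re-add the AoE-backed halves
    mdadm /dev/md1 --add /dev/hdb2
    cat /proc/mdstat                 # watch the resync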
I have three question marks in my head:

1.) How will it perform?
2.) Does it work? Does someone want to beat me to the punch and try it themselves? It's likely to be a little while before I can find the time.
3.) Is it reliable? It should be: AoE is relatively new but very simple, LVM is well-tested, and software mirroring is as old as the hills.

Thoughts?

CD
TenThousandDollarOffer.com