Re: [Xen-users] Xen and High Availability

From: Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx>
Date: Sat, 26 Feb 2011 09:56:24 -0500
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Sat, 26 Feb 2011 06:58:29 -0800
List-id: Xen user discussion <xen-users.lists.xensource.com>

Randy Katz wrote:

Of those on the list that have implemented High Availability with Xen, what
configurations are being used? And what degree of Fault Tolerance can be
expected? Ultimately would like to see fault tolerance and scalability at
the disk level and also at the VM (node) level where 3 or more nodes
can be utilized for automatic switchover. I have looked at some of the docs
and they all look like there is an active and an inactive node which
switch if there is trouble (heartbeat), doesn't look like clusters of nodes
are implemented,
please let me know, thank you in advance,

I run a fairly simple 2-node setup, configured roughly as follows, from the hardware up:

- lots of fault tolerance provided by the computer center (at the intersection of two power grids, plus generator, plus battery backup in each rack; multiple backbone network connections, etc.)

- network: I'm only using one network drop, but more are available; right now only a simple 1G switch to split that between the two servers


- 2 1U rack-mounted servers: 4 drives in each, dual NICs (only using 1)
-- software RAID1 across all 4 drives for boot, swap, root for Dom0

-- software RAID10 (the md-provided variant) across all 4 drives for one large physical volume for LVM


- Xen/HA setup:

-- running the version of Xen 3 supported by Debian Lenny (will soon migrate to either Squeeze/Xen4 or OpenSUSE/Xen4)

-- DRBD to mirror VMs across both nodes - for each VM: boot/root and swap volumes

-- pacemaker/corosync

-- haven't tried a 3rd node - DRBD only supports 2 nodes (I think it may in later versions)

-- 4 VMs - set for automatic failover - I load level by having 2 primary on one node, 2 on the other
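
The load-levelling arrangement above can be pictured with a small model. This is only a sketch: the node names (node-a, node-b), the VM names (vm1 to vm4) and the placement() helper are hypothetical, and in the real setup the failover decision is made by pacemaker/corosync, not by code like this.

```python
# Hypothetical model of the 2-node layout: each VM has a preferred (primary)
# node; if that node is down, the VM runs on the surviving node instead.
PREFERRED = {
    "vm1": "node-a",
    "vm2": "node-a",
    "vm3": "node-b",
    "vm4": "node-b",
}
NODES = ("node-a", "node-b")


def placement(alive):
    """Return {node: [vms]} given the set of nodes that are currently up."""
    alive = set(alive)
    if not alive:
        raise RuntimeError("no nodes available")
    result = {node: [] for node in NODES if node in alive}
    for vm, home in sorted(PREFERRED.items()):
        # Stay on the preferred node when possible, otherwise fail over.
        target = home if home in alive else next(n for n in NODES if n in alive)
        result[target].append(vm)
    return result


if __name__ == "__main__":
    print(placement({"node-a", "node-b"}))  # 2 VMs per node
    print(placement({"node-b"}))            # all 4 VMs on the survivor
```

With both nodes up the model gives 2 VMs per node; with one node down, all 4 land on the survivor, which is where the performance drop comes from.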


Experience:
-- DRBD ensures that disks are consistent if a node fails

-- a node failure leads to the affected VMs booting on the other node - performance, of course, drops

-- depending on how complicated a particular VM is (and how large the drive) it can take up to about 5 minutes for a reboot

-- when the failed node is brought back up - it can take a LONG time for DRBD, RAID10, or both to resync (note: configuring with a bitmap speeds things up a lot)

-- I expect there's a way to mirror working memory so that node failure doesn't require a reboot, but never really dug into it
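
To see why a bitmap helps with the resync times noted above, here is a back-of-the-envelope calculation. It assumes the bitmap is a write-intent bitmap that records which regions changed while a mirror half was away, so only those regions need copying. Every number below (volume size, amount of changed data, throughput) is a made-up placeholder, not a measurement from this setup.

```python
# Rough resync-time estimate; all sizes and rates are hypothetical.

GIB = 1024 ** 3
MIB = 1024 ** 2


def resync_seconds(bytes_to_copy, bytes_per_second):
    """Time needed to copy the given amount of data at a sustained rate."""
    return bytes_to_copy / bytes_per_second


device_size = 500 * GIB   # assumed size of the mirrored volume
dirty_size = 5 * GIB      # assumed data changed while the node was down
rate = 50 * MIB           # assumed sustained resync throughput per second

full = resync_seconds(device_size, rate)    # no bitmap: copy everything
partial = resync_seconds(dirty_size, rate)  # bitmap: copy changed regions only

if __name__ == "__main__":
    print(f"full resync:    {full / 3600:.1f} hours")
    print(f"bitmap resync:  {partial / 60:.1f} minutes")
```

Under these assumptions the full resync takes hours while the bitmap-guided one takes minutes; the real ratio depends on how much data changed during the outage.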

The most surprising thing is that, so far, all my failures have been a result of Xen-induced kernel panics. The Lenny version of Xen 3 has a nasty little bug in the code that allocates physical CPUs to virtual CPUs - every once in a while, when a CPU is released for re-allocation, there's a Dom0 (or perhaps hypervisor) kernel panic and reboot. For me, every once in a while translates to up to twice a day - alternating with days of running smoothly. Supposedly that's been fixed upstream, but as Squeeze became imminent, nobody put any attention into updating the Lenny package.

The only work-around for this bug is to pin CPUs. Since I've done that, I haven't had ANY failures of any sort - things just keep humming along (fingers crossed here).
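
For readers who have not pinned CPUs before, one way to do it at runtime with the Xen 3 toolstack is "xm vcpu-pin <domain> <vcpu> <cpus>". The message does not say which method was actually used, so the following is only a sketch: the helper name, the domain names and the CPU numbering are all hypothetical. It just builds the command lines so that each VCPU gets its own fixed physical CPU.

```python
# Sketch: build "xm vcpu-pin" command lines that give every VCPU of every VM
# its own fixed physical CPU. Domain names and CPU numbers are placeholders.

def pin_commands(domains, first_cpu=1):
    """domains: list of (name, vcpu_count) pairs.

    Physical CPUs are handed out sequentially starting at first_cpu
    (leaving CPU 0 to Dom0 is an assumption of this sketch). The function
    does not check that the host actually has that many CPUs.
    """
    commands = []
    cpu = first_cpu
    for name, vcpus in domains:
        for vcpu in range(vcpus):
            commands.append(f"xm vcpu-pin {name} {vcpu} {cpu}")
            cpu += 1
    return commands


if __name__ == "__main__":
    for line in pin_commands([("vm1", 2), ("vm2", 1)]):
        print(line)
```

The printed lines would then be run as root on Dom0. A persistent alternative, as far as I understand the xm config format, is a cpus = "..." line in each domU config file.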

One other thing to note: RAID, particularly software RAID, has its own nasty surprises: if a disk starts degrading, its internal failure recovery mechanisms will often try to re-read sectors and such - so you get your data, but it takes longer and longer. md does NOT take note of this - so your machine will just get slower, and slower, and slower, and slower.... I learned this the hard way - in a way that led to rebuilding my entire software environment rather than just swapping out one bad disk. Lessons learned from that:

- use SMART tools to keep an eye on the Raw_Read_Error_Rate - anything other than 0 indicates looming trouble

- if all your disks are the same age, and they're RAIDed - they're likely to fail around the same time - if one starts going, replace them all
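
The SMART check suggested above could be automated along these lines. This is a minimal sketch that parses text in the column layout printed by "smartctl -A"; the sample values are invented, the helper names are hypothetical, and the "anything other than 0" threshold is simply the rule of thumb from this message.

```python
import re

# Sample text in the column layout printed by "smartctl -A"; values are made up.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       37
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
"""


def raw_value(smart_output, attribute="Raw_Read_Error_Rate"):
    """Return the integer RAW_VALUE for the attribute, or None if absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == attribute:
            # RAW_VALUE is the 10th column; keep only its leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def looks_unhealthy(smart_output):
    """Apply the rule of thumb: a non-zero Raw_Read_Error_Rate is a warning."""
    value = raw_value(smart_output)
    return value is not None and value != 0


if __name__ == "__main__":
    print(raw_value(SAMPLE), looks_unhealthy(SAMPLE))
```

In practice the text would come from running smartctl -A against each member disk of the array (for example from a cron job) rather than from a hard-coded sample.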


Miles Fidelman


--
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra



