Re: [Xen-users] Xen and High Availability
On 2/26/2011 6:56 AM, Miles Fidelman wrote:

Randy Katz wrote:
Of those on the list that have implemented High Availability with Xen, what configurations are being used? And what degree of fault tolerance can be expected? Ultimately I would like to see fault tolerance and scalability at the disk level and also at the VM (node) level, where 3 or more nodes can be utilized for automatic switchover. I have looked at some of the docs, and they all look like there is an active and an inactive node which switch if there is trouble (heartbeat); it doesn't look like clusters of nodes are implemented. Please let me know, thank you in advance.

I run a fairly simple 2-node setup, configured roughly as follows, from the hardware up:

- lots of fault tolerance provided by the computer center (at the intersection of two power grids, plus generator, plus battery backup in each rack; multiple backbone network connections, etc.)
- network: I'm only using one network drop, but more are available; right now only a simple 1G switch to break that out to the two servers
- 2 1U rack-mounted servers: 4 drives in each, dual NICs (only using 1)
-- software RAID1 across all 4 drives for boot, swap, and root for Dom0
-- software RAID10 (the md-provided variant) across all 4 drives for one large physical volume for LVM
- Xen/HA setup:
-- running the version of Xen 3 supported by Debian Lenny (will soon migrate to either Squeeze/Xen4 or openSUSE/Xen4)
-- DRBD to mirror VMs across both nodes; for each VM: boot/root and swap volumes
-- pacemaker/corosync
-- haven't tried a 3rd node; DRBD only supports 2 nodes (I think later versions may support more)
-- 4 VMs, set for automatic failover; I load-level by having 2 primary on one node, 2 on the other

Experience:
-- DRBD ensures that disks are consistent if a node fails
-- a node failure leads to the affected VMs booting on the other node; performance, of course, drops
-- depending on how complicated a particular VM is (and how large the drive), it can take up to about 5 minutes for a reboot
-- when the failed node is brought back up, it can take a LONG time for DRBD and/or RAID10 to resync (note: configuring with a bitmap speeds things up a lot)
-- I expect there's a way to mirror working memory so that a node failure doesn't require a reboot, but I never really dug into it

The most surprising thing is that, so far, all my failures have been the result of Xen-induced kernel panics. The Lenny version of Xen 3 has a nasty little bug in the code that allocates physical CPUs to virtual CPUs: every once in a while, when a CPU is released for re-allocation, there's a Dom0 (or perhaps hypervisor) kernel panic and reboot. For me, "every once in a while" translates to up to twice a day, alternating with days of running smoothly. Supposedly that's been fixed upstream, but as Squeeze became imminent, nobody put any attention into updating the Lenny package.

The only work-around for this bug is to pin CPUs. Since I've done that, I haven't had ANY failures of any sort; things just keep humming along (fingers crossed here).

One other thing to note: RAID, particularly software RAID, has its own nasty surprises. If a disk starts degrading, its internal failure-recovery mechanisms will often try to re-read sectors and such, so you get your data, but it takes longer and longer. md does NOT take note of this, so your machine will just get slower, and slower, and slower, and slower.... I learned this the hard way, in a way that led to rebuilding my entire software environment rather than just swapping out one bad disk. Lessons learned from that:
- use SMART tools to keep an eye on the Raw_Read_Error_Rate; anything other than 0 indicates looming trouble
- if all your disks are the same age and they're RAIDed, they're likely to fail around the same time; if one starts going, replace them all

Miles, excellent, thanks.
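For anyone wanting to copy this kind of setup: a two-node DRBD resource for one VM's root volume looks roughly like the sketch below. The resource name, hostnames, LV paths, and addresses are all made up for illustration; they are not from Miles's actual configuration.

```
# DRBD resource, one per VM volume -- all names and IPs hypothetical
resource vm1-root {
    protocol C;                       # synchronous: a write completes on both nodes
    on node1 {
        device    /dev/drbd0;
        disk      /dev/vg0/vm1-root;  # LV carved out of the RAID10-backed VG
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/vg0/vm1-root;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

The VM's disk then points at /dev/drbd0 on whichever node is currently primary (DRBD 8's default single-primary mode only allows writes on one side at a time).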
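On the CPU-pinning workaround: in a Xen 3 domU config file this is done with the `cpus` option, or at runtime with `xm vcpu-pin`. The CPU numbers below are illustrative only, not Miles's actual layout:

```
# In the domU config file (e.g. /etc/xen/vm1.cfg) -- example values
vcpus = 2
cpus  = "2-3"   # restrict this guest's VCPUs to physical CPUs 2 and 3
```

The same can be applied to a running guest with, e.g., `xm vcpu-pin vm1 0 2` (domain, VCPU number, physical CPU).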
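Two of the operational points above (the slow resyncs and the SMART monitoring) correspond to short commands. Device names here are assumptions; adjust for your own arrays and disks:

```
# Add a write-intent bitmap to a live md array so a resync after a
# node or disk hiccup only copies regions that actually changed:
mdadm --grow --bitmap=internal /dev/md1

# Watch the raw read error rate on each member disk (smartmontools):
smartctl -A /dev/sda | grep -i raw_read_error_rate
```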
What version of DRBD are you currently on?

You say you're only using 1 NIC on each machine, so you did NOT use a crossover cable on the second NIC between the machines, as is recommended?

Why are you using RAID10 with LVM? Couldn't you just have one large VG that included all the drives?

Have you had any drives fail within the RAID10 yet?

Thanks,
Randy

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users