
Re: [Xen-users] Which distributed file system for xen hosts/guests?



--- John Fairbairn <webmaster@xxxxxxxxxxxx> wrote:
> I've been using eNBD with RAID 1 over TCP/IP and COW. I think this would be
> a viable solution for you since it's fast, has RAID, is COW, automatically
> embeds a code into the client kernels for anti-spoofing with no extra work,
> and is a block device (so clients don't see where it comes from). The high-
> availability 'virtual' web cluster I've built is IP_VS (ipvsadm),
> heartbeat, and eNBD. It's been working well thus far. I guess the nicest
> thing about the system I set up is that I used things that are already
> available in the 2.6.x kernels with no patching. You may have to recompile
> your dom0 and domU for some of the features (enabling NBD, IP_VS, and COW),
> but that's still far easier than trying to mix Xen patches and other
> patches IMO. Here's a link to eNBD if you feel like checking it out:
> http://www.it.uc3m.es/~ptb/nbd/ and here's IP_VS (ipvsadm):
> http://www.linuxvirtualserver.org/software/ipvs.html and heartbeat:
> http://www.linux-ha.org/HeartbeatProgram
> 
> Hope this Helps ya some.
> 
> John Fairbairn
> 
> 
> > Maybe this is the wrong place to ask, but since it is related to the
> > overall multi-machine architecture, here goes:
> >
> > What would you folks look at as far as a distributed file systems for
> > various physical/virtual Xen machines? Requirements are:
> >
> > 1. fault tolerant
> > 2. relatively speedy
> > 3. actively supported/used
> > 4. production stability
> > 5. ability to add/remove/resize storage online
> > 6. clients are unaware of physical file location(s)
> > 7. Local client caching for speed over slower network links
> > 8. I suppose the file system would sit on something like raid 5 for
> > physical protection.
> > 9. At a high-level, I'd like to be able to dedicate individual "private"
> > file systems to machines as well as have various "public" filesystems, all
> > with the same name space.
> > 10. Oh yeah -- secure
> >
> > I've looked at:
> >
> > 1. GFS -- I want something a bit more
> > 2. AFS -- looks great, but it appears to support files up to 2GB? Not big
> > enough. Active community though and was/is a commercial product.
> > 3. NFS -- please
> > 4. Lustre -- looks promising but complex and not completely open source.
> > 5. OCFS2 -- Oracle site says beta code, not production ready. Maybe soon?
> > 6. Intermezzo -- doesn't look like an active project any more
> > 7. Coda -- same


I've been watching this thread with interest because I too am in the planning
stages of a similar setup.  We're going to be creating a cluster, one node here
in Jacksonville and one somewhere else in the country for the ultimate in high
availability.

What John Fairbairn is suggesting looks good.  I like not having to patch the
kernel any more than necessary.
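
For anyone following along, the recompile John mentions is mostly a matter of
flipping on a few options in the dom0/domU kernel config.  A rough sketch of
what I'd expect to enable (option names as I recall them from a stock 2.6
.config -- double-check in menuconfig):

    CONFIG_BLK_DEV_NBD=m      # the stock NBD client (eNBD ships its own enhanced driver, I believe)
    CONFIG_MD=y
    CONFIG_MD_RAID1=m         # software RAID 1 on top of the network block device
    CONFIG_IP_VS=m            # IP virtual server, what ipvsadm drives
    CONFIG_DM_SNAPSHOT=m      # device-mapper snapshots, one way to get COW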

But I'm afraid it doesn't address one of the original poster's concerns (forgot
his name), number 7: Local client caching for speed over slower network links
(which is the concern I'm most interested in).

The problem with eNBD and RAID 1 is -- as I understand it -- only one server
can read/write at any given time.  Ideally I would like something like GFS with
global locks so that I could have a cluster with Xen host node A here in
Jacksonville and Xen host node B in Los Angeles and both be able to write to
/dev/nda1 _at_the_same_time_.  This would give better usage of resources and
yet still offer a remote hot site.  As I understand it, eNBD doesn't offer
this.
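
To make that concrete, the single-writer setup looks something like this (a
sketch only -- host and device names are made up, and the client syntax here
is from plain NBD; eNBD's tools differ):

    # on the active node only -- the mirror can't be assembled on both ends at once
    nbd-client la-node.example.com 2000 /dev/nbd0
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda3 /dev/nbd0
    mkfs.ext3 /dev/md0 && mount /dev/md0 /data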


There are a couple of possible solutions to this.

Someone might suggest NFS-mounting the read/write partition so the other system
could access it.  That works in theory, but it eats up WAN bandwidth: traffic
comes into box A, which has NFS-mounted box B's eNBD partition; the read goes
back out across the WAN to box B, the data comes back across the WAN to box A,
and then the result is spit back to the client.  At least 2x more traffic; not
viable.  And it doesn't address concern number 7 at all.


You could use two eNBD partitions, one read/write at each site, and Heartbeat
can bring up both read/write on one machine if the other fails.  Heartbeat
would then restart any services from the failed machine and continue where it
left off.
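
A minimal sketch of what I mean, in Heartbeat v1 haresources terms (node
names, addresses, devices and services are all made up, and you'd need
resource or init scripts for the services themselves):

    # /etc/ha.d/haresources -- each line names the node that normally owns that group
    jax-node  IPaddr::192.168.1.10/24  Filesystem::/dev/nda1::/var/www::ext3   apache
    la-node   IPaddr::192.168.1.11/24  Filesystem::/dev/ndb1::/var/mail::ext3  postfix

If jax-node dies, Heartbeat brings up both lines on la-node, and vice versa.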

I did this with DRBD and Heartbeat.  DRBD+Heartbeat is almost identical to the
eNBD solution above, with the advantage that Heartbeat comes with DRBD scripts --
that made setup a breeze!  Also, using LVS introduces a single point of failure
unless you run it in high-availability mode (see
http://www.ultramonkey.org/3/topologies/).
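
For what it's worth, the DRBD side of my setup boils down to one resource
definition plus one haresources line.  Roughly (names and addresses changed,
and this is 0.7-era config syntax, so check the docs for your version):

    # /etc/drbd.conf
    resource web {
      protocol C;                  # synchronous; protocol A is the asynchronous option for slow links
      on jax-node {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on la-node {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

    # /etc/ha.d/haresources -- drbddisk is one of the scripts that ships with DRBD
    jax-node  drbddisk::web  Filesystem::/dev/drbd0::/var/www::ext3  apache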


Here's how I did it:
http://devidal.tv/~chris/DRBD+Heartbeat_overview.html

I didn't run Xen on it; I'd consider adding COW if I did.


A disadvantage is that only one site can serve a particular set of content at
any given time, which is somewhat of a waste of resources.  You can mitigate
this a little by serving different content on different partitions; for
instance, I put web on one partition and mail on another.  Then if one system
fails, the other handles the load of both for a while.

If you do this, you introduce another disadvantage: you use 4x as much storage
(two pairs of RAID 1 partitions).

Something that seems to be a disadvantage is that this only scales to two
systems (RAID 1), but you can set up as many partition pairs as you need.  I
set up two pairs but could have dozens on one node, or have dozens of nodes.
DRBD has a nice "group" configuration option (see if eNBD offers this);
grouping ensures that if two partitions are on one spindle they don't try to
synchronize at the same time (a performance nightmare).
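
The group bit looks like this in drbd.conf, as I understand it (0.7 syntax;
resources with different group numbers resync one after the other instead of
in parallel):

    resource web {
      syncer { rate 5M; group 1; }   # resyncs first
      # on/disk/address sections as in the example above
    }
    resource mail {
      syncer { rate 5M; group 2; }   # shares a spindle with 'web', so it waits its turn
      # on/disk/address sections as in the example above
    }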


I like what GFS offers; multiple clients can read/write to one partition.  But
the original poster said, "I want something a bit more than what GFS offers." 
Are you saying this because you, like me, thought that GFS only works with
locally-attached shared SCSI storage?  Apparently I was wrong:
http://gfs.wikidev.net/GNBD_installation

Apparently it works with GNBD, which would, as I understand it, let two or more
nodes in remote locations write to the same partition simultaneously.  I'd need
it to offer redundancy such as RAID (if not, it's a no-go).
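
If the wiki page is right, the moving parts look roughly like this (command
names from the RHEL4-era GFS tools; cluster, export and host names below are
made up, and note GNBD itself adds no redundancy -- any RAID has to live
underneath it on the server):

    # on the storage server: export a block device over GNBD
    gnbd_export -e webvol -d /dev/md0

    # on each client node: import it, then make/mount the shared GFS filesystem
    gnbd_import -i storage-server
    gfs_mkfs -p lock_dlm -t mycluster:webvol -j 2 /dev/gnbd/webvol   # run once, from one node
    mount -t gfs /dev/gnbd/webvol /data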

Need to test this out.  If it works well it seems to be the best solution.


The original poster's number 5 concern: "ability to add/remove/resize storage
online."  One of my concerns, too.  I was thinking of overlaying whatever
network storage I use with LVM.  I haven't tried it, but I've been told you can
resize partitions on the fly, move data around at will, etc.  I'd be hesitant
to use it to concatenate another set of RAID drives onto an existing one
(JBOD-style) because you double your risk of data loss if either set dies.
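
The LVM operations I have in mind, roughly (device and volume names made up;
online ext3 growth needs a reasonably recent kernel and tools):

    pvcreate /dev/md0                  # put the mirrored/replicated device under LVM
    vgcreate vg0 /dev/md0
    lvcreate -L 20G -n web vg0
    mkfs.ext3 /dev/vg0/web

    # later, grow it without unmounting
    lvextend -L +10G /dev/vg0/web
    ext2online /dev/vg0/web            # or resize2fs, depending on your distribution

    # or migrate data off a PV you want to retire (needs free space on another PV)
    pvmove /dev/md0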


As for security, I'm going to link our sites with a VPN.  You could also
install a VPN directly on each node if you want traffic to be encrypted before
it even hits your LAN.  Per my previous question I'll probably run a VPN daemon
in bridging mode on domain0.
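
Something like this on each dom0 is what I have in mind (OpenVPN over a tap
interface with a static key; the bridge name xenbr0 and the host name are
assumptions for my setup):

    # bring up a layer-2 tunnel to the other site and add it to the Xen bridge
    openvpn --dev tap0 --remote la-node.example.com --secret /etc/openvpn/site.key --daemon
    brctl addif xenbr0 tap0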


Finally, I'm watching this project very carefully:
http://sourceware.org/cluster/ddraid/

Looks promising, too.


Anyone else have input?

CD

Ever lied?  You're a liar.  Ever stolen?  You're a thief.  Ever hated? The 
bible equates hate with murder.  Ever lusted?  Jesus equated lust with 
adultery.  You've broken God's law.

He'll judge all evil and you're without hope -- unless you have a savior. 
Repent and believe.



 

