
Re: [Xen-users] status of remus

  • To: Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx>
  • From: Henrik Andersson <henrik.j.andersson@xxxxxxxxx>
  • Date: Thu, 14 Apr 2011 01:30:34 +0300
  • Bcc: xen-users@xxxxxxxxxxxxxxxxxxx
  • Cc:
  • Delivery-date: Wed, 13 Apr 2011 19:29:42 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

My guess would be that Pasi is talking about the overall performance/efficiency of the solution, not the failover time. Even though Remus might shorten the failover time significantly, it might be less efficient in utilizing your resources because of the way it works. What I mean is that with an imaginary software/service HA solution you might be able to get 100 requests/sec (or whatever metric you want to use), while Remus could deliver less, say 80 requests/sec.

I have no idea how fast or efficient a solution Remus is, so don't take those numbers as any indication of the expected performance of either solution; they are there only to clarify what I am trying to say.

Between H(igh) A(vailability), F(ault) T(olerance) and all the other two-letter combinations, it seems quite hard to draw clear lines. My take is that while HA certainly has something to do with FT, they are not the same. HA might be the more generic term for securing a service by keeping it available even in disaster conditions, and FT might be one part of that agenda.

Having dual power supplies in a server increases the fault tolerance of that server, but does it make the services running on it more highly available? I guess in a way it does, but the primary goal of adding a second PSU was to increase fault tolerance, not to make some service more or less available. One could say you are trying to make the service highly available by increasing its fault tolerance. Not sure if this makes any sense, but I couldn't find any gospel-like definition of the terms, so I'm making this up as I write.

In the case of Remus you are "running two instances of the same VM", and because you have a kind of backup VM ready to take over from the primary, it increases the fault tolerance of that VM. But because the VMs are (if I understand correctly) more or less identical, a bug in the software (for example in Apache or whatever service you are trying to make more highly available) could take both VMs down, since the bug would affect them equally.

Sorry for the long post and for putting my spoon in a soup that is not mine. I hope some of what I said makes sense and, if possible, helps you in some way.

-Henrik Andersson

On 11 April 2011 01:42, Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx> wrote:
Pasi Kärkkäinen wrote:
There's a new and active maintainer for Remus. He's been posting
many patches recently to the xen-devel mailing list.

Good to hear!

Just remember Remus is an FT (Fault Tolerance) solution.
Often it's better to use software/service-based HA instead of VM-based FT.
FT is way slower than "normal" HA.

Can you elaborate a bit, both re. what you see as the difference between fault tolerance vs. HA, and re. speed?

Currently, I'm running a collection of services on a single VM, with DRBD/heartbeat/pacemaker failover to a 2nd node.  Failover takes a LONG time.  Looks to me like Remus should provide instant failover.


Miles Fidelman

In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra

Xen-users mailing list



