
Re: [Xen-users] Solving the DRBD resync issue



Hi!

I am sorry for not answering sooner. My spam filter wasn't your friend and I failed to check it in time... :-(

On 05/23/2011 10:28 AM, Fajar A. Nugraha wrote:
On Mon, May 23, 2011 at 2:46 PM, Daniel Brockmann wrote:

The more I think about my DRBD issue, and the more I research on the net, the
more I tend to attribute the issue to limited CPU time for dom0.

First things first.
By "sync problems" in your previous post, did you mean both nodes
experienced split brain for the DRBD resource?

One machine (luckily the one that didn't have any important guests running) lost sync, so the other machine was the "newer" DRBD partner. But when the sync started again, all virtual guests of the "newer" DRBD partner became unavailable until I stopped the sync forcefully. Besides this, the sync speed was far below what I had when I first set up the DRBD pair on both XenServer machines.
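For the archives: the way I read the DRBD 8.3 docs, a split brain is resolved manually by picking a victim whose changes get discarded. A sketch, assuming the resource is called r0 (a placeholder; mine may be named differently):

    # on the node whose changes are thrown away (the split brain "victim")
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0

    # on the surviving node (only needed if it is StandAlone)
    drbdadm connect r0

    # watch connection state and resync progress
    cat /proc/drbd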


When set up properly, you should NOT experience it, regardless of how
much CPU resource dom0 has. You should only experience SLOW disk I/O.
Split brain usually occurs if you don't set up fencing properly.
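If I understand the DRBD manual correctly, "fencing properly" would mean something like the following in drbd.conf. This is only a sketch: the crm-* handler scripts shipped with DRBD assume a Pacemaker cluster, which I don't run, so I would need my own handler there:

    resource r0 {
      disk {
        fencing resource-only;   # the docs suggest resource-and-stonith for dual-primary
      }
      handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }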

The funny thing: it worked fine in the beginning. I was playing around with XenMotion on a production mail server machine without any problems. It continuously synced at rates between 250 and 350 MByte/s, which is more than okay for our needs.


It would be
better to resolve _this_ instead of possibly reaching the same stage again
later with another replication technique, wouldn't it?

Reasons why I think it is an I/O and/or CPU time issue:

1. It worked properly when I still did not have 8 virtual guest systems
installed.
2. As soon as I start a DRBD resync, my virtual guests produce kernel error
messages like "INFO: task exim4:2336 blocked for more than 120 seconds.".
3. When starting both XenServer machines and syncing before starting the
virtual guests, a startup that usually completes in <5 minutes takes up to
60 minutes.

... which is exactly the SLOW I/O I mentioned above.

What I cannot understand is why this occurred weeks after the setup had been tested successfully. :-/

Okay, one thing changed: a few machines were added to the virtual pool. But even with these machines halted I could not sync the DRBD pair anymore.
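In the meantime I will at least try to throttle the resync so the guests stay usable while it runs. According to the DRBD 8.3 user guide, the rate can be changed at runtime; the device and resource names below are just my values:

    # temporarily cap the background resync rate
    drbdsetup /dev/drbd0 syncer -r 40M

    # afterwards, revert to the rate configured in drbd.conf
    drbdadm adjust r0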


I checked the XenWiki accordingly and found two promising entries that I'd
like to follow, if it's possible to apply them under a Citrix XenServer 5.6
system:

http://wiki.xensource.com/xenwiki/XenCommonProblems#head-413e1d74442772fd5a0a94f0655a009744096627

1. How can I limit the number of vcpus my dom0 has?
2. Can I dedicate a cpu core (or cores) only for dom0?

Especially the 2nd one appears to be exactly what I'm looking for, so I am
going to check whether I can configure that. What do _you_ think about it?
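As far as I can tell, both wiki entries boil down to hypervisor boot parameters. A sketch of what I would try: on open-source Xen these go on the xen.gz line in GRUB, and I assume XenServer keeps them in /boot/extlinux.conf:

    # appended to the xen.gz line of the boot entry:
    dom0_max_vcpus=2 dom0_vcpus_pin
    # dom0_max_vcpus limits the number of dom0 vcpus,
    # dom0_vcpus_pin pins them to physical cores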

This thread might be able to help you:
http://xen.1045712.n5.nabble.com/XenServer-adding-additional-CPUs-to-the-control-domain-td3325379.html

I then tried a different approach by installing the DRBD "loser" with XenServer 5.6 SP2, which reserves 4 VCPUs for dom0. I additionally carved out 4 GB of RAM for dom0. And I compiled DRBD from the latest sources this time (on the old systems I had used precompiled binaries). But this time I couldn't even create a XenServer storage repository on the DRBD device.
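What I tried was roughly the following (host UUID elided; /dev/drbd0 is just my device name):

    # create an LVM-backed storage repository on top of the DRBD device
    xe sr-create host-uuid=<host-uuid> type=lvm content-type=user \
       device-config:device=/dev/drbd0 name-label="DRBD SR"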


Personally, I suggest you step back and evaluate several things:
- do you REALLY need an active-active setup?
Active-active DRBD mandates protocol C (synchronous replication), which can
GREATLY slow down your throughput. If you can afford a small downtime,
better stick with async replication.

I could afford it in the worst case. And I will keep your suggestion in mind in case my alternative tests fail again.
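For reference, the protocol choice is just a line in drbd.conf (DRBD 8.3 syntax; r0 and the rate are placeholders for my values):

    resource r0 {
      protocol C;              # synchronous; mandatory for dual-primary
      net {
        allow-two-primaries;   # only needed for the active-active case
      }
      syncer {
        rate 300M;             # cap for the background resync
      }
    }

Dropping active-active would mean removing allow-two-primaries and, since a small downtime is acceptable for me, switching to protocol A (async).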

Currently I am running Debian Squeeze on the former DRBD "loser", with a DRBD device that still isn't syncing. However, now I am trying KVM ... and it appears to run at least as well as my XenServer setup did before ... let's see what happens the moment I wipe the XenServer machine to install Squeeze on it, too ... and start syncing both.

(At moments like this I wish I had more than just two hosts.)


- do you KNOW how much IOPS you need?
[...]

Hmm, it's not going to be a lot more than what my current machines need. The most performance-hungry one would be a virtualized Samba machine, I guess. I'll check that next week when I am at the machines again. Thanks for the suggestion.
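My plan for measuring it is simply iostat from the sysstat package; a quick sketch:

    # extended per-device statistics every 5 seconds
    iostat -xk 5
    # r/s + w/s gives the IOPS per device;
    # %util close to 100 means the disk is saturated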


All things considered, it might be that your best option would be
something like:
- get a separate server with lots of disks, set up RAID10, install a
storage appliance OS on top (e.g.
http://www.napp-it.org/index_en.html), then export it to your XenServer
either as NFS or iSCSI. While NFS/iSCSI induce some overhead, it should
be lower compared to using DRBD, OR

Hmm, my experience with NFS is that it's a lot slower than DRBD - ignoring, however, the syncer rates I've had lately. ;-)

iSCSI would be an option if I had the budget for that separate storage server. But unfortunately I don't have it. :-(


- drop active-active requirement, OR

Dual-primary worked for a while. I first want to check whether I can get it running again the way it was (but on a different platform). But if that were the only way, I could live with a "usual" single-primary DRBD setup, too. :-)


- beef up your XenServer (e.g. use fast storage like SSDs), upgrade your
XenServer/XCP version to get dom0 to use multiple CPU cores,
upgrade DRBD to the latest version, and set up proper fencing.

I am no longer confident that XenServer and my Magny-Cours CPUs work well together at all. That's why I am trying Squeeze and KVM now ... working around the limitations that Citrix built into their product.

Thank you very much for your suggestions. Regardless of whether it's KVM or Xen ... they will help me a lot.

CU,
Mészi.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

