
Re: [Xen-users] Disk i/o on Dom0 suddenly too slow

To: Micky <mickylmartin@xxxxxxxxx>
From: Adam Goryachev <mailinglists@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 10 Jul 2013 12:33:30 +1000
Cc: "xen-users@xxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxx>
Delivery-date: Wed, 10 Jul 2013 02:34:47 +0000
List-id: Xen user discussion <xen-users.lists.xen.org>

On 09/07/13 18:49, Micky wrote:

I've found two "solutions":
1) Make your storage backend perform like a god so that after you take the
snapshots performance is like a stroll down the road. (ie, I've upgraded to
SSD based storage which can get approx 1.5TB/s write and 2.5TB/s read) ....
2) Only keep a single snapshot, and if possible, remove it as soon as your
backup is completed.... and/or keep writes to a minimum while the snapshot
is active.

That's what the script I wrote, is doing. Check
http://github.com/bassu/xen-scripts/

I had a quick read through your script... it looks pretty nice and complete, just a few comments:

1) On line 159 you do a killall -9 dd, but you know the PID of the dd that you launched. killall might accidentally kill another dd process run from another script, so consider changing it to kill -9 $ddpid.

2) In find_lvm you call lvdisplay, and this is where I tend to have the same problem (various lvm2 processes hang forever, including lvs, and lvremove when removing snapshots). I don't know a good way to solve that except to reboot when it happens.

3) You set the snapshot chunk size to 512k. What does this do, and does it really make much difference?

4) You are reading the full snapshot, writing out the full uncompressed copy of the image, then reading the copy back and writing the compressed copy out. You could optimize this by reading the snapshot and writing compressed data directly in one step. If the CPU is faster than the disk, this will reduce the overall backup time, and might also reduce the time the snapshot hangs around.

5) I found that if the LV is on the same disk I am saving the dump to, this drastically slows things down (reading and writing the same disk in different locations at the same time). Back up to different disks if possible.
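
For point 4, here is a minimal sketch of the one-pass approach. It is not taken from the script under discussion: the function name dump_compressed and the device and file paths are hypothetical, and gzip is only one possible choice of compressor.

```shell
# dump_compressed SRC DEST
# Read SRC once and write a gzip-compressed copy straight to DEST,
# so no uncompressed intermediate image is ever written to disk.
dump_compressed() {
    src=$1
    dest=$2
    # In plain sh the pipeline's exit status is gzip's, not dd's;
    # under bash, "set -o pipefail" would also surface a dd failure.
    dd if="$src" bs=1024k 2>/dev/null | gzip -c > "$dest"
}

# Hypothetical usage against a snapshot LV:
#   dump_compressed /dev/vg0/domu1-snap /backup/domu1.img.gz
# Restore with:
#   gzip -dc /backup/domu1.img.gz | dd of=/dev/vg0/domu1 bs=1024k
```

Writing DEST to a different physical disk than the snapshot (point 5) should keep the read and write streams from competing for the same spindles.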

My script is currently much simpler. I simply create the snapshots and remove the old ones (no full copies of the snapshots, etc.).

I use backuppc, which I've got working for one system, to snapshot the VM, mount the image, back up with rsync, then umount and remove the snapshot. I still like to keep a full image snapshot, and even better to send that raw image offsite.

In another scenario, I shut down the VM (which uses an image file), then simply copy the file into 100M chunks using some tools, then start the VM up again.
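
The message does not say which tools do the chunking. As one possible sketch, assuming the standard split utility and a hypothetical helper name chunk_image:

```shell
# chunk_image SRC PREFIX [SIZE]
# Split the image file SRC into pieces named PREFIXaa, PREFIXab, ...
# SIZE defaults to 100M, the chunk size mentioned above.
chunk_image() {
    src=$1
    prefix=$2
    size=${3:-100M}
    split -b "$size" "$src" "$prefix"
}

# Hypothetical usage, with the VM shut down first:
#   chunk_image /var/lib/xen/images/domu1.img /backup/domu1.img.part-
# Reassemble (the shell expands the glob in sorted order):
#   cat /backup/domu1.img.part-* > /var/lib/xen/images/domu1.img
```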

As for SSDs, I didn't find them stable as in long-term production environments!

Interesting. I've had problems with a number of SSDs, but since I started using the Intel 520s, I've not had any issues. I have one environment with about 10 heavily used Windows domUs, the SAN is using 5 x 480G SSDs, and so far I haven't had any issues (I think over 12 months now). It would be interesting to hear if you have any additional information/comments?

My plan is to do something like this:
1) Have two storage backend machines
2) Use DRBD to sync the two of them (primary sits on RAID device, secondary
sits on LVM on RAID device)
3) Use LVM on top of the DRBD to create LV's for each domU
5) Take a snapshot using the underlying LVM (below DRBD) on the secondary
6) Run your backup processes on the snapshot of the DRBD
7) Delete the snapshot
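
The snapshot, backup and delete steps at the end of this plan could look roughly like the sketch below. The function name, the volume group and LV names, the 5G snapshot size and the RUN dry-run switch are all assumptions for illustration. By default it only prints the commands it would run.

```shell
# Dry run by default: each command is printed, not executed.
# Set RUN="" in the environment to really run them (needs root and LVM).
RUN=${RUN-echo}

# backup_via_snapshot VG LV DEST
backup_via_snapshot() {
    vg=$1
    lv=$2
    dest=$3
    snap="${lv}-snap"

    # Take the snapshot of the LV (here: on the secondary, below DRBD).
    $RUN lvcreate --snapshot --size 5G --name "$snap" "/dev/$vg/$lv" || return 1

    # Run the backup against the snapshot, compressing in a single pass.
    $RUN sh -c "dd if=/dev/$vg/$snap bs=1024k | gzip -c > $dest"
    status=$?

    # Remove the snapshot whether or not the backup succeeded, so it does
    # not keep slowing down writes to the origin LV.
    $RUN lvremove -f "/dev/$vg/$snap"
    return $status
}

# Hypothetical usage:
#   backup_via_snapshot vg0 domu1 /backup/domu1.img.gz
```

Removing the snapshot unconditionally follows the earlier advice to keep a single snapshot and drop it as soon as the backup is done.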

Sounds a lot complicated. Block level snapshots under grouped block
level devices -- seems like a lot of overhead!
Gluster may be a lot more useful in this case -- just a slight guess.

In my opinion, gluster will add a lot of overhead anyway, may not be sufficiently stable, and I certainly don't know it well enough to put it into production. LVM + MD + DRBD, on the other hand, are all simple, low overhead and well understood. Each read/write with LVM/MD/DRBD is simply remapped to a physical device read/write, while glusterfs seems more like a filesystem, with more overhead and complexity.

I haven't yet got that far in the process, so if you do something it would
be helpful to hear about it.

Also any other people who can share what they do and what works well/doesn't
work would be nice to see.

I am experimenting with a few tricks. I will share the outcome like
the script I just shared :)


Thanks, appreciated.

Finally, the other problem I have with LVM on Debian (stable) is that every
week or two, it will freeze on lvremove, and other lvs or LV related
commands will freeze. The only solution seems to be a reboot. (Using kernel
3.2.0-4-686-pae #1 SMP Debian 3.2.41-2 i686). I haven't tracked this down or
reported it yet, but it is frustrating to have to reboot the dom0 so often.

LVM is slow as heck when it comes to snapshots. And everywhere I look,
people talk about the "copy on write" magic,
but no one tells you that you are gonna bite your tongue!

If biting my tongue would help, I'd do it :)

Running multiple VMs on a single storage device, especially spinning disks, makes it challenging to ensure the right performance with all the contention. Using SSDs should be a lot simpler, but LVM performance is making that really difficult, and I still don't understand why performance is so horrible. At some point I'll join the LVM list and investigate in more detail, but I've got "good enough" performance so far, and have other higher priority issues on my list...


Thanks again.

Adam

--
Adam Goryachev Website Managers www.websitemanagers.com.au

