
Re: [Xen-users] Disk i/o on Dom0 suddenly too slow

On 09/07/13 18:49, Micky wrote:
I've found two "solutions":
1) Make your storage backend perform like a god, so that after you take the
snapshots performance is like a stroll down the road. (i.e., I've upgraded to
SSD-based storage which can get approx 1.5GB/s write and 2.5GB/s read) ....
2) Only keep a single snapshot, and if possible, remove it as soon as your
backup is completed.... and/or keep writes to a minimum while the snapshot
is active.
That's what the script I wrote is doing. Check it out.

I had a quick read through your script... looks pretty nice and complete, just a couple of comments:

1) On line 159 you do a killall -9 dd, but you know the PID of the dd that you launched; you might accidentally kill another dd process run from another script etc., so consider changing it to kill -9 $ddpid.
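The PID-based kill might look like this (a generic sketch, not taken from the script; $ddpid is simply whatever variable the script stores the PID in):

```shell
# Launch dd in the background and remember ITS pid, so we can later
# kill exactly that process instead of every dd on the system.
dd if=/dev/zero of=/dev/null bs=1M &
ddpid=$!
sleep 1
kill -9 "$ddpid"          # targets only our dd, unlike `killall -9 dd`
wait "$ddpid" 2>/dev/null
```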

2) In find_lvm you call lvdisplay, and this is where I tend to have the same problem (various lvm2 processes hang forever, including lvs, and lvremove when removing snapshots). I don't know a good way to solve that except rebooting when it happens.

3) You set the snapshot chunk size to 512k; what does this do, and does it really make much of a difference?
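For reference, the chunk size is set when the snapshot LV is created; a sketch of the command (VG/LV names are my own hypothetical examples, not from the script):

```shell
# --chunksize sets the COW copy granularity: each write to the origin
# copies whole chunks of this size into the snapshot's COW area, so a
# larger chunk means fewer, bigger copy operations per write burst.
# vg0 and domU1 are assumed names for illustration only.
lvcreate --snapshot --size 10G --chunksize 512k \
    --name domU1-snap /dev/vg0/domU1
```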

4) You are reading the full snapshot, writing out the full uncompressed copy of the image, then reading the copy back and writing the compressed copy out. You could optimize this by reading the snapshot, and writing compressed data directly in one step. If the CPU is faster than the disk, this will reduce the overall backup time, and might also reduce the time the snapshot hangs around.
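The one-pass version could be as simple as a pipe; a sketch below, with the snapshot device stood in by a plain file so the pipeline itself runs anywhere (a real run would point dd at the snapshot device instead):

```shell
# SNAP would normally be the snapshot device, e.g. /dev/vg0/domU1-snap
# (hypothetical name); a small plain file is used here as a stand-in.
SNAP=/tmp/fake-snap.img
dd if=/dev/zero of="$SNAP" bs=1M count=4 2>/dev/null

# Read the image once and write only compressed data: one pass instead
# of dump-then-compress (which is two full reads and two full writes).
dd if="$SNAP" bs=1M 2>/dev/null | gzip -1 > /tmp/domU1.img.gz
```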

5) I found that if the LV is on the same disk I am saving the dump to, this drastically slows things down (reading and writing the same disk in different locations at the same time). So back up to a different disk if at all possible.

My script is currently much simpler, I simply create the snapshots and remove the old ones (no full copies of the snapshots/etc).

I use backuppc which I've got working for one system to snapshot the VM, mount the image, backup with rsync, then umount and remove the snapshot. I still like to keep a full image snapshot, and even better to send that raw image offsite.
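That cycle might look roughly like the following; all VG/LV/mountpoint/target names here are assumptions for illustration, not taken from the actual backuppc setup:

```shell
# Untested sketch of the snapshot -> mount -> rsync -> cleanup cycle.
lvcreate -s -L 5G -n domU1-snap /dev/vg0/domU1   # hypothetical VG/LV
mkdir -p /mnt/domU1-snap
mount -o ro /dev/vg0/domU1-snap /mnt/domU1-snap
rsync -a /mnt/domU1-snap/ backuphost:/backups/domU1/
umount /mnt/domU1-snap
lvremove -f /dev/vg0/domU1-snap                  # drop the snapshot ASAP
```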

In another scenario I shut down the VM (which uses an image file), simply copy the file out in chunks of 100M with some tools, then start the VM up again.
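Chunking and reassembling an image file can be done with split/cat; a small self-contained demo (100M chunks in practice, 1M here so it runs quickly, and disk.img is a hypothetical filename):

```shell
cd /tmp
# Stand-in for the domU's image file (hypothetical name).
dd if=/dev/urandom of=disk.img bs=1M count=3 2>/dev/null
split -b 1M -d disk.img disk.img.part-   # use -b 100M for real images
cat disk.img.part-* > disk-restored.img  # numeric suffixes keep order
cmp disk.img disk-restored.img           # verify the round trip
```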

As for SSDs, I didn't find them stable in long-term production environments!

Interesting. I've had problems with a number of SSDs, but since I started using the Intel 520s, I've not had any issues. I have one environment with about 10 heavily used Windows domUs; the SAN is using 5 x 480G SSDs, and so far I haven't had any issues (I think it's been over 12 months now). It would be interesting to hear if you have any additional information/comments.

My plan is to do something like this:
1) Have two storage backend machines
2) Use DRBD to sync the two of them (primary sits on a RAID device, secondary
sits on LVM on a RAID device)
3) Use LVM on top of the DRBD to create LVs for each domU
4) Take a snapshot using the underlying LVM (below DRBD) on the secondary
5) Run your backup processes on the snapshot of the DRBD
6) Delete the snapshot
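The snapshot-below-DRBD part of this plan might look roughly like the sketch below; it is untested and every device/VG name (vg_sec, drbd0_backing, etc.) is a hypothetical placeholder:

```shell
# On the SECONDARY node: snapshot the LV that sits UNDER the DRBD
# device, back it up in one compressed pass, then drop the snapshot.
lvcreate -s -L 20G -n drbd0-snap /dev/vg_sec/drbd0_backing
dd if=/dev/vg_sec/drbd0-snap bs=1M | gzip -1 > /backup/drbd0.img.gz
lvremove -f /dev/vg_sec/drbd0-snap
```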
Sounds quite complicated. Block-level snapshots under stacked block-level
devices -- seems like a lot of overhead!
Gluster may be a lot more useful in this case -- just a guess.

In my opinion, gluster will add a lot of overhead anyway, may not be sufficiently stable, and I certainly don't know it well enough to put it into production. LVM + MD + DRBD, on the other hand, are all simple, low-overhead, and well understood: each read/write with LVM/MD/DRBD is simply remapped to a physical device read/write, while glusterfs is more of a filesystem, with more overhead and complexity.

I haven't yet got that far in the process, so if you do something it would
be helpful to hear about it.

It would also be nice to see other people share what they do and what works
well / doesn't work.
I am experimenting with a few tricks. I will share the outcome like
the script I just shared :)

Thanks, appreciated.

Finally, the other problem I have with LVM on Debian (stable) is that every
week or two, it will freeze on lvremove, and other lvs or LV related
commands will freeze. The only solution seems to be a reboot. (Using kernel
3.2.0-4-686-pae #1 SMP Debian 3.2.41-2 i686). I haven't tracked this down or
reported it yet, but it is frustrating to have to reboot the dom0 so often.
LVM is slow as heck when it comes to snapshots. And everywhere I look,
people talk about the "copy on write" magic,
but no one tells you that you are gonna bite your tongue!
If biting my tongue would help, I'd do it :)

Running multiple VMs on a single storage device, especially spinning disks, makes it challenging to ensure the right performance with all the contention etc. Using SSDs should be a lot simpler/easier, but LVM performance is making that really difficult, and I still don't understand why performance is so horrible. At some point I'll join the LVM list and investigate in more detail, but I've got "good enough" performance so far, and have other higher-priority issues on my list...

Thanks again.


Adam Goryachev
Website Managers
www.websitemanagers.com.au

Xen-users mailing list
