Re: [Xen-users] Disk i/o on Dom0 suddenly too slow

First off, thanks for checking.
Secondly, I have managed to resolve the disk dumping issues from LVM
snapshots and preliminary tests are satisfactory.

Turns out, the default scheduler CFQ was not suited for this workload.
Dom0: echo deadline > /sys/block/sda/queue/scheduler
DomU: echo noop > /sys/block/xvda/queue/scheduler
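For anyone wanting to check before switching, here is a minimal sketch of reading and setting the scheduler through sysfs. The function names and the SYSFS_BLOCK override are my own inventions for illustration (the override only exists so the functions can be tried against a scratch directory); the sysfs file lists the available schedulers with the active one in square brackets.

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs "scheduler" file.
# The kernel marks the active one with brackets, e.g. "noop deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Switch a block device to the given scheduler, refusing names the
# kernel does not list. SYSFS_BLOCK is a hypothetical override used
# for testing; by default the real /sys/block is used (needs root).
set_scheduler() {
    dev=$1
    sched=$2
    file=${SYSFS_BLOCK:-/sys/block}/$dev/queue/scheduler
    if ! grep -qw -- "$sched" "$file"; then
        echo "scheduler '$sched' not available for $dev" >&2
        return 1
    fi
    echo "$sched" > "$file"
}
```

Usage would be something like "set_scheduler sda deadline" in Dom0 and "set_scheduler xvda noop" in the DomU. A value written this way does not survive a reboot, so it has to be reapplied at boot.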

If you need reasons, let me know and I'll explain the findings further.

Since I am using a MegaRAID controller, I looked at the LSI recommendations
and tweaked the kernel further.
Overall, this gave me a 50% performance boost on cheap Seagate disks.

No more sluggishness!!

About the script:

1) Good catch. That was indeed the purpose of creating $ddpid. Seems
like a typo.

2) We use RHEL/CentOS in production and I have never had such an issue,
so I didn't consider it. But you could do something like this to act
when lvdisplay has been running for more than 5 minutes:
[[ $(ps -p $(pidof lvdisplay) -o etimes:1=) -gt 300 ]] && <do something>
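In case the ps on a given box does not know the etimes field, here is a rough sketch that works from the POSIX etime field instead, which prints elapsed time as [[DD-]HH:]MM:SS. The function names are made up for illustration, and the awk conversion is just one way to do it.

```shell
#!/bin/sh
# Convert a ps "etime" value ([[DD-]HH:]MM:SS) to whole seconds.
# awk treats "08" as decimal, which avoids shell octal pitfalls.
etime_to_seconds() {
    printf '%s\n' "$1" | awk '
        {
            gsub(/[ \t]/, "")            # ps pads the column with spaces
            days = 0
            if (index($0, "-")) { split($0, d, "-"); days = d[1]; $0 = d[2] }
            n = split($0, p, ":")
            if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
            else if (n == 2) secs = p[1] * 60 + p[2]
            else             exit 1      # not an etime value
            print days * 86400 + secs
        }'
}

# Succeed if process $1 has been running for more than $2 seconds.
running_longer_than() {
    raw=$(ps -o etime= -p "$1") || return 1
    secs=$(etime_to_seconds "$raw") || return 1
    [ "$secs" -gt "$2" ]
}
```

With that, "running_longer_than <pid> 300 && <do something>" does the same job as the one-liner. Note that pidof can return several PIDs if more than one lvdisplay is running, so loop over them if that matters.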

3) My tests at the time showed that a 512k snapshot chunk size gave more
speed to dd writes. But now that I have switched to the deadline
scheduler, I get the best results without specifying the -c parameter
to lvm and dd'ing with bs=100M. Also, there's no need for ionice since
it works with CFQ only.
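To make the sequence concrete, here is a dry-run sketch that only prints the commands for one snapshot dump, using the default chunk size (no -c) and bs=100M as described. The function name, the "_snap" suffix, and the example volume names are assumptions for illustration and are not taken from the actual script.

```shell
#!/bin/sh
# Print (do not run) the commands for dumping one logical volume
# through a temporary LVM snapshot.
# Arguments: volume group, logical volume, snapshot size, destination file.
snapshot_dump_plan() {
    vg=$1
    lv=$2
    snap_size=$3
    dest=$4
    snap=${lv}_snap                      # hypothetical naming scheme
    echo "lvcreate -s -L $snap_size -n $snap /dev/$vg/$lv"
    echo "dd if=/dev/$vg/$snap of=$dest bs=100M"
    echo "lvremove -f /dev/$vg/$snap"
}
```

For example, "snapshot_dump_plan vg0 web 5G /backup/web.img" prints the three commands in order; review them and then run them by hand or pipe them to sh as root.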

4) It takes the same amount of CPU time though. Dumping and
compressing large chunks at the same time with pipes and stdout can
cause weird issues with FIFOs. IMHO, why risk corrupt backups when the
only real way to test a backup is to restore it? The certainty of not
having a dirty backup is worth a little more I/O expense!
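As a rough illustration of the dump-first, compress-afterwards approach, the sketch below compresses an already written dump as a separate step and checks that the archive decompresses back to the same bytes. The function name is hypothetical and the choice of gzip and sha256sum is an assumption. It is only a sanity check and does not replace an actual test restore.

```shell
#!/bin/sh
# Compress an existing dump file to "$1.gz" and verify the result.
# Returns non-zero if compression fails or the contents differ.
compress_verified() {
    src=$1
    before=$(sha256sum < "$src") || return 1      # checksum of the raw dump
    gzip -c "$src" > "$src.gz" || return 1        # keep the original in place
    gzip -t "$src.gz" || return 1                 # archive integrity check
    after=$(gzip -dc "$src.gz" | sha256sum) || return 1
    [ "$before" = "$after" ]                      # same bytes after a round trip
}
```

The raw dump is left in place, so it can be removed only once the function has returned success.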

5) Affirmative. That is why two separate config variables exist there:

> My script is currently much simpler, I simply create the snapshots and
> remove the old ones (no full copies of the snapshots/etc).

Seems fine. In my case there are more than a few nodes and tens of
domains. So the above works pretty well for me as a short-term backup

> I use backuppc which I've got working for one system to snapshot the VM,
> mount the image, backup with rsync, then umount and remove the snapshot. I
> still like to keep a full image snapshot, and even better to send that raw
> image offsite.

I use Burp from inside the DomU.

> It would be interesting to hear if you have any additional 
> information/comments?

Well, I started with a few small machines, and one after another the
SSDs died on me due to either firmware problems or bad blocks. I tried
Crucial, switched to Intel and then Samsung. The latter were the ones that
ran fine for the longest time. Now I just use these for personal

> Another scenario I shutdown the VM (using an image file), then simply copy
> the file via some tools into chunks of 100M, then startup the VM.

Seems fine from an administration point of view, but people have become
uptime conscious these days.

> In my opinion, gluster will add a lot of overhead anyway, and maybe is not
> sufficiently stable, and certainly I don't know it well enough to put into
> production. While LVM + MD + DRBD are all simple, low overhead, well
> understood, etc... Each read/write with LVM/MD/DRBD is simply a remap
> process to a physical device read/write, while glusterfs seems more of a
> filesystem with more overhead/complexity.

I haven't played much with DRBD, so these are only guesses. My
understanding of I/O for network-based domains is that unless you have
high-speed disks and network equipment, or preferably a SAN, the domains
will suffer from I/O latency once there are more than a few of them.
Gigabit switches and so-called 6Gb/s SAS drives simply aren't sufficient.
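A back-of-the-envelope calculation shows why. This sketch divides the raw link rate evenly between the domains sharing it, ignoring encoding and protocol overhead, so real numbers will be lower. The function name is made up for illustration.

```shell
#!/bin/sh
# Rough upper bound on throughput per domain, in MB/s, when a link of
# $1 Mbit/s is shared evenly by $2 domains (overhead ignored).
per_domain_mbytes() {
    awk -v mbit="$1" -v n="$2" 'BEGIN { printf "%.1f\n", mbit / 8 / n }'
}

per_domain_mbytes 1000 10    # gigabit link shared by ten domains
```

A gigabit link tops out at 125 MB/s in total, so ten busy domains get about 12.5 MB/s each at best.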

> Running multiple VM's on a single storage device, especially spinning disks,
> seems to be challenging to ensure the right performance with all the
> contention/etc... Using SSD's should be a lot simpler/easier, but LVM
> performance is making that really difficult, and I still don't understand
> why performance is so horrible. At some point, I'll join the LVM list and
> investigate in more detail, but I've got "good enough" performance so far,
> and have other higher priority issues on my list...

So true. Try the workaround I mentioned above of switching the
scheduler to noop or deadline, and see if you find any improvements.

> Thanks again.
Quite welcome!
