
Re: [Xen-users] add_random must be set to 1 for me - Archlinux HVM x64 - XenServer 7 Latest Patched



On 2016-10-19 14:40, WebDawg wrote:
> I know this is not the XenServer list, and I am sorry if this
> message rubs anyone the wrong way or I am completely off/ignorant.
> I have never had to do any disk tuning in Xen/XenServer. I have run
> both Xen by itself and XenServer. This has come up in my XenServer
> instance, and if someone could test in pure Xen, that would be
> great.
Based on what you're describing, I can confirm from my production boxes that it doesn't happen in regular Xen, but I'm not entirely convinced it's just a XenServer issue (that is, I think it's likely a Linux kernel issue). I'll try to help you narrow things down from that angle, but you might want to push this upstream to the Linux Kernel Mailing List if you can reproduce it on other distros too (or better yet, with a local build of the mainline kernel from Linus' tree using the config from whichever distro you're testing).
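In case you want to go that route, the usual recipe looks something like this (a rough sketch; it assumes your distro kernel has CONFIG_IKCONFIG_PROC enabled, so /proc/config.gz exists):

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
zcat /proc/config.gz > .config   # reuse the running distro's config
make olddefconfig                # fill in new options with defaults
make -j$(nproc)

If you can reproduce the problem on that kernel, the LKML folks will have a much easier time with it.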

> I have two forum posts going right now about this, neither of which
> has gotten a reply:
> * https://bbs.archlinux.org/viewtopic.php?id=218405
> * https://discussions.citrix.com/topic/381981-archlinux-hvm-domu-slow-disk-access-100-cpu-xenserver-7/

> When I dd to a disk in an Archlinux HVM instance that is fully up
> to date with just the standard linux kernel, I get 100% domU CPU
> usage inside it with top (dd is at 100%). I also get 100% CPU usage
> in xentop on dom0.
Just to clarify, is this 100% utilization of one CPU, or 100% utilization of _all_ the CPUs involved? In the first case, that points to an issue with how the kernel is dispatching requests, while the second case points elsewhere (I'm not sure exactly where, though).
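If you're not sure, the per-CPU numbers will tell you (a quick sketch; mpstat comes from the sysstat package):

mpstat -P ALL 1   # one line per CPU, updated every second

or just press '1' inside top to toggle the per-CPU view. One pegged CPU with the rest idle points at the dispatch path; everything pegged points elsewhere.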

> I also get about 2-4 MB/s of I/O.

> I can make this go away by doing this:
>
> echo 1 > /sys/block/xvda/queue/add_random
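As a side note, that sysfs write won't survive a reboot. If it ends up being the fix you keep, a udev rule is one way to make it stick (a sketch; the file name is arbitrary, and xvd* assumes Xen PV block devices):

# /etc/udev/rules.d/60-xvd-add-random.rules
ACTION=="add|change", KERNEL=="xvd*", ATTR{queue/add_random}="1"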

> My Debian domU instances have add_random = 1, which is why I tried
> it: they are working as expected, and I could not find any valid
> information on the internet that could help me otherwise.
>
> No more CPU usage issues, and the same speed as the Debian domUs.

> It looks like Archlinux is on block-multiqueue by default, and I do
> not know if I can go back; I have not looked harder into going back
> or testing that as a fix. The only reason I think this is that some
> of my queue options are not changeable/disabled and I have an mq
> directory under all of my devices. I am getting this information
> from here: https://bugzilla.novell.com/show_bug.cgi?id=911337 so I
> could be wrong.
Given what you've said, you probably are using blk-mq.
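You can confirm it easily enough (a sketch; adjust xvda to your device):

ls /sys/block/xvda/mq                  # an mq directory means blk-mq
cat /sys/block/xvda/queue/scheduler    # typically just 'none' under blk-mq

On the legacy single-queue path you'd see the usual noop/deadline/cfq list in the scheduler file instead.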

> Reading
> https://wiki.archlinux.org/index.php/Improving_performance#Tuning_IO_schedulers
>
> the Archlinux wiki still talks about enabling the block-multiqueue
> layer by using scsi_mod.use_blk_mq=1, but I did not do that, so it
> must just be enabled by default now or something?
You should be able to use that same kernel parameter with a value of 0 to disable it. My guess is that the Arch developers toggled the default in the kernel config. If you're using a PV block interface, though (unless it's PV-SCSI), you may not be able to turn it off, since there doesn't appear to be any switch (or at least I haven't found one) to disable blk-mq for the regular Xen PV block interface.
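If you do want to try turning it off for SCSI devices, something like this should work on a GRUB-based setup (a sketch; the exact file and existing options will vary by distro):

cat /sys/module/scsi_mod/parameters/use_blk_mq   # check the current value
# add scsi_mod.use_blk_mq=0 to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then regenerate the config:
grub-mkconfig -o /boot/grub/grub.cfg             # reboot afterwards

Keep in mind that this only affects SCSI-attached disks (sd*), not the xvd* devices coming from the PV block front-end.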

> If someone could shed some light on why enabling the generation of
> I/O timing/entropy data for /dev/random makes the 'system work',
> that would be great. Like I said, I am just getting into this, and
> I will be doing more tuning if I can.
My first thought, setting the entropy bit aside, is that it's a result of the lack of I/O scheduling in blk-mq (real I/O scheduling, not the priority-queue FIFO stuff it does right now). As for why add_random fixes it, I have no idea. I'm kind of surprised by this, since I've got half a dozen domains running fine with blk-mq, getting within 1% of the disk access speed the host sees (and the host is using blk-mq too, both in the device-mapper layer and the lower block layer).

Some info about the rest of the storage stack might be helpful:
* What type of backing storage are you using for the VM disks (LVM, MD RAID, flat partitions, flat files, etc.)?
* Which Xen disk backend (raw disk, blktap, something else)?
* What are you accessing inside the VM (raw disk, partition, LVM volume, etc.)?
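If you want commands to gather that, something like the following covers most of it (a sketch; 'myvm' is a placeholder domain name, and xe is XenServer-only):

lsblk -o NAME,TYPE,SIZE,MOUNTPOINT   # inside the domU: devices and mounts
xl block-list myvm                   # in dom0 on plain Xen: disk wiring
xe vbd-list                          # in dom0 on XenServer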

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users


