
Re: [Xen-users] add_random must be set to 1 for me - Archlinux HVM x64 - XenServer 7 Latest Patched



On 2016-10-19 14:40, WebDawg wrote:
> I know this is not the XenServer list, and I am sorry if this
> message rubs anyone the wrong way or I am completely off/ignorant.
> I have never had to do any disk tuning in Xen/XenServer. I have run
> both Xen by itself and XenServer. This has come up in my XenServer
> instance, and if someone could test in pure Xen, that would be
> great.
Based on what you're describing, I can confirm from my production boxes that it doesn't happen in regular Xen, but I'm not entirely convinced it's just a XenServer issue (that is, I think it's likely a Linux kernel issue). I'll try to help you narrow things down from that angle, but you might want to push this upstream to the Linux Kernel Mailing List if you can reproduce it on other distros too (or better yet, with a local build of the mainline kernel from Linus' tree using the config from whichever distro you're testing).
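In case you want to go that route, the usual recipe looks something like this (a rough sketch; it assumes your distro kernel has CONFIG_IKCONFIG_PROC enabled, so /proc/config.gz exists):

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
zcat /proc/config.gz > .config   # reuse the running distro's config
make olddefconfig                # fill in new options with defaults
make -j$(nproc)

If you can reproduce the problem on that kernel, the LKML folks will have a much easier time with it.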

> I have two forum posts going right now about this, neither of which
> has gotten a reply:
> * https://bbs.archlinux.org/viewtopic.php?id=218405
> * https://discussions.citrix.com/topic/381981-archlinux-hvm-domu-slow-disk-access-100-cpu-xenserver-7/

> When I dd to a disk in an Archlinux HVM instance that is fully up
> to date with just the standard linux kernel, I get 100% domU CPU
> usage inside it with top (dd is at 100%). I also get 100% CPU usage
> in xentop on dom0.
Just to clarify, is this 100% utilization of one CPU, or 100% utilization of _all_ the CPUs involved? In the first case, that points to an issue with how the kernel is dispatching requests, while the second case points elsewhere (I'm not sure exactly where, though).
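If you're not sure, the per-CPU numbers will tell you (a quick sketch; mpstat comes from the sysstat package):

mpstat -P ALL 1   # one line per CPU, updated every second

or just press '1' inside top to toggle the per-CPU view. One pegged CPU with the rest idle points at the dispatch path; everything pegged points elsewhere.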

> I also get about 2-4 MB/s of I/O.

> I can make this go away by doing this:
>
> echo 1 > /sys/block/xvda/queue/add_random
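As a side note, that sysfs write won't survive a reboot. If it ends up being the fix you keep, a udev rule is one way to make it stick (a sketch; the file name is arbitrary, and xvd* assumes Xen PV block devices):

# /etc/udev/rules.d/60-xvd-add-random.rules
ACTION=="add|change", KERNEL=="xvd*", ATTR{queue/add_random}="1"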

> My Debian domU instances have add_random = 1, which is why I tried
> it: they are working as expected, and I could not find any valid
> information on the internet that could help me otherwise.
>
> No more CPU usage issues, and the same speed as the Debian domUs.

> It looks like Archlinux is on block-multiqueue by default, and I do
> not know if I can go back; I have not looked harder into going back
> or testing that as a fix. The only reason I think this is that some
> of my queue options are not changeable/disabled and I have an mq
> directory under all of my devices. I am getting this information
> from here: https://bugzilla.novell.com/show_bug.cgi?id=911337 so I
> could be wrong.
Given what you've said, you probably are using blk-mq.
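You can confirm it easily enough (a sketch; adjust xvda to your device):

ls /sys/block/xvda/mq                  # an mq directory means blk-mq
cat /sys/block/xvda/queue/scheduler    # typically just 'none' under blk-mq

On the legacy single-queue path you'd see the usual noop/deadline/cfq list in the scheduler file instead.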

> Reading
> https://wiki.archlinux.org/index.php/Improving_performance#Tuning_IO_schedulers
>
> the Archlinux wiki still talks about enabling the block-multiqueue
> layer by using scsi_mod.use_blk_mq=1, but I did not do that, so it
> must just be enabled by default now or something?
You should be able to use that same kernel parameter with a value of 0 to disable it. My guess is that the Arch developers toggled the default in the kernel config. If you're using a PV block interface, though (unless it's PV-SCSI), you may not be able to turn it off, since there doesn't appear to be any switch (or at least I haven't found one) to disable blk-mq for the regular Xen PV block interface.
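If you do want to try turning it off for SCSI devices, something like this should work on a GRUB-based setup (a sketch; the exact file and existing options will vary by distro):

cat /sys/module/scsi_mod/parameters/use_blk_mq   # check the current value
# add scsi_mod.use_blk_mq=0 to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then regenerate the config:
grub-mkconfig -o /boot/grub/grub.cfg             # reboot afterwards

Keep in mind that this only affects SCSI-attached disks (sd*), not the xvd* devices coming from the PV block front-end.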

> If someone could shed some light on why enabling the generation of
> I/O timing/entropy data for /dev/random makes the 'system work',
> that would be great. Like I said, I am just getting into this, and
> I will be doing more tuning if I can.
My first thought, setting the entropy bit aside, is that it's a result of the lack of I/O scheduling in blk-mq (real I/O scheduling, not the priority-queue FIFO stuff it does right now). As for why add_random fixes it, I have no idea. I'm kind of surprised by this, since I've got half a dozen domains running fine with blk-mq, getting within 1% of the disk access speed the host sees (and the host is using blk-mq too, both in the device-mapper layer and the lower block layer).

Some info about the rest of the storage stack might be helpful:
* What type of backing storage are you using for the VM disks (LVM, MD RAID, flat partitions, flat files, etc.)?
* Which Xen disk backend (raw disk, blktap, something else)?
* What are you accessing inside the VM (raw disk, partition, LVM volume, etc.)?
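If you want commands to gather that, something like the following covers most of it (a sketch; 'myvm' is a placeholder domain name, and xe is XenServer-only):

lsblk -o NAME,TYPE,SIZE,MOUNTPOINT   # inside the domU: devices and mounts
xl block-list myvm                   # in dom0 on plain Xen: disk wiring
xe vbd-list                          # in dom0 on XenServer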

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users


