
Re: [Xen-users] Xen IO performance issues


  • To: Juergen Gross <jgross@xxxxxxxx>, marki <list+xenusers@xxxxxxx>, xen-users@xxxxxxxxxxxxxxxxxxxx
  • From: Hans van Kranenburg <hans@xxxxxxxxxxx>
  • Date: Fri, 28 Sep 2018 15:35:16 +0200
  • Delivery-date: Fri, 28 Sep 2018 13:36:18 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 09/28/2018 10:46 AM, Juergen Gross wrote:
> On 20/09/2018 11:49, marki wrote:
>> Hello,
>>
>> On 2018-09-19 21:43, Hans van Kranenburg wrote:
>>> On 09/19/2018 09:19 PM, marki wrote:
>>>> On 2018-09-19 20:35, Sarah Newman wrote:
>>>>> On 09/14/2018 04:04 AM, marki wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We're having trouble with a dd "benchmark". Even though that probably
>>>>>> doesn't mean much since multiple concurrent jobs using a benchmark
>>>>>> like FIO for
>>>>>> example work ok, I'd like to understand where the bottleneck is / why
>>>>>> this behaves differently.
>>>>>>
>>>>>> Now in a Xen DomU running kernel 4.4 it looks like the following and
>>>>>> speed is low / not what we're used to:
>>>>>>
>>>>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>>>> dm-0              0.00     0.00    0.00  100.00     0.00    99.00  2027.52     1.45   14.56    0.00   14.56  10.00 100.00
>>>>>> xvdb              0.00     0.00    0.00 2388.00     0.00    99.44    85.28    11.74    4.92    0.00    4.92   0.42  99.20
>>>>>>
>>>>>> # dd if=/dev/zero of=/u01/dd-test-file bs=32k count=250000
>>>>>> 1376059392 bytes (1.4 GB, 1.3 GiB) copied, 7.09965 s, 194 MB/s
>>>
>>> Interesting.
>>>
>>> * Which Xen version are you using?
>>
>> That particular version was XenServer 7.1 LTSR (Citrix). We also tried
>> the newer current release 7.6, makes no difference.
>> Before you start screaming:
>> XS eval licenses do not contain any support so we can't ask them.
>> People in Citrix discussion forums are nice but don't seem to know
>> details necessary to solve this.
>>
>>> * Which Linux kernel version is being used in the dom0?
>>
>> In 7.1 it is "4.4.0+2".
>> In 7.6 that would be "4.4.0+10".
>>
>>> * Is this a PV, HVM or PVH guest?
>>
>> In any case blkfront (and thus blkback) were being used (which seems to
>> transfer data by that ring structure I mentioned and which explains the
>> small block size albeit not necessarily the low queue depth).
>>
>>> * ...more details you can share?
>>
>> Well, not much more except that we are talking about Suse Enterprise
>> Linux 12 up to SP3 in the DomU here. We also tried RHEL 7.5 and the
>> result (slow single-threaded writes) was the same. Reads are not
>> blazingly fast either BTW.
>>
>>>
>>>>>> Note the low queue depth on the LVM device and additionally the low
>>>>>> request size on the virtual disk.
>>>>>>
>>>>>> (As in the ESXi VM there's an LVM layer inside the DomU but it
>>>>>> doesn't matter whether it's there or not.)
>>>>>>
>>>>>>
>>>>>> The above applies to HV + HVPVM modes using kernel 4.4 in the DomU.
>>>
>>> Do you mean PV and PVHVM, instead?
>>>
>>
>> Oops, yes; in any case blkfront (and thus blkback) were being used.
>>
>>>
>>> What happens when you use a recent linux kernel in the guest, like 4.18?
>>
>> I'd have to get back to you on that. However, as long as blkback stays
>> the same I'm not sure what would happen.
>> In any case we'd want to stick with the OSes that the XS people support,
>> I'll have to find out if there are some with more recent kernels than
>> SLES or RHEL.
> 
> I have just done a small test for other purposes that required doing
> reads in a domU using blkfront/blkback. The data was cached in dom0, so the
> only limiting factor was cpu/memory speed and the block ring interface
> of Xen. I was able to transfer 1.8 GB/s on a laptop with a dual core
> i7-4600M CPU @ 2.90GHz.
> 
> So I don't think the ring buffer interface is a real issue here.
> 
> Kernels (in domU and dom0) are 4.19-rc5, Xen is 4.12-unstable.
> 
> Using a standard SLE12-SP2 domU (kernel 4.4.121) with the same dom0
> as in the test before returned the same result.

We also did some testing here, with Xen 4.11 and with Linux 4.17 in dom0
and domU.

Some interesting background about optimizations done in the past (which
the OP may or may not have in their Xen/Linux versions):

1) Indirect descriptors

https://blog.xenproject.org/2013/08/07/indirect-descriptors-for-xen-pv-disks/

In Linux, this is commit 402b27f9f2c22309d5bb285628765bc27b82fcf5; the
option was later renamed to max_indirect_segments in commit
14e710fe7897e37762512d336ab081c57de579a4.
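
As a quick check inside the guest (hedged: path as seen on our 4.17
domU, older kernels may still use the pre-rename parameter name), the
negotiated value is visible in sysfs; 32 segments x 4 KiB pages works
out to a 128 KiB maximum request size:

# cat /sys/module/xen_blkfront/parameters/max_indirect_segments
32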

2) Multi-queue support

https://lwn.net/Articles/633391/
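
The corresponding knob on the blkfront side is max_queues (again as
seen on our 4.17 domU); as far as I know the effective number of rings
per device is additionally capped by the number of guest vcpus:

# cat /sys/module/xen_blkfront/parameters/max_queues
4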

We did a mixed random read / random write test with fio, using null_blk
as the backing device in the dom0 and running fio in libaio / direct
mode directly on the block device in the domU, so that we're only
stressing the path that pushes data between dom0 and domU.

Example command:

fio --filename=/dev/xvdc --direct=1 --rw=randrw --ioengine=libaio \
    --bs=128k --numjobs=8 --iodepth=4 --runtime=20 --group_reporting \
    --name=max_indirect_segments-$(cat /sys/module/xen_blkfront/parameters/max_indirect_segments)
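
For completeness, the dom0 side of such a setup looks roughly like this
(a sketch; the guest name "testvm" and the vdev name are just examples,
not what we actually used):

# modprobe null_blk
# xl block-attach testvm phy:/dev/nullb0,xvdc,w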

# grep . /sys/module/xen_blkfront/parameters/*
/sys/module/xen_blkfront/parameters/max_indirect_segments:32
/sys/module/xen_blkfront/parameters/max_queues:4
/sys/module/xen_blkfront/parameters/max_ring_page_order:0

max_indirect_segments-32: (groupid=0, jobs=8): err= 0: pid=1756: Fri Sep
28 14:55:47 2018
  read : io=96071MB, bw=4803.3MB/s, iops=38426, runt= 20001msec
  write: io=96232MB, bw=4811.4MB/s, iops=38490, runt= 20001msec

Combined that's almost 10 GB/s...

We tried changing the max_indirect_segments xen_blkfront option from
the default of 32 to 64, 128, etc. For every value we repeated the test
above with bs=4k, bs=8k, bs=16k and so on...
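
That sweep looked more or less like this (a sketch, reusing the same
fio options as above and only varying the block size per run):

for bs in 4k 8k 16k 32k 64k 128k; do
    fio --filename=/dev/xvdc --direct=1 --rw=randrw --ioengine=libaio \
        --bs=$bs --numjobs=8 --iodepth=4 --runtime=20 --group_reporting \
        --name=max_indirect_segments-$(cat /sys/module/xen_blkfront/parameters/max_indirect_segments)-bs-$bs
done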

The outcome of the test is that upping the number for
max_indirect_segments does not change anything, and that the limiting
factor for the test is cpu in the domU (4 vcpu here).

That's interesting by itself, since perf top shows that most of the time
is spent doing xen_hypercall_xen_version... (Why??)

(random sample of live output):

Samples: 2M of event 'cpu-clock', 4000 Hz, Event count (approx.):
44091475561
Overhead  Shared Object             Symbol
  54.26%  [kernel]                  [k] xen_hypercall_xen_version
   8.58%  [kernel]                  [k] xen_hypercall_sched_op
   3.17%  [unknown]                 [.] 0x00007f2c75959717
   2.57%  [unknown]                 [.] 0x00007f2c759596ca
   1.21%  [kernel]                  [k] blk_queue_split
   1.01%  [unknown]                 [.] 0x000056097df10836
   0.60%  [kernel]                  [k] kmem_cache_alloc
   0.57%  [kernel]                  [k] do_io_submit
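
For reference: the output above is simply from running perf top inside
the domU during the fio run; the cpu-clock software event shows up
because, presumably, the hardware PMU is not exposed to the guest. For
a closer look with call graphs, something like this should also work:

# perf record -a -g -- sleep 20
# perf report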

Adding more vcpus doesn't help, by the way: with 8 vcpus the domU
becomes a bit unresponsive, and I only get around 3371.4MB/s in/out of
it.

When using a real disk instead of null_blk the numbers are of course a
lot lower, but yes, it seems the communication between blkback and
blkfront is not really the limiting factor here.

Well, a real-life workload is of course different from a test... I'm
thinking about trying out a different value for max_indirect_segments
in some places in production for a few days to see if there's any
difference, e.g. whether it helps to do more parallel IO when there's
much higher latency involved for small random reads.
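
If we do that, the change would probably be applied like this (a
sketch, assuming xen-blkfront is built into the SLES/RHEL guest kernels
and 64 is just an example value): add

xen_blkfront.max_indirect_segments=64

to the guest kernel command line (or, if it's a loadable module, an
"options xen_blkfront max_indirect_segments=64" line in
/etc/modprobe.d/), reboot, and verify with:

# cat /sys/module/xen_blkfront/parameters/max_indirect_segments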

Hans

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users

 

