Re: [Xen-devel] 4.2.1: Poor write performance for DomU.
On 06/09/13 23:33, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
>> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>>>> So, based on my tests yesterday, I decided to break the RAID6 and
>>>> pull a drive out of it to test directly on the 2Tb drives in
>>>> question.
>>>>
>>>> The array in question:
>>>> # cat /proc/mdstat
>>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>>> 3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
>>>>
>>>> # mdadm /dev/md2 --fail /dev/sdf
>>>> mdadm: set /dev/sdf faulty in /dev/md2
>>>> # mdadm /dev/md2 --remove /dev/sdf
>>>> mdadm: hot removed /dev/sdf from /dev/md2
>>>>
>>>> So, all tests are to be done on /dev/sdf.
>>>> Model Family: Seagate SV35
>>>> Device Model: ST2000VX000-9YW164
>>>> Serial Number: Z1E17C3X
>>>> LU WWN Device Id: 5 000c50 04e1bc6f0
>>>> Firmware Version: CV13
>>>> User Capacity: 2,000,398,934,016 bytes [2.00 TB]
>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>>>>
>>>> From the Dom0:
>>>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>>>
>>>> Create a single partition on the drive, and format it with ext4:
>>>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>>>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>> Disk identifier: 0x98d8baaf
>>>>
>>>> Device Boot Start End Blocks Id System
>>>> /dev/sdf1 2048 3907029167 1953513560 83 Linux
>>>>
>>>> Command (m for help): w
>>>>
>>>> # mkfs.ext4 -j /dev/sdf1
>>>> ......
>>>> Writing inode tables: done
>>>> Creating journal (32768 blocks): done
>>>> Writing superblocks and filesystem accounting information: done
>>>>
>>>> Mount it on the Dom0:
>>>> # mount /dev/sdf1 /mnt/esata/
>>>> # cd /mnt/esata/
>>>> # bonnie++ -d . -u 0:0
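>>>> (-d points bonnie++ at the directory to test in; -u 0:0 runs it as
>>>> uid:gid 0:0, since bonnie++ refuses to run as root unless told so
>>>> explicitly)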
>>>> ....
>>>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>>>> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114  17 296.4   6
>>>> Latency             70971us     190ms     221ms   40369us   17657us     164ms
>>>>
>>>> So from the Dom0: 133MB/sec write, 209MB/sec read.
>>>>
>>>> Now, I'll attach the full disk to a DomU:
>>>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
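>>>> (phy: hands the raw block device through to the guest, xvdc is the
>>>> device name the DomU will see, and w makes it writable; xm block-detach
>>>> zeus.vm xvdc undoes this after the test)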
>>>>
>>>> And we'll test from the DomU.
>>>>
>>>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>>>
>>>> Partition the same as in the Dom0 and create an ext4 filesystem on it:
>>>>
>>>> I notice something interesting here. In the Dom0, the device is seen as:
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>>
>>>> In the DomU, it is seen as:
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 512 bytes
>>>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>>>
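>>>> For what it's worth, the topology each kernel advertises can also be
>>>> read straight off the block device (assuming util-linux's blockdev is
>>>> installed; one value per line - logical sector, physical sector,
>>>> minimum I/O, optimal I/O):
>>>>
>>>> # blockdev --getss --getpbsz --getiomin --getioopt /dev/xvdc
>>>>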
>>>> Not sure if this could be related - but continuing testing:
>>>> Device Boot Start End Blocks Id System
>>>> /dev/xvdc1 2048 3907029167 1953513560 83 Linux
>>>>
>>>> # mkfs.ext4 -j /dev/xvdc1
>>>> ....
>>>> Allocating group tables: done
>>>> Writing inode tables: done
>>>> Creating journal (32768 blocks): done
>>>> Writing superblocks and filesystem accounting information: done
>>>>
>>>> # mount /dev/xvdc1 /mnt/esata/
>>>> # cd /mnt/esata/
>>>> # bonnie++ -d . -u 0:0
>>>> ....
>>>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>>>> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407  23 313.4   9
>>>> Latency             34615us     130ms     128ms   33316us   74401us     130ms
>>>>
>>>> So still... 116MB/sec write, 176MB/sec read to the physical device
>>>> from the DomU. More than acceptable.
>>>>
>>>> It leaves me to wonder... Could the Dom0 seeing the drive as 4096-byte
>>>> sectors, while the DomU sees it as 512-byte sectors, be causing an
>>>> issue?
>>>
>>> There is a certain overhead in it. I still have this in my mailbox,
>>> so I am not sure whether this issue ever got resolved? I know that the
>>> indirect descriptor patches in xen-blkback and xen-blkfront are meant
>>> to resolve some of these issues - by being able to carry a bigger payload.
>>>
>>> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
>>
>> Ok, so I finally got around to building kernel 3.11 RPMs today for
>> testing. I upgraded both the Dom0 and DomU to the same kernel:
>
> Woohoo!
>>
>> DomU:
>> # dmesg | grep blkfront
>> blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
>> blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
>>
>> Looks good.
>>
>> Transfer tests using bonnie++ as per before:
>> # bonnie -d . -u 0:0
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757  30 492.3  13
>> Latency             27305us     124ms     158ms   34222us   16865us     374ms
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16 10048  22 +++++ +++ 17849  29 11109  25 +++++ +++ 18389  31
>> Latency             17775us     154us     180us   16008us      38us      58us
>>
>> There still seems to be a massive discrepancy between Dom0 and DomU
>> write speeds. Interestingly, sequential block reads are nearly
>> 300MB/sec, yet sequential writes were only ~58MB/sec.
>
> OK, so the other thing that people were pointing out is that you
> can use the xen-blkfront.max parameter. By default it is 32, but try 8.
> Or 64. Or 256.
Ahh - interesting.
I used the following:
Kernel command line: ro root=/dev/xvda rd_NO_LUKS rd_NO_DM
LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
crashkernel=auto console=hvc0 xen-blkfront.max=X
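Assuming the parameter is built readable (it looks to be declared
S_IRUGO in 3.11's xen-blkfront), the value actually in effect on each
boot can be double-checked from the DomU with:

# cat /sys/module/xen_blkfront/parameters/max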
8:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   696  92 50906   7 46102  11  1013  97 256784  27 496.5  10
Latency             24374us     199ms     117ms   30855us   38008us   85175us
16:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   675  92 58078   8 57585  13  1005  97 262735  25 505.6  10
Latency             24412us     187ms     183ms   23661us   53850us     232ms
32:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   698  92 57416   8 63328  13  1063  97 267154  24 498.2  12
Latency             24264us     199ms   81362us   33144us   22526us     237ms
64:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   574  86 88447  13 68988  17   897  97 265128  27 493.7  13
128:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   702  97 107638  14 70158  15  1045  97 255596  24 491.0  12
Latency             27279us   17553us     134ms   29771us   38392us   65761us
256:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   689  91 102554  14 67337  15  1012  97 262475  24 484.4  12
Latency             20642us     104ms     189ms   36624us   45286us   80023us
So, as a nice summary:
  8:  50 MB/sec
 16:  58 MB/sec
 32:  57 MB/sec
 64:  88 MB/sec
128: 107 MB/sec
256: 102 MB/sec
So, maybe it's coincidence, maybe it isn't - but the best result
(allowing for margin of error) comes from max=128 - which happens to
match the 128k chunk size of the underlying RAID6 array on the Dom0.
# cat /proc/mdstat
md2 : active raid6 sdd[5] sdc[4] sdf[1] sde[0]
3906766592 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
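If that's more than coincidence, my back-of-envelope arithmetic
(assuming 4k pages, so 4k per indirect segment) would be: max=128
segments x 4k = 512k per request, an exact multiple of the array's 256k
data stripe (2 data disks x 128k chunk on a 4-disk RAID6) - so large
sequential writes can land as full stripes instead of read-modify-write
cycles.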
> The indirect descriptor allows us to put more I/Os on the ring - and
> I am hoping that will:
> a) solve your problem
Well, it looks like this solves the issue - at least increasing the max
almost doubles the write speed, with no change to read speeds (within
the margin of error).
> b) not solve your problem, but demonstrate that the issue is not with
> the ring, but with something else making your writes slower.
>
> Hmm, are you by any chance using O_DIRECT when running bonnie++ in
> dom0? The xen-blkback tacks O_DIRECT on to all write requests. This is
> done to bypass the dom0 page cache - otherwise you end up with a double
> buffer where writes look insanely fast - but with absolutely no safety.
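> (A quick way to see that double-buffering effect from dom0 - just a
> sketch, with /dev/sdX standing in for a scratch disk you can afford to
> overwrite - is to compare a buffered write against a direct one:
>
> # dd if=/dev/zero of=/dev/sdX bs=1M count=1024              # via page cache - looks insanely fast
> # dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct # real device speed)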
>
> If you want to try disabling that (so no O_DIRECT), I would do this
> little change:
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index bf4b9d2..823b629 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> break;
> case BLKIF_OP_WRITE:
> blkif->st_wr_req++;
> - operation = WRITE_ODIRECT;
> + operation = WRITE;
> break;
> case BLKIF_OP_WRITE_BARRIER:
> drain = true;
With the above results, is this still useful?
--
Steven Haigh
Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299