
Re: [Xen-API] XCP 1.1 - poor disk performance, high load and wait


  • To: xen-api@xxxxxxxxxxxxx
  • From: SpamMePlease PleasePlease <spankthespam@xxxxxxxxx>
  • Date: Fri, 21 Sep 2012 15:18:16 +0100
  • Delivery-date: Fri, 21 Sep 2012 14:18:29 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>

Actually, I've reinstalled the OS without md RAID (despite the fact that I
have this configuration working perfectly fine on another server) and I
still have extremely poor performance when importing VMs:

* the process takes over an hour for a VM that was exported in ~3 minutes
* top looks like this when importing the VM into a fresh, empty (no
running VMs) system:

top - 16:09:40 up  1:49,  3 users,  load average: 1.36, 1.44, 1.38
Tasks: 134 total,   1 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.0%sy,  0.0%ni, 60.7%id, 39.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    771328k total,   763052k used,     8276k free,   269656k buffers
Swap:   524280k total,        0k used,   524280k free,   305472k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6433 root      20   0  217m  25m 8348 S  0.7  3.4   0:35.63 xapi
   92 root      20   0     0    0    0 S  0.3  0.0   0:00.02 bdi-default
11775 root      20   0  2036 1072  584 S  0.3  0.1   0:01.63 xe
15058 root      20   0  2424 1120  832 S  0.3  0.1   0:06.77 top
17127 root      20   0  2424 1108  828 R  0.3  0.1   0:00.13 top

* iostat looks like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.00    0.20   44.85    0.05   54.75

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.40  2.40  2.40  1864.00   575.60   508.25     2.97   649.58 208.33 100.00
sda1              0.00     0.00  2.40  0.40  1864.00    11.20   669.71     0.82   346.43 281.43  78.80
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00
sda3              0.00     0.40  0.00  2.00     0.00   564.40   282.20     2.14  1074.00 500.00 100.00
sdb               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00
dm-0              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00
dm-1              0.00     0.00  0.00  2.20     0.00   759.20   345.09     2.57   976.36 454.55 100.00
tda               0.00     0.00  1.00  8.60     8.00   756.80    79.67    16.00  1546.04 104.17 100.00
xvda              0.00   130.00  1.00  8.60     8.00   756.80    79.67   148.83 10160.42 104.17 100.00
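
(These extended per-device columns -- avgqu-sz, await, svctm, %util -- come
from iostat's extended statistics mode; an invocation along these lines
reproduces them, the 5-second interval being just an example:)

  # report extended per-device statistics every 5 seconds;
  # the first report shows averages since boot
  iostat -x 5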

* smartctl shows both disks to be in perfect health
* hdparm reports decent speeds on the raw sda/sdb devices (~160 MB/s)
* it was pointed out to me that the drives are new 4k-sector ones, so I've
modified the install.img .py files to accommodate that change in a few
places, and will try to reinstall the machine afterwards (a rough
alignment check is sketched below)
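
For that last point, a minimal sanity check of partition alignment on
4k-sector drives (assuming the dom0 tools report partition offsets in
512-byte sectors) would be something like:

  # list the partitions with their start offsets in 512-byte sectors
  fdisk -lu /dev/sda

  # a partition is 4k-aligned when its start sector is divisible by 8,
  # e.g. a start sector of 2048 (2048 / 8 = 256) is aligned

  # if the kernel exposes it, this confirms the drive's physical sector size
  cat /sys/block/sda/queue/physical_block_size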

Any clue?
S.

On Fri, Sep 21, 2012 at 3:17 PM, SpamMePlease PleasePlease
<spankthespam@xxxxxxxxx> wrote:
>
> On Mon, Sep 17, 2012 at 2:44 PM, Denis Cardon
> <denis.cardon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi George,
>>
>>> By default, XCP 1.1 does not support software RAID.
>>>
>>> Under certain conditions you can use it, but you need to know you are
>>> going into deep water. And it's better to know how to swim... I mean,
>>> understand the internals of XCP.
>>>
>>> If you are 'just a user', do not use software RAID.
>>>
>>> If you wish for some help here, say which device (virtual or real) is
>>> the bottleneck.
>>
>> First, thank you for the dedicated time you spend on the XCP mailing list.
>> It is definitely a big advantage for the project to have a dynamic mailing
>> list.
>>
>> I came across the slow md RAID 5 issue with XCP 1.1 a month ago and didn't
>> have much time to look into it, since it is a small non-production dev/test
>> server and I'm using hardware RAID on the production servers.
>>
>> As in the initial poster's mail, on my dev server I/O write access goes to
>> hell and loadavg skyrockets even with very light disk writes. However, the
>> behavior is not consistent with standard I/O saturation, since parallel I/O
>> accesses are not much affected... That is to say, I can launch an "iozone
>> -a -n512m -g512m" on a 256MB VM and at the same time a "find /" still goes
>> through quite smoothly... Using vmstat on dom0, I sometimes saw up to 60k
>> blocks per second (4k blocks, I guess), so throughput seems acceptable at
>> times. Note: I have activated the cache on the SATA disks (I know, a bad
>> idea for production, but OK for me for dev). I have not experienced such
>> behavior with installations using hardware RAID.
>>
>> If I have some time this week, I'll try to convert the setup to an ext3
>> partition (thin provisioning) to be able to run iozone directly on the SR.
>> I know md soft RAID is not a supported configuration, and I agree it should
>> not be, unless the SATA disk cache is deactivated (which gives bad
>> performance). However, in some very small setups or dev setups, it might
>> still be interesting.
>>
>> Cheers, and keep up the good work!
>>
>> Denis
>>
>>>
>>>
>>> On 16.09.2012 20:43, SpamMePlease PleasePlease wrote:
>>> > All,
>>> >
>>> > I've installed XCP 1.1 from the latest available ISO on a Hetzner 4S
>>> > server, using md RAID 1 and LVM for the local storage repository.
>>> > The problem I'm seeing is extremely poor disk performance: importing a
>>> > VM file that was exported in ~3 minutes from another XCP 1.1 host (also
>>> > a Hetzner server, but a bit older one, an EQ6) takes up to 2 hours, and
>>> > in the meantime the dom0 becomes almost unusable; the load goes up to 2,
>>> > it has a constant 50% (or higher) of wa(it), and it is extremely
>>> > sluggish.
>>> >
>>> > Now, I wouldn't mind the sluggishness of dom0, but 2 hours for a VM
>>> > import seems crazy and unacceptable. I've made multiple installations
>>> > of the server to make sure I am not doing anything wrong, but the same
>>> > setup works flawlessly on the older machine. I've tested the drives and
>>> > they seem to be fine, with up to ~170 MB/s throughput on SATA3.
>>> >
>>> > Is there anything else I can check to see if it's a hardware problem,
>>> > or anything that could be configured on dom0 to make it operational
>>> > and usable?
>>> >
>>> > Kind regards,
>>> > S.
>>> >
>>> > _______________________________________________
>>> > Xen-api mailing list
>>> > Xen-api@xxxxxxxxxxxxx
>>> > http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
>>>
>>> _______________________________________________
>>> Xen-api mailing list
>>> Xen-api@xxxxxxxxxxxxx
>>> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
>>>
>>
>> _______________________________________________
>> Xen-api mailing list
>> Xen-api@xxxxxxxxxxxxx
>> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api