
Re: [Xen-devel] Poor SMP performance pv_ops domU



I've tried various kernels today; pv_ops seems to use only 1 core out of 8.

Enabling PV spinlocks makes no difference.

The thing that stands out most is that I cannot get dom0 (xen-3.4.2) to show
more than about 99.7% CPU usage for any pv_ops kernel.

#!/usr/bin/perl

while () {}    # empty condition is always true: spins one CPU forever

Running 8 of these in a 2.6.18.8-xenU guest shows nearly 800% CPU in dom0;
running the same 8 in any pv_ops kernel only gets as high as about 99.7%.

Inside both the pv_ops and the xenU guests, top -s shows all 8 cores being used.
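For anyone trying to reproduce this, the Perl busy loop above can be turned into a self-measuring test. Below is a minimal Python sketch, assuming a Linux guest with Python 3 (it relies on POSIX `os.fork`). It forks N busy-loop workers for a fixed time and reports their combined CPU time as a percentage of wall time, using the same convention as the figures above (100% = one fully busy CPU). `burn` and `measure_cpu_percent` are hypothetical helper names, not part of any Xen tool. The figure comes from the guest kernel's own accounting, so it is not the same measurement as xentop in dom0; comparing the two for the same run may help show where they disagree.

```python
import os
import time


def burn(seconds: float) -> None:
    """Spin on one CPU, like the Perl `while () {}` loop, but stop after `seconds`."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def measure_cpu_percent(workers: int, seconds: float) -> float:
    """Fork `workers` busy-loop children and return their combined CPU time
    as a percentage of elapsed wall time (100% == one fully busy CPU)."""
    before = os.times()
    start = time.monotonic()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: burn CPU, then exit without running the parent's cleanup.
            try:
                burn(seconds)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        # Reaping each child adds its CPU time to the children_* counters.
        os.waitpid(pid, 0)
    wall = time.monotonic() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return 100.0 * cpu / wall
```

For example, `print(f"{measure_cpu_percent(8, 10.0):.1f}%")` run inside the domU, while xentop is watched in dom0, gives both views of the same 8-way load.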


John

On 18 May 2010, at 19:38, Jeremy Fitzhardinge wrote:

> On 05/18/2010 10:34 AM, John Morrison wrote:
>> Hi,
>> 
>> Over the last year we have tried many times to get acceptable performance 
>> from pv_ops kernels.
>> 
>> Tests were done with 1, 2, 4 and 8 cores. The more cores, the lower the score.
>> 
>> Inside the domU all cores are visible, and top -s shows all cores in use.
>> xentop in dom0 never shows over 99% CPU.
>> 
>> The 2.6.18.8-xenU kernel shows over 700% CPU, and its scores are about 8x the
>> pv_ops scores.
>> 
>> Any ideas?
>> 
> 
> Well, I guess some kind of bad serialization is going on in there, and
> it should be fairly obvious with a bit of examination.
> 
> Have you tried building your own pvops domu kernels?  Does enabling PV
> spinlocks make any difference?  Also enabling some of the lock
> debugging/profiling/contention monitoring stuff may give useful results.
> 
> Can you post the corresponding 2.6.18 results?  Are there specific
> sub-tests which show the effect more strongly than the others?
> 
> How does the 2.6.32 kernel fare when booted native?
> 
> Thanks,
>    J
> 
>> 
>> John
>> 
>> 
>> 1 core
>> 
>> BYTE UNIX Benchmarks (Version 4.1-wht.2)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
>> 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066476 132875660   1% /
>> 
>> Start Benchmark Run: Tue May 18 13:54:54 BST 2010
>> 13:54:54 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:06:12 BST 2010
>> 14:06:12 up 11 min,  2 users,  load average: 11.48, 5.20, 2.43
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7  8950813.0      237.6
>> Double-Precision Whetstone                      83.1     2103.7      253.2
>> Execl Throughput                               188.3     1568.4       83.3
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    64198.0      240.3
>> File Copy 256 bufsize 500 maxblocks           1077.0    17781.0      165.1
>> File Read 4096 bufsize 8000 maxblocks        15382.0   643717.0      418.5
>> Pipe-based Context Switching                 15448.6    85379.4       55.3
>> Pipe Throughput                             111814.6   478490.1       42.8
>> Process Creation                               569.3     3329.6       58.5
>> Shell Scripts (8 concurrent)                    44.8      380.7       85.0
>> System Call Overhead                        114433.5   498712.3       43.6
>>                                                                 =========
>>     FINAL SCORE                                                     114.1
>> 
>> 2 cores
>> 
>> ==============================================================
>> BYTE UNIX Benchmarks (Version 4.1-wht.2)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
>> 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066548 132875588   1% /
>> 
>> Start Benchmark Run: Tue May 18 14:07:27 BST 2010
>> 14:07:27 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:18:04 BST 2010
>> 14:18:04 up 10 min,  1 user,  load average: 12.78, 5.53, 2.49
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7 10124838.6      268.7
>> Double-Precision Whetstone                      83.1     1188.7      143.0
>> Execl Throughput                               188.3     1596.2       84.8
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    58323.0      218.3
>> File Copy 256 bufsize 500 maxblocks           1077.0    17776.0      165.1
>> File Read 4096 bufsize 8000 maxblocks        15382.0   568217.0      369.4
>> Pipe-based Context Switching                 15448.6    86111.3       55.7
>> Pipe Throughput                             111814.6   469957.8       42.0
>> Process Creation                               569.3     3298.1       57.9
>> Shell Scripts (8 concurrent)                    44.8      378.9       84.6
>> System Call Overhead                        114433.5   532828.4       46.6
>>                                                                 =========
>>     FINAL SCORE                                                     107.9
>> 
>> 4 cores
>> 
>> ==============================================================
>> BYTE UNIX Benchmarks (Version 4.1-wht.2)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
>> 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066628 132875508   1% /
>> 
>> Start Benchmark Run: Tue May 18 14:19:17 BST 2010
>> 14:19:17 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:29:53 BST 2010
>> 14:29:53 up 10 min,  1 user,  load average: 13.59, 6.35, 2.97
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7 10185429.8      270.3
>> Double-Precision Whetstone                      83.1      759.8       91.4
>> Execl Throughput                               188.3     1386.2       73.6
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    62331.0      233.3
>> File Copy 256 bufsize 500 maxblocks           1077.0    16492.0      153.1
>> File Read 4096 bufsize 8000 maxblocks        15382.0   563402.0      366.3
>> Pipe-based Context Switching                 15448.6    87176.0       56.4
>> Pipe Throughput                             111814.6   481068.1       43.0
>> Process Creation                               569.3     3128.9       55.0
>> Shell Scripts (8 concurrent)                    44.8      394.9       88.1
>> System Call Overhead                        114433.5   539996.1       47.2
>>                                                                 =========
>>     FINAL SCORE                                                     102.6
>> 
>> 8 cores
>> 
>> ==============================================================
>> BYTE UNIX Benchmarks (Version 4.1-wht.2, 8 threads)
>> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
>> 2010 x86_64 GNU/Linux
>> /dev/xvda1           141110136   1066680 132875456   1% /
>> 
>> Start Benchmark Run: Tue May 18 14:30:59 BST 2010
>> 14:30:59 up 0 min,  1 user,  load average: 0.07, 0.02, 0.00
>> 
>> End Benchmark Run: Tue May 18 14:42:52 BST 2010
>> 14:42:52 up 12 min,  1 user,  load average: 25.56, 10.84, 4.96
>> 
>> 
>>                     INDEX VALUES
>> TEST                                        BASELINE     RESULT      INDEX
>> 
>> Dhrystone 2 using register variables        376783.7  9972130.3      264.7
>> Double-Precision Whetstone                      83.1      755.2       90.9
>> Execl Throughput                               188.3     1584.7       84.2
>> File Copy 1024 bufsize 2000 maxblocks         2672.0    58981.0      220.7
>> File Copy 256 bufsize 500 maxblocks           1077.0    16904.0      157.0
>> File Read 4096 bufsize 8000 maxblocks        15382.0   557735.0      362.6
>> Pipe-based Context Switching                 15448.6    80738.2       52.3
>> Pipe Throughput                             111814.6   450891.2       40.3
>> Process Creation                               569.3     2948.5       51.8
>> Shell Scripts (8 concurrent)                    44.8      378.1       84.4
>> System Call Overhead                        114433.5   537443.2       47.0
>>                                                                 =========
>>     FINAL SCORE                                                     100.9
>> 
>> 
>> 
>> --
>> Professional hosting without compromise
>> www.clustered.net
>> 
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>> 
>> 
> 
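The INDEX column appears to be 10 × RESULT / BASELINE, and the FINAL SCORE the geometric mean of the eleven indices. That assumption is inferred from the numbers, not taken from the benchmark's source, but it reproduces Dhrystone's 237.6 and the 1-core score of 114.1. A short Python sketch for recomputing a run from its BASELINE and RESULT columns, so runs can be compared test by test; `index`, `final_score` and `ONE_CORE` are hypothetical names.

```python
import math

# (BASELINE, RESULT) pairs copied from the 1-core run above.
ONE_CORE = {
    "Dhrystone 2 using register variables": (376783.7, 8950813.0),
    "Double-Precision Whetstone": (83.1, 2103.7),
    "Execl Throughput": (188.3, 1568.4),
    "File Copy 1024 bufsize 2000 maxblocks": (2672.0, 64198.0),
    "File Copy 256 bufsize 500 maxblocks": (1077.0, 17781.0),
    "File Read 4096 bufsize 8000 maxblocks": (15382.0, 643717.0),
    "Pipe-based Context Switching": (15448.6, 85379.4),
    "Pipe Throughput": (111814.6, 478490.1),
    "Process Creation": (569.3, 3329.6),
    "Shell Scripts (8 concurrent)": (44.8, 380.7),
    "System Call Overhead": (114433.5, 498712.3),
}


def index(baseline: float, result: float) -> float:
    """Per-test index: result relative to baseline, scaled so baseline == 10."""
    return 10.0 * result / baseline


def final_score(tests: dict[str, tuple[float, float]]) -> float:
    """Geometric mean of the per-test indices."""
    logs = [math.log(index(baseline, result)) for baseline, result in tests.values()]
    return math.exp(sum(logs) / len(logs))
```

Read test by test, the tables above show where the drop comes from: Double-Precision Whetstone falls from 253.2 with 1 core to 90.9 with 8, while Dhrystone stays between 237.6 and 270.3.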




 

