
Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough



On Thu, Nov 18, 2010 at 4:20 PM, Dan Magenheimer
<dan.magenheimer@xxxxxxxxxx> wrote:
>> We did suspect this, since our old setting was HZ=1000 and we assigned
>> more than 10 VCPUs to domU. But we didn't see any performance difference
>> with HZ=100.
>
> FWIW, it didn't appear that the problems were proportional to HZ. It
> seemed more that the pvclock somehow became incorrect and callers spent
> a lot of time rereading the pvclock value.
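
For anyone following the thread without the code handy: the guest's
pvclock_clocksource_read() spins until it sees a consistent snapshot of
the per-vcpu time info that the hypervisor shares with the guest, so a
constantly-changing (or stale, odd) version field makes every caller
loop. A rough sketch of that retry loop (simplified from
arch/x86/kernel/pvclock.c of this era, not the exact source; the
barriers and the scaling helper are approximations):

    /* Simplified sketch of the guest-side pvclock read loop. */
    static u64 pvclock_read_sketch(struct pvclock_vcpu_time_info *src)
    {
            u32 version;
            u64 delta, ns;

            do {
                    version = src->version; /* odd => host update in progress */
                    rmb();                  /* read version before payload */
                    delta = native_read_tsc() - src->tsc_timestamp;
                    ns = src->system_time +
                         pvclock_scale_delta(delta, src->tsc_to_system_mul,
                                             src->tsc_shift);
                    rmb();                  /* read payload before recheck */
            } while ((src->version & 1) || version != src->version);

            return ns;      /* nanoseconds since guest boot */
    }

If the version check keeps failing, the loop rereads indefinitely, which
would look exactly like "a lot of time rereading the pvclock value."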

We decided to enable lock stat in the kernel to track down all the lock
activity in the profile report. The first thing I noticed was that
kmemleak was at the top of the list (/proc/lock_stat), so we disabled
kmemleak. This boosted our I/O performance from 31k to 119k IOPS.
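
For reference (from memory, so treat the exact knobs as approximate):
with CONFIG_LOCK_STAT enabled, the statistics can be cleared and read
back with

    echo 0 > /proc/lock_stat
    cat /proc/lock_stat

and kmemleak can be disabled without a rebuild by booting with
kmemleak=off.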

One of our developers (Bruce Edge) suggested killing ntpd, so I did.
This resulted in another significant bump in I/O performance, to 209k
IOPS. The question now is: why ntpd? Is it the source of all or most of
the pvclock_clocksource_read samples in the profile report?
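
My working theory (unverified) is that ntpd's steady stream of time
syscalls (adjtimex, clock_gettime, gettimeofday) all bottom out in the
pvclock clocksource in a PV guest, so each of them pays for the read
loop sketched above. Assuming strace is available in the domU, one
cheap way to check would be to attach to ntpd for a few seconds and
look at the syscall counts:

    strace -c -p $(pidof ntpd)

If the time-related calls dominate, that would point at ntpd as the
source of those samples.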

>
>> -----Original Message-----
>> From: Lin, Ray [mailto:Ray.Lin@xxxxxxx]
>> Sent: Thursday, November 18, 2010 2:40 PM
>> To: Dan Magenheimer; Dante Cinco; Konrad Wilk
>> Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@xxxxxxxxxx;
>> Andrew Thomas; keir.fraser@xxxxxxxxxxxxx; Chris Mason
>> Subject: RE: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
>> pvops domU kernel with PCI passthrough
>>
>>
>>
>> -----Original Message-----
>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
>> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Dan Magenheimer
>> Sent: Thursday, November 18, 2010 1:21 PM
>> To: Dante Cinco; Konrad Wilk
>> Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@xxxxxxxxxx;
>> Andrew Thomas; keir.fraser@xxxxxxxxxxxxx; Chris Mason
>> Subject: RE: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
>> pvops domU kernel with PCI passthrough
>>
>> In case it is related:
>> http://lists.xensource.com/archives/html/xen-devel/2010-
>> 07/msg01247.html
>>
>> Although I never went further with this investigation, it appeared to me
>> that pvclock_clocksource_read was getting called at least an order of
>> magnitude more frequently than expected in some circumstances for some
>> kernels.  And IIRC it scaled with the number of vcpus.
>>
>> We did suspect this, since our old setting was HZ=1000 and we assigned
>> more than 10 VCPUs to domU. But we didn't see any performance difference
>> with HZ=100.
>>
>> > -----Original Message-----
>> > From: Dante Cinco [mailto:dantecinco@xxxxxxxxx]
>> > Sent: Thursday, November 18, 2010 12:36 PM
>> > To: Konrad Rzeszutek Wilk
>> > Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@xxxxxxxxxx;
>> > Andrew Thomas; keir.fraser@xxxxxxxxxxxxx; Chris Mason
>> > Subject: Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
>> > pvops domU kernel with PCI passthrough
>> >
>> > I mentioned in a previous post to this thread that I'm able to apply
>> > Dulloor's xenoprofile patch to the dom0 kernel but not the domU
>> > kernel. So I can't do active-domain profiling, only passive-domain
>> > profiling, and I don't know how reliable the results are, since they
>> > show pvclock_clocksource_read as the top consumer of CPU cycles at
>> > 28%.
>> >
>> > CPU: Intel Architectural Perfmon, speed 2665.98 MHz (estimated)
>> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
>> > unit mask of 0x00 (No unit mask) count 100000
>> > samples   %        image name  app name  symbol name
>> > 918089   27.9310   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  pvclock_clocksource_read
>> > 217811    6.6265   domain1-modules  domain1-modules  /domain1-modules
>> > 188327    5.7295   vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  mutex_spin_on_owner
>> > 186684    5.6795   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  __xen_spin_lock
>> > 149514    4.5487   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  __write_lock_failed
>> > 123278    3.7505   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  __kernel_text_address
>> > 122906    3.7392   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  xen_spin_unlock
>> > 90903     2.7655   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  __spin_time_accum
>> > 85880     2.6127   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  __module_address
>> > 75223     2.2885   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  print_context_stack
>> > 66778     2.0316   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  __module_text_address
>> > 57389     1.7459   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  is_module_text_address
>> > 47282     1.4385   xen-syms-4.1-unstable  domain1-xen  syscall_enter
>> > 47219     1.4365   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  prio_tree_insert
>> > 46495     1.4145   vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  pvclock_clocksource_read
>> > 44501     1.3539   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  prio_tree_left
>> > 32482     0.9882   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel  native_read_tsc
>> >
>> > I ran oprofile (0.9.5 with xenoprofile patch) for 20 seconds while
>> > the I/Os were running. Here's the command I used:
>> >
>> > opcontrol --start --xen=/boot/xen-syms-4.1-unstable \
>> >   --vmlinux=/boot/vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug \
>> >   --passive-domains=1 \
>> >   --passive-images=/boot/vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug
>> >
>> > I had to remove dom0_max_vcpus=1 (but kept dom0_vcpus_pin=true) from
>> > the Xen command line. Otherwise, oprofile only gives samples from
>> > CPU0.
>> >
>> > I'm going to try perf next.
>> >
>> > - Dante
>> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel