
Re: [Xen-devel] [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

To: Juergen Gross <jgross@xxxxxxxx>
From: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Sun, 17 May 2015 07:30:36 +0200
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, x86@xxxxxxxxxx, akataria@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, rusty@xxxxxxxxxxxxxxx, virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx, chrisw@xxxxxxxxxxxx, mingo@xxxxxxxxxx, david.vrabel@xxxxxxxxxx, hpa@xxxxxxxxx, pbonzini@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, boris.ostrovsky@xxxxxxxxxx
Delivery-date: Sun, 17 May 2015 05:30:52 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

* Juergen Gross <jgross@xxxxxxxx> wrote:

> On 05/05/2015 07:21 PM, Jeremy Fitzhardinge wrote:
> >On 05/03/2015 10:55 PM, Juergen Gross wrote:
> >>I did a small measurement of the pure locking functions on bare metal
> >>without and with my patches.
> >>
> >>spin_lock() for the first time (lock and code not in cache) dropped from
> >>about 600 to 500 cycles.
> >>
> >>spin_unlock() for the first time dropped from 145 to 87 cycles.
> >>
> >>spin_lock() in a loop dropped from 48 to 45 cycles.
> >>
> >>spin_unlock() in the same loop dropped from 24 to 22 cycles.
> >
> >Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
> >main difference will be whether the branch predictor is warmed up rather
> >than whether the lock itself is in dcache, but it's much more likely that
> >the lock code is in icache if the code is lock intensive, making the cold
> >case moot. But that's pure speculation.
> >
> >Could you see any differences in workloads beyond microbenchmarks?
> >
> >Not that it's my call at all, but I think we'd need to see some concrete
> >improvements in real workloads before adding the complexity of more pvops.
> 
> I did another test on a larger machine:
> 
> 25 kernel builds (time make -j 32) on a 32-core machine. Before each
> build "make clean" was called, and the first result after boot was
> omitted to avoid disk cache warmup effects.
> 
> System time without my patches: 861.5664 +/- 3.3665 s
>                with my patches: 852.2269 +/- 3.6629 s

So what does the profile look like in the guest, before/after the PV
spinlock patches? I'm a bit surprised to see so much spinlock
overhead.
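
For scale, the system-time figures quoted above differ by about 9.3 s, or roughly 1.1%. The following is only a minimal sketch of that arithmetic, not part of the original measurement. The thread does not say what the "+/-" values represent; the code assumes they are independent spreads of the same kind and combines them in quadrature, and the names `compare`, `without_patches` and `with_patches` are hypothetical.

```python
import math

# Figures quoted in the thread: system time in seconds over the kernel builds.
without_patches = (861.5664, 3.3665)  # (mean, "+/-" value)
with_patches = (852.2269, 3.6629)


def compare(before, after):
    """Return (absolute saving, relative saving, saving / combined spread)."""
    diff = before[0] - after[0]
    rel = diff / before[0]
    # Assumption: the two "+/-" values are independent spreads of the same
    # kind, so they combine in quadrature. The thread does not state this.
    combined = math.hypot(before[1], after[1])
    return diff, rel, diff / combined


diff, rel, ratio = compare(without_patches, with_patches)
print(f"saving: {diff:.4f} s ({rel:.2%}), {ratio:.2f}x the combined spread")
```

Under that assumption the saving is a bit less than twice the combined spread, which is why a before/after profile is a useful complement to the timing numbers.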

Thanks,

        Ingo

