Re: [Xen-devel] Xen on ARM IRQ latency and scheduler overhead
On Fri, 17 Feb 2017, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > These are the results, in nanosec:
> >
> >                             AVG     MIN     MAX     WARM MAX
> >
> > NODEBUG no WFI              1890    1800    3170    2070
> > NODEBUG WFI                 4850    4810    7030    4980
> > NODEBUG no WFI credit2      2217    2090    3420    2650
> > NODEBUG WFI credit2         8080    7890    10320   8300
> >
> > DEBUG no WFI                2252    2080    3320    2650
> > DEBUG WFI                   6500    6140    8520    8130
> > DEBUG WFI, credit2          8050    7870    10680   8450
> >
> > As you can see, depending on whether the guest issues a WFI or not
> > while waiting for interrupts, the results change significantly.
> > Interestingly, credit2 does worse than credit1 in this area.
> >
> I did some measuring myself, on x86, with different tools. cyclictest
> is basically something very very similar to Stefano's app.
>
> I've run it both within Dom0 and inside a guest. I also ran a Xen
> build (in this case, only inside the guest).
>
> > We are down to 2000-3000ns. Then, I started investigating the
> > scheduler. I measured how long it takes to run "vcpu_unblock":
> > 1050ns, which is significant. I don't know what is causing the
> > remaining 1000-2000ns, but I bet on another scheduler function. Do
> > you have any suggestions on which one?
> >
> So, vcpu_unblock() calls vcpu_wake(), which then invokes the
> scheduler's wakeup-related functions.
>
> If you time vcpu_unblock() from beginning to end of the function, you
> actually capture quite a few things. E.g., the scheduler lock is taken
> inside vcpu_wake(), so you're basically including time spent waiting
> on the lock in the estimation.
>
> That is probably ok (as in, lock contention definitely is something
> relevant to latency), but it is expected for things to be rather
> different between Credit1 and Credit2.
>
> I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
> and here's the result. Numbers are in cycles (I've used RDTSC) and,
> to make sure I obtain consistent and comparable numbers, I've set the
> frequency scaling governor to performance.
>
> Dom0, [performance]
>                cyclictest 1us       cyclictest 1ms       cyclictest 100ms
> (cycles)       Credit1   Credit2    Credit1   Credit2    Credit1   Credit2
> wakeup-avg     2429      2035       1980      1633       2535      1979
> wakeup-max     14577     113682     15153     203136     12285     115164

I am not that familiar with the x86 side of things, but the 113682 and
203136 look worrisome, especially considering that credit1 doesn't have
them.

> sched-avg      1716      1860       2527      1651       2286      1670
> sched-max      16059     15000      12297     101760     15831     13122
>
> VM, [performance]
>                cyclictest 1us       cyclictest 1ms       cyclictest 100ms     make -j xen
> (cycles)       Credit1   Credit2    Credit1   Credit2    Credit1   Credit2    Credit1   Credit2
> wakeup-avg     2213      2128       1944      2342       2374      2213       2429      1618
> wakeup-max     9990      10104      11262     9927       10290     10218      14430     15108
> sched-avg      2437      2472       1620      1594       2498      1759       2449      1809
> sched-max      14100     14634      10071     9984       10878     8748       16476     14220
>

These are the corresponding numbers I have, in ns:

                                AVG     MAX     WARM MAX
credit2 sched_op do_schedule    638     2410    2290
credit2 sched_op wake           603     2920    670
credit1 sched_op do_schedule    508     980     980
credit1 sched_op wake           792     2080    930

I would also like to see the nop scheduler as a comparison. It looks
like credit2 has higher max values.

I am attaching the raw numbers because I think they are interesting
(also in ns): credit2 has a higher initial variability.

FYI the scenario is still the same: domU vcpu pinned to a pcpu, dom0
running elsewhere.
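For reference, a minimal sketch of the kind of instrumentation being
discussed: wrapping the scheduler's wakeup hook with NOW()-based
timestamps (ns resolution; an x86 version would read the TSC instead)
and accumulating avg/min/max. This is not the actual patch behind the
numbers above: the lat_stats/lat_record helpers are made up for
illustration, and the exact SCHED_OP() call in vcpu_wake() varies
between Xen versions.

    /* Illustrative only: accumulate latency statistics for one hook. */
    struct lat_stats {
        uint64_t cnt, sum, min, max;
    };

    static struct lat_stats wake_stats = { .min = ~0ULL };

    static inline void lat_record(struct lat_stats *s, uint64_t ns)
    {
        s->cnt++;
        s->sum += ns;
        if ( ns < s->min )
            s->min = ns;
        if ( ns > s->max )
            s->max = ns;
        if ( !(s->cnt % 1000) )   /* dump the stats every 1000 samples */
            printk("wake: avg %"PRIu64" min %"PRIu64" max %"PRIu64" ns\n",
                   s->sum / s->cnt, s->min, s->max);
    }

    /* In vcpu_wake(), around the scheduler-specific wakeup hook: */
    {
        s_time_t t = NOW();                     /* ns-resolution timestamp */
        SCHED_OP(vcpu_scheduler(v), wake, v);
        lat_record(&wake_stats, NOW() - t);
    }

A similar wrapper around the do_schedule hook in schedule() would
produce the sched-avg/sched-max side of the tables.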
Attachment: konsole.credit1.txt
Attachment: konsole.credit2.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel