Re: [Xen-devel] Xen on ARM IRQ latency and scheduler overhead
On Fri, 17 Feb 2017, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > These are the results, in nanosec:
> >
> >                             AVG     MIN     MAX     WARM MAX
> >
> > NODEBUG no WFI              1890    1800    3170    2070
> > NODEBUG WFI                 4850    4810    7030    4980
> > NODEBUG no WFI credit2      2217    2090    3420    2650
> > NODEBUG WFI credit2         8080    7890    10320   8300
> >
> > DEBUG no WFI                2252    2080    3320    2650
> > DEBUG WFI                   6500    6140    8520    8130
> > DEBUG WFI, credit2          8050    7870    10680   8450
> >
> > As you can see, depending on whether the guest issues a WFI or not
> > while waiting for interrupts, the results change significantly.
> > Interestingly, credit2 does worse than credit1 in this area.
> >
> I did some measuring myself, on x86, with different tools. cyclictest
> is basically something very very similar to Stefano's app.
>
> I've run it both within Dom0 and inside a guest. I also ran a Xen
> build (in this case, only inside the guest).
>
> > We are down to 2000-3000ns. Then, I started investigating the
> > scheduler. I measured how long it takes to run "vcpu_unblock":
> > 1050ns, which is significant. I don't know what is causing the
> > remaining 1000-2000ns, but I bet on another scheduler function. Do
> > you have any suggestions on which one?
> >
> So, vcpu_unblock() calls vcpu_wake(), which then invokes the
> scheduler's wakeup-related functions.
>
> If you time vcpu_unblock() from beginning to end of the function, you
> actually capture quite a few things. E.g., the scheduler lock is taken
> inside vcpu_wake(), so you're basically including time spent waiting
> on the lock in the estimation.
>
> That is probably ok (as in, lock contention definitely is something
> relevant to latency), but it is expected for things to be rather
> different between Credit1 and Credit2.
>
> I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
> and here's the result. Numbers are in cycles (I've used RDTSC) and,
> to make sure I obtain consistent and comparable numbers, I've set the
> frequency scaling governor to performance.
>
> Dom0, [performance]
>                cyclictest 1us       cyclictest 1ms       cyclictest 100ms
> (cycles)       Credit1   Credit2    Credit1   Credit2    Credit1   Credit2
> wakeup-avg     2429      2035       1980      1633       2535      1979
> wakeup-max     14577     113682     15153     203136     12285     115164

I am not that familiar with the x86 side of things, but the 113682 and
203136 look worrisome, especially considering that credit1 doesn't have
them.

> sched-avg      1716      1860       2527      1651       2286      1670
> sched-max      16059     15000      12297     101760     15831     13122
>
> VM, [performance]
>                cyclictest 1us       cyclictest 1ms       cyclictest 100ms     make -j xen
> (cycles)       Credit1   Credit2    Credit1   Credit2    Credit1   Credit2    Credit1   Credit2
> wakeup-avg     2213      2128       1944      2342       2374      2213       2429      1618
> wakeup-max     9990      10104      11262     9927       10290     10218      14430     15108
> sched-avg      2437      2472       1620      1594       2498      1759       2449      1809
> sched-max      14100     14634      10071     9984       10878     8748       16476     14220
>

These are the corresponding numbers I have, in ns:

                                AVG     MAX     WARM MAX
credit2 sched_op do_schedule    638     2410    2290
credit2 sched_op wake           603     2920    670
credit1 sched_op do_schedule    508     980     980
credit1 sched_op wake           792     2080    930

I would also like to see the nop scheduler as a comparison. It looks
like credit2 has higher max values.

I am attaching the raw numbers because I think they are interesting
(also in ns): credit2 has a higher initial variability.

FYI the scenario is still the same: domU vcpu pinned to a pcpu, dom0
running elsewhere.
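For reference, a minimal sketch of the kind of instrumentation being
discussed: wrapping the scheduler's wakeup hook with NOW()-based
timestamps (ns resolution; an x86 version would read the TSC instead)
and accumulating avg/min/max. This is not the actual patch behind the
numbers above: the lat_stats/lat_record helpers are made up for
illustration, and the exact SCHED_OP() call in vcpu_wake() varies
between Xen versions.

    /* Illustrative only: accumulate latency statistics for one hook. */
    struct lat_stats {
        uint64_t cnt, sum, min, max;
    };

    static struct lat_stats wake_stats = { .min = ~0ULL };

    static inline void lat_record(struct lat_stats *s, uint64_t ns)
    {
        s->cnt++;
        s->sum += ns;
        if ( ns < s->min )
            s->min = ns;
        if ( ns > s->max )
            s->max = ns;
        if ( !(s->cnt % 1000) )   /* dump the stats every 1000 samples */
            printk("wake: avg %"PRIu64" min %"PRIu64" max %"PRIu64" ns\n",
                   s->sum / s->cnt, s->min, s->max);
    }

    /* In vcpu_wake(), around the scheduler-specific wakeup hook: */
    {
        s_time_t t = NOW();                     /* ns-resolution timestamp */
        SCHED_OP(vcpu_scheduler(v), wake, v);
        lat_record(&wake_stats, NOW() - t);
    }

A similar wrapper around the do_schedule hook in schedule() would
produce the sched-avg/sched-max side of the tables.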
Attachment: konsole.credit1.txt
Attachment: konsole.credit2.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel