[Xen-devel] [PATCH 00/19] xen: sched: assorted fixes and improvements to Credit2
Hi everyone,

Here is a collection of pseudo-random fixes and improvements to Credit2.

In the process of working on Soft Affinity and Caps support, I stumbled upon them, one after the other, and decided to take care of them. It's been hard to test and run benchmarks, due to the "time goes backwards" bug I uncovered [1], and this is at least part of the reason why the code for affinity and caps is still missing. I've got it already, but I need to refine a couple of things, after double checking benchmark results.

So, now that we have Jan's series [2] (thanks! [*]), and that I managed to indeed run some tests on this preliminary set of patches, I decided I'd better set this first group free, while working on finishing the rest.

The various patches do a wide range of different things, so, please, refer to the individual changelogs for more detailed explanations.

About the numbers I could collect so far, here's the situation. I've run rather simple benchmarks, such as:
 - Xen build inside a VM. The metric is how long that takes (in seconds), so lower is better.
 - Iperf from a VM to its host. The metric is total aggregate throughput, so higher is better.

The host is a 16 pCPUs / 2 NUMA nodes Xeon E5620, with 6GB RAM per node. The VM had 16 vCPUs and 4GB of memory. Dom0 had 16 vCPUs as well, and 1GB of RAM.

The Xen build, I did it one time with -j4 (representative of low VM load) and another time with -j24 (representative of high VM load). For the Iperf test, I've only used 8 parallel streams (I wanted to do 4 and 8, but there was a bug in my scripts! :-/).

I've run the above both with and without disturbing external (from the point of view of the VM) load. Such load was just generated by means of running processes in dom0. It's rather basic, but it certainly keeps dom0's vCPUs busy and stresses the scheduler. This "noise", when present, was composed of:
 - 8 (v)CPU hog processes (`yes &> /dev/null'), running in dom0;
 - 4 processes alternating computation and sleep with a duty cycle of 35% (a minimal sketch of such a process is included below, just before the numbers).

So, there basically were 12 vCPUs of dom0 kept busy, in a heterogeneous fashion.

I benchmarked Credit2 with runqueues arranged per-core (the current default) and per-socket, and also Credit1, for reference. The baseline was current staging plus Jan's monotonicity series.
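For completeness, here is roughly how the two Credit2 configurations are selected at boot. This is only an illustrative sketch of mine: it assumes a `credit2_runqueue=' Xen boot option for picking the runqueue arrangement (check the docs/misc/xen-command-line.markdown changes in the series for the actual option name and accepted values):

  # Xen boot command line (illustrative)
  sched=credit2 credit2_runqueue=core     <-- what I call "runq=core"
  sched=credit2 credit2_runqueue=socket   <-- what I call "runq=socket"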
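As for the duty-cycle part of the "noise", the dom0 processes were conceptually like the little program sketched below. This is just a minimal illustration (not the actual script I used), and the 100ms period is an assumption of mine; what matters is only the 35% duty cycle:

  /*
   * Minimal sketch of a "noise" process with a ~35% duty cycle:
   * spin for 35ms, then sleep for 65ms, over a (hypothetical)
   * 100ms period.
   */
  #define _POSIX_C_SOURCE 199309L
  #include <time.h>

  #define PERIOD_NS  100000000L  /* 100ms period (assumed) */
  #define BUSY_NS     35000000L  /* 35% of it spent computing */

  static long elapsed_ns(const struct timespec *a, const struct timespec *b)
  {
      return (b->tv_sec - a->tv_sec) * 1000000000L +
             (b->tv_nsec - a->tv_nsec);
  }

  int main(void)
  {
      for ( ; ; )
      {
          struct timespec start, now;
          struct timespec idle = { 0, PERIOD_NS - BUSY_NS };

          /* Busy phase: just burn CPU for BUSY_NS. */
          clock_gettime(CLOCK_MONOTONIC, &start);
          do
              clock_gettime(CLOCK_MONOTONIC, &now);
          while ( elapsed_ns(&start, &now) < BUSY_NS );

          /* Idle phase: sleep for the rest of the period. */
          nanosleep(&idle, NULL);
      }

      return 0;
  }

Eight `yes' instances plus four of these account for the 12 busy dom0 vCPUs mentioned above.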
Actual numbers:

|=======================================================================|
|                       CREDIT 1 (for reference)                        |
|=======================================================================|
| Xen build, low VM load, no noise    |
|-------------------------------------|
|  32.207                             |
|-------------------------------------|---------------------------------|
| Xen build, high VM load, no noise   | Iperf, high VM load, no noise   |
|-------------------------------------|---------------------------------|
|  18.500                             |  22.633                         |
|-------------------------------------|---------------------------------|
| Xen build, low VM load, with noise  |
|-------------------------------------|
|  38.700                             |
|-------------------------------------|---------------------------------|
| Xen build, high VM load, with noise | Iperf, high VM load, with noise |
|-------------------------------------|---------------------------------|
|  80.317                             |  21.300                         |
|=======================================================================|
|                                CREDIT 2                               |
|=======================================================================|
| Xen build, low VM load, no noise    |
|-------------------------------------|
|            runq=core  runq=socket   |
| baseline     34.543       38.070    |
| patched      35.200       33.433    |
|-------------------------------------|---------------------------------|
| Xen build, high VM load, no noise   | Iperf, high VM load, no noise   |
|-------------------------------------|---------------------------------|
|            runq=core  runq=socket   |           runq=core runq=socket |
| baseline     18.710       19.397    | baseline    21.300     21.933   |
| patched      18.013       18.530    | patched     23.200     23.466   |
|-------------------------------------|---------------------------------|
| Xen build, low VM load, with noise  |
|-------------------------------------|
|            runq=core  runq=socket   |
| baseline     44.483       40.747    |
| patched      45.866       39.493    |
|-------------------------------------|---------------------------------|
| Xen build, high VM load, with noise | Iperf, high VM load, with noise |
|-------------------------------------|---------------------------------|
|            runq=core  runq=socket   |           runq=core runq=socket |
| baseline     41.466       30.630    | baseline    20.333     20.633   |
| patched      36.840       29.080    | patched     19.967     21.000   |
|=======================================================================|

Which, summarizing, means:
 * as far as Credit2 is concerned, applying this series and using runq=socket is what _ALWAYS_ provides the best results;
 * when looking at Credit1 vs. patched Credit2 with runq=socket:
   - Xen build, low VM load, no noise   : Credit1 slightly better
   - Xen build, high VM load, no noise  : on par
   - Xen build, low VM load, with noise : Credit1 a bit better
   - Xen build, high VM load, with noise: Credit2 _ENORMOUSLY_ better (yes, I re-ran both cases a number of times!)
   - Iperf, high VM load, no noise      : Credit2 a bit better
   - Iperf, high VM load, with noise    : Credit1 slightly better

So, Credit1 still wins a few rounds, but performance is very, very close, and this series seems to me to help narrow the gap (for some of the cases, significantly). It also looks like the 'Xen build, high VM load, with noise' test case, although rather naive, exposed another of those issues with Credit1 (more investigation is necessary), while Credit2 keeps up just fine.

Another interesting thing to note is that, on Credit2 (with this series), 'Xen build, high VM load, with noise' turns out to be quicker than 'Xen build, low VM load, with noise'.
This means that using a higher value for `make -j' for a build, inside a guest, results in a quicker build, which makes sense... But that is _NOT_ what happens on Credit1, the whole thing (wildly :-P) hinting at Credit2 being able to achieve better scalability and better fairness.

In any case, more benchmarking is necessary, and is already planned. More investigation is also necessary to figure out whether, once we have this series, going back to runq=socket as the default would indeed be the best thing (which I indeed suspect it will be). But from all I see, and from all the various perspectives, this series seems a step in the right direction.

Thanks and Regards,
Dario

[1] http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html
[2] http://lists.xen.org/archives/html/xen-devel/2016-06/msg01884.html

[*] Jan, I confirm that, with your series applied, I haven't yet seen any of those "Time went backwards?" printk's from Credit2, as you sort of were expecting...

---
Dario Faggioli (19):
      xen: sched: leave CPUs doing tasklet work alone.
      xen: sched: make the 'tickled' perf counter clearer
      xen: credit2: insert and tickle don't need a cpu parameter
      xen: credit2: kill useless helper function choose_cpu
      xen: credit2: do not warn if calling burn_credits more than once
      xen: credit2: read NOW() with the proper runq lock held
      xen: credit2: prevent load balancing to go mad if time goes backwards
      xen: credit2: when tickling, check idle cpus first
      xen: credit2: avoid calling __update_svc_load() multiple times on the same vcpu
      xen: credit2: rework load tracking logic
      tools: tracing: adapt Credit2 load tracking events to new format
      xen: credit2: use non-atomic cpumask and bit operations
      xen: credit2: make the code less experimental
      xen: credit2: add yet some more tracing
      xen: credit2: only marshall trace point arguments if tracing enabled
      tools: tracing: deal with new Credit2 events
      xen: credit2: the private scheduler lock can be an rwlock.
      xen: credit2: implement SMT support independent runq arrangement
      xen: credit2: use cpumask_first instead of cpumask_any when choosing cpu

 docs/misc/xen-command-line.markdown |   30 +
 tools/xentrace/formats              |   10 
 tools/xentrace/xenalyze.c           |  103 +++
 xen/common/sched_credit.c           |   22 -
 xen/common/sched_credit2.c          | 1158 +++++++++++++++++++++++++----------
 xen/common/sched_rt.c               |    8 
 xen/include/xen/cpumask.h           |    8 
 xen/include/xen/perfc_defn.h        |    5 
 8 files changed, 973 insertions(+), 371 deletions(-)

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel