Re: [Xen-devel] [PATCH RFC 00/49] xen: add core scheduling support
On Fri, 2019-03-29 at 19:16 +0100, Dario Faggioli wrote:
> On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
> > I have done some very basic performance testing: on a 4 cpu system
> > (2 cores with 2 threads each) I did a "make -j 4" for building the
> > Xen hypervisor. This test has been run in dom0, once with no other
> > guest active and once with another guest with 4 vcpus running the
> > same test.
> Just as a heads up for people (as Juergen knows this already :-D),
> I'm planning to run some performance evaluation of these patches.
>
> I've got an 8-CPU system (4 cores, 2 threads each, no NUMA) and a
> 16-CPU system (2 sockets/NUMA nodes, 4 cores each, 2 threads each) on
> which I should be able to get some bench suite running relatively
> easily and (hopefully) quickly.
>
> I'm planning to evaluate:
> - vanilla (i.e., without this series), SMT enabled in BIOS
> - vanilla (i.e., without this series), SMT disabled in BIOS
> - patched (i.e., with this series), granularity=thread
> - patched (i.e., with this series), granularity=core
>
> I'll start with no overcommitment, and then move to 2x overcommitment
> (as you did above).
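
Just for clarity, here is how those four cases map onto the numbering
and boot settings used for the results below. This is only a sketch:
sched_granularity= is the parameter name appearing in the result
labels, and may be spelled differently in the series itself; SMT was
toggled in the BIOS for these runs, although Xen's smt= boot option
should be an equivalent way of getting case 4:

    1) vanilla, SMT on   ->  unpatched Xen, default command line
    2) patched, thread   ->  patched Xen, booted with sched_granularity=thread
    3) patched, core     ->  patched Xen, booted with sched_granularity=core
    4) vanilla, SMT off  ->  unpatched Xen, SMT disabled in the BIOS
                             (Xen's smt=off boot option should be equivalent)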
I've got the first set of results. It's less than I wanted/expected to
have at this point in time, but still... Also, it's Phoronix again. I
don't especially love it, but I'm still working on convincing our own
internal automated benchmarking tool (which I like a lot more :-) ) to
be a good friend of Xen. :-P

It's a not-too-big set of tests, run under the following conditions:
- hardware: Intel Xeon E5620; 2 NUMA nodes, 4 cores and 2 threads each
- slow disk (old rotational HDD)
- benchmarks run in dom0
- CPU, memory and some disk IO benchmarks
- all Spectre/Meltdown mitigations disabled, both at the Xen and at the
  dom0 kernel level
- cpufreq governor = performance, max_cstate = C1
- *non* debug hypervisor

In just one sentence, what I'd say is "So far so good" :-D

https://openbenchmarking.org/result/1904105-SP-1904100DA38

1) 'Xen dom0, SMT On, vanilla' is staging *without* this series even
   applied
2) 'Xen dom0, SMT on, patched, sched_granularity=thread' is with this
   series applied, but with the scheduler behaving as it does right now
3) 'Xen dom0, SMT on, patched, sched_granularity=core' is with this
   series applied, and core-scheduling enabled
4) 'Xen dom0, SMT Off, vanilla' is staging *without* this series
   applied, and SMT turned off in BIOS (i.e., we only have 8 CPUs)

So, comparing 1 and 4, we see, for each specific benchmark, the cost of
disabling SMT (or, vice versa, the gain of using SMT). Comparing 1 and
2, we see the overhead introduced by this series when it is not used to
achieve core-scheduling. Comparing 1 and 3, we see the difference
between what we have right now and what we'll have with core-scheduling
enabled, as implemented in this series.

Some of the things we can see from the results:
- disabling SMT (i.e., 1 vs 4) is not always bad, but it is bad
  overall, i.e., if you look at how many tests get faster and how many
  get slower with SMT off (and also by how much). Of course, this holds
  for these specific benchmarks, on this specific hardware and with
  this configuration
- the overhead introduced by this series is, overall, pretty small,
  apart from no more than a couple of exceptions (e.g., Stream Triad or
  zstd compression). OTOH, there seem to be cases where this series
  improves performance (e.g., Stress-NG Socket Activity)
- the performance we achieve with core-scheduling is more than
  acceptable
- between core-scheduling and disabling SMT, core-scheduling wins, and
  I wouldn't even call it a match :-P

Of course, other thoughts, comments and alternative analyses are
welcome.

As said above, this is less than what I wanted to have, and in fact I'm
running more stuff. I have a much more comprehensive set of benchmarks
running these days. It being "much more comprehensive", however, also
means it takes more time. I have a newer and faster (both CPU and disk)
machine, but I need to re-purpose it for benchmarking. At least, now
that the old Xeon NUMA box is done with this first round, I can use it
for:
- running the tests inside a "regular" PV domain (a sketch of such a
  guest's config is below)
- running the tests inside more than one PV domain, i.e. with some
  degree of overcommitment
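
For reference, a minimal xl config sketch of such a guest; every name,
path and size here is a placeholder (only the 4 vCPUs mirror the guest
used in Juergen's test above):

    # bench-pv1.cfg -- hypothetical config for one benchmark PV guest
    type    = "pv"
    name    = "bench-pv1"
    vcpus   = 4
    memory  = 4096
    kernel  = "/boot/vmlinuz"                       # placeholder path
    ramdisk = "/boot/initrd.img"                    # placeholder path
    extra   = "root=/dev/xvda1 ro console=hvc0"
    disk    = [ "phy:/dev/vg0/bench-pv1,xvda,w" ]   # placeholder backend
    vif     = [ "bridge=xenbr0" ]

Each such guest would be started with "xl create bench-pv1.cfg", and
enough of them can be created to reach the desired (e.g., 2x) level of
overcommitment.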
I'll push out results as soon as I have them.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel