
[Xen-devel] More benchmarks with flatten topology in the Linux kernel



Hi everyone,

I managed to run again the benchmarks I had already shown here:

 [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy
 https://lkml.org/lkml/2015/8/18/302

Basically, this is about Linux guests using topology information for
scheduling, while that information just does not make any sense when on
Xen, as (unless static, guest-lifetime-long pinning is used) vCPUs do
move around among the host pCPUs!
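
To give a rough idea of what "flattening" means in practice, here is a
minimal, purely illustrative sketch (it is *not* the actual code of
either patch; the function name and the hook point are made up for the
example): when the guest finds out it is running on Xen, it installs a
single, flat scheduling domain topology level spanning all the vCPUs,
so that no SMT/core/package domains are built out of the (meaningless)
virtual topology.

 /*
  * Illustrative sketch only, not the code of either patch: install a
  * single, flat scheduling domain topology level when running on Xen,
  * so the scheduler does not build SMT/MC/package domains from the
  * virtual topology it sees.
  */
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/topology.h>

 static struct sched_domain_topology_level xen_flat_topology[] = {
         { cpu_cpu_mask, SD_INIT_NAME(DIE) },
         { NULL, },
 };

 /* To be called early during Xen guest setup (hypothetical hook). */
 static void __init xen_flatten_sched_topology(void)
 {
         set_sched_topology(xen_flat_topology);
 }

The two actual patches differ in how and where they achieve this, and
are of course more careful than the sketch above, which only conveys
the basic idea.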

Some more context is also available here:

 http://lists.xen.org/archives/html/xen-devel/2015-07/msg03241.html

This email is still about numbers obtained by running things in Dom0,
and without overloading the host pCPUs at the Xen level (i.e., I'm
using nr. dom0 vCPUs == nr. host pCPUs).

With respect to the previous round:
 - I've added results for hackbench;
 - I've run the benchmarks with both my patch [0] and Juergen's 
   patch [1]. My patch is 'dariof' in the spreadsheet; Juergen's 
   is 'jgross'.

Here are the numbers:

 
https://docs.google.com/spreadsheets/d/17djcVV3FkmHmv1FKFBe9CQFnNgVumnM2U64MNvjzAn8/edit?usp=sharing

(If anyone has issues with Google Docs, tell me and I'll try cutting
and pasting the numbers into the email, as I did last time.)

A few comments:
 * both patches bring performance improvements. The only regression 
   seems to happen in hackbench, when running with -g1. That is 
   certainly not the typical use case of the benchmark, but we can 
   still try to figure out what happens in that case;
 * the two patches were supposed to provide almost identical results, 
   and they actually do, in most cases (e.g., in all the instances 
   of Unixbench);
 * when there are differences, it is hard to see a trend or, in 
   general, to identify a possible reason by looking at the differences 
   between the patches themselves, at least as far as these data are 
   concerned. In fact, in the "make xen" case, for instance, 'jgross'
   is better when building with -j20 and -j24, while 'dariof' is
   better when building with -j48 and -j62 (the host having 48 pCPUs).
   In the hackbench case, 'dariof' is better in the least concurrent
   case, and 'jgross' is better in the other three.
   This may well be due to some different and independent factor... 
   a little more investigation is probably necessary (and I'm up 
   for it).

IMO, future steps are:
 a) running benchmarks in a guest
 b) running benchmarks in more guests, and when overloading at the Xen 
    level (i.e., having more vCPUs around than the host has pCPUs)
 c) tracing and/or collecting stats (e.g., from perf and xenalyze)

I'm already working on a) and b).

As for which approach (mine or Juergen's) to adopt, I'm not sure, and
it does not seem to make much difference, at least from the performance
point of view. I don't have any particular issues with Juergen's patch,
apart from the fact that I'm not yet sure how it makes the scheduling
domain creation code behave. I can look into that and report back.

Also, this is all for PV guests. Any thoughts on what the best route
would be for HVM ones?

[0] http://pastebin.com/KF5WyPKz
[1] http://pastebin.com/xSFLbLwn

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



 

