
Re: [Xen-devel] Virt overhead with HT [was: Re: Xen 4.5 development update]



On 07/14/2014 06:22 PM, Dario Faggioli wrote:
On Mon, 2014-07-14 at 17:55 +0100, George Dunlap wrote:
On 07/14/2014 05:44 PM, Dario Faggioli wrote:
On Mon, 2014-07-14 at 17:32 +0100, Gordan Bobic wrote:
On 07/14/2014 05:12 PM, Dario Faggioli wrote:
Elapsed(stddev)   BAREMETAL             HVM
kernbench -j4     31.604 (0.0963328)    34.078 (0.168582)
kernbench -j8     26.586 (0.145705)     26.672 (0.0432435)
kernbench -j      27.358 (0.440307)     27.49 (0.364897)

With HT disabled in BIOS (which means only 4 CPUs for both):
Elapsed(stddev)   BAREMETAL             HVM
kernbench -j4     57.754 (0.0642651)    56.46 (0.0578792)
kernbench -j8     31.228 (0.0775887)    31.362 (0.210998)
kernbench -j      32.316 (0.0270185)    33.084 (0.600442)
BTW, there's a mistake here. The three runs, in the no-HT case, are as
follows:
   kernbench -j2
   kernbench -j4
   kernbench -j

I.e., half the number of VCPUs, as many jobs as there are VCPUs, and
unlimited, exactly as in the HT case.
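
FWIW, here's a quick sketch (Python; the means are copied from the
tables above, with the no-HT runs relabeled per the correction) that
turns those numbers into HVM overhead percentages:

    # HVM overhead relative to baremetal, from the kernbench means above
    # (no-HT runs relabeled -j2/-j4/-j per the correction).
    runs_ht   = {"-j4": (31.604, 34.078), "-j8": (26.586, 26.672), "-j": (27.358, 27.490)}
    runs_noht = {"-j2": (57.754, 56.460), "-j4": (31.228, 31.362), "-j": (32.316, 33.084)}

    for label, table in (("HT enabled", runs_ht), ("HT disabled", runs_noht)):
        print(label)
        for jobs, (bare, hvm) in table.items():
            print("  %4s: %+.2f%%" % (jobs, (hvm - bare) / bare * 100.0))

The -j4/HT row comes out at about +7.8%, and every other row is within
a couple of percent.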

Ah -- that's a pretty critical piece of information.

So actually, on native, HT enabled and disabled produce essentially the
same result when HT is not actually being used: about 31 seconds in
both cases.  But on Xen, with HT enabled but not actually being used
(i.e., when in theory each core should have exactly one process
running), performance goes from about 31 seconds to about 34 seconds --
roughly a 10% degradation.

Yes. 7.96% degradation, to be precise.

I attempted an analysis in my first e-mail. Cutting and pasting it
here... What do you think?

"I guess I can investigate a bit more about what happens with '-j4'.
  What I suspect is that the scheduler may make a few non-optimal
  decisions wrt HT, when there are more PCPUs than busy guest VCPUs. This
  may be due to the fact that Dom0 (or another guest VCPU doing other
  stuff than kernbench) may be already running on PCPUs that are on
  different cores than the guest's one (i.e., the guest VCPUs that wants
  to run kernbench), and that may force two guest's vCPUs to execute on
  two HTs some of the time (which of course is something that does not
  happen on baremetal!)."

I just re-ran the benchmark with credit2, which has no SMT knowledge,
and the first run (the one that should not need HT) ended up at 37.54,
while the other two were pretty much the same as above (26.81 and
27.92).

This confirms, for me, that it's an SMT balancing issue we're seeing.

I'll try more runs, e.g., with the number of VCPUs equal to or less
than nr_cores/2, and see what happens.

Again, thoughts?

Have you tried it with VCPUs pinned to appropriate PCPUs?
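
E.g., assuming the siblings are enumerated as (0,1), (2,3), (4,5),
(6,7) -- worth double-checking with xl info -n -- something like:

   xl vcpu-pin <domid> 0 0
   xl vcpu-pin <domid> 1 2
   xl vcpu-pin <domid> 2 4
   xl vcpu-pin <domid> 3 6

should give each of the 4 VCPUs a whole core to itself and take the
balancer out of the picture.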


