
Re: [Xen-devel] Virt overhead with HT [was: Re: Xen 4.5 development update]



On Mon, 2014-07-14 at 17:55 +0100, George Dunlap wrote:
> On 07/14/2014 05:44 PM, Dario Faggioli wrote:
> > On Mon, 2014-07-14 at 17:32 +0100, Gordan Bobic wrote:
> >> On 07/14/2014 05:12 PM, Dario Faggioli wrote:
> >>> Elapsed, s (stddev)  BAREMETAL             HVM
> >>> kernbench -j4     31.604 (0.0963328)    34.078 (0.168582)
> >>> kernbench -j8     26.586 (0.145705)     26.672 (0.0432435)
> >>> kernbench -j      27.358 (0.440307)     27.49 (0.364897)
> >>>
> >>> With HT disabled in BIOS (which means only 4 CPUs for both):
> >>> Elapsed, s (stddev)  BAREMETAL             HVM
> >>> kernbench -j4     57.754 (0.0642651)    56.46 (0.0578792)
> >>> kernbench -j8     31.228 (0.0775887)    31.362 (0.210998)
> >>> kernbench -j      32.316 (0.0270185)    33.084 (0.600442)
> > BTW, there's a mistake here. The three runs, in the no-HT case, are
> > as follows:
> >   kernbench -j2
> >   kernbench -j4
> >   kernbench -j
> >
> > I.e., half the number of vCPUs, as many as there are vCPUs, and
> > unlimited, exactly as in the HT case.
> 
> Ah -- that's a pretty critical piece of information.
> 
> So actually, on native, HT enabled and disabled effectively produce
> exactly the same result when HT is not actually being used: 31 seconds
> in both cases.  But on Xen, with HT enabled but not actually in use
> (i.e., when in theory each core should have exactly one process
> running), performance goes from 31 seconds to 34 seconds -- roughly a
> 10% degradation.
> 
Yes. Going by the numbers above, a 7.83% degradation
((34.078 - 31.604) / 31.604), to be precise.

I attempted an analysis in my first e-mail. Cutting and pasting it
here... What do you think?

"I guess I can investigate a bit more about what happens with '-j4'.
 What I suspect is that the scheduler may make a few non-optimal
 decisions wrt HT, when there are more PCPUs than busy guest VCPUs. This
 may be due to the fact that Dom0 (or another guest VCPU doing other
 stuff than kernbench) may be already running on PCPUs that are on
 different cores than the guest's one (i.e., the guest VCPUs that wants
 to run kernbench), and that may force two guest's vCPUs to execute on
 two HTs some of the time (which of course is something that does not
 happen on baremetal!)."
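
To make that hypothesis concrete, here is a toy model of the placement
decision I have in mind (plain C, *not* actual Xen scheduler code: the
topology macro, the names and the sizes are all made up for the sake of
the example). An SMT-aware pick prefers a pCPU whose hyperthread
sibling is also idle, and only falls back to sharing a core when no
fully idle core is left:

/*
 * Toy model of SMT-aware pCPU selection (illustrative only, not Xen
 * code).  Assume pCPUs 2k and 2k+1 are hyperthread siblings on the
 * same core.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_PCPUS 8
#define SIBLING(cpu) ((cpu) ^ 1)

static bool busy[NR_PCPUS];      /* true if a vCPU is running there */

/* Prefer a fully idle core; fall back to a lone idle hyperthread. */
static int pick_pcpu_smt_aware(void)
{
    int fallback = -1;

    for (int cpu = 0; cpu < NR_PCPUS; cpu++) {
        if (busy[cpu])
            continue;
        if (!busy[SIBLING(cpu)])
            return cpu;          /* whole core idle: best case */
        if (fallback < 0)
            fallback = cpu;      /* idle thread, but busy sibling */
    }
    return fallback;             /* -1 if everything is busy */
}

int main(void)
{
    busy[0] = true;              /* e.g., Dom0 busy on pCPU 0 */

    /* An SMT-blind pick could choose pCPU 1 and share core 0;
     * the SMT-aware pick lands on a fully idle core instead: */
    printf("picked pCPU %d\n", pick_pcpu_smt_aware());
    return 0;
}

The failure mode I am describing is exactly the fallback path above (or
an SMT-blind equivalent of it) being taken while a fully idle core sits
unused, so that two of the guest's vCPUs end up sharing hyperthreads.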

I just re-ran the benchmark with credit2, which has no SMT knowledge,
and the first run (the one that does not use HT) ended up at 37.54,
while the other two were pretty much the same as above (26.81 and
27.92).

This confirms, for me, that what we're seeing is an SMT balancing issue.
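
For reference, running the same comparison on the '-j4' numbers: with
credit1, HVM does 34.078s against 31.604s on baremetal, i.e. a
(34.078 - 31.604) / 31.604 = ~7.8% overhead, while with credit2 the
overhead becomes (37.54 - 31.604) / 31.604 = ~18.8%. The gap between
the two schedulers looks like the price of SMT-blind placement.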

I'll try more runs, e.g., with the number of vCPUs less than or equal
to nr_cores/2, and see what happens.

Again, thoughts?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


 

