
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest



On 07/28/2015 06:29 AM, Juergen Gross wrote:
On 07/27/2015 04:09 PM, Dario Faggioli wrote:
On Fri, 2015-07-24 at 18:10 +0200, Juergen Gross wrote:
On 07/24/2015 05:58 PM, Dario Faggioli wrote:

So, just to check that my understanding is correct: you'd like to add an
abstraction layer in Linux, in generic (or, perhaps, scheduling) code,
to hide the direct interaction with CPUID.
Such a layer, on baremetal, would just read CPUID while, on PV-ops, it'd
check with Xen/match vNUMA/whatever... Is this what you are saying?

Sort of, yes.

I just wouldn't add it, as it already exists (more or less). Right now
it can deal with AMD and Intel, we would "just" have to add Xen.

So, having gone through the rest of the thread (so far), and having
given a fair amount of thought to this, I really think that something
like this would be a good thing to have in Linux.

Of course, it's not that my opinion on what should be in Linux counts
that much! :-D   Nevertheless, I wanted to make it clear that, while
skeptical at the beginning, I now think this is (part of) the way to go,
as I said and explained in my reply to George.

I think it's time to obtain some real numbers.

I'll make some performance tests on a big machine (4 sockets, 60 cores,
120 threads) regarding topology information:

- bare metal
- "random" topology (like today)
- "simple" topology (all vcpus regarded as equal)
- "real" topology with all vcpus pinned

This should show:

- how intrusive the topology patch(es) would be
- what the performance impact of a "wrong" scheduling database is

On the above box I used a pvops kernel 4.2-rc4 plus a rather small patch
(see attachment). I did 5 kernel builds in each environment:

make clean
time make -j 120

The first result of the 5 runs was always omitted, as that run has to
populate the buffer cache etc. The Xen cases were all done in dom0;
pinning of vcpus in the last scenario was done via the dom0_vcpus_pin
boot parameter of the hypervisor.
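
For anyone wanting to repeat the measurement, the procedure above could be
scripted roughly as follows. This is only a sketch: the function names are
made up for illustration, and averaging the remaining runs is an assumption
(the mail does not say how the four kept results were combined).

```python
import os
import subprocess
import time


def timed_run(cmd, cwd=None):
    """Run cmd once; return (elapsed, user, system) in seconds.

    user/system are the CPU times accumulated by child processes,
    similar to what the shell's `time` reports (Unix only).
    """
    before = os.times()
    start = time.monotonic()
    subprocess.run(cmd, cwd=cwd, check=True)
    elapsed = time.monotonic() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return elapsed, user, system


def average_warm_runs(samples):
    """Drop the first (cold-cache) sample and average the rest column-wise."""
    warm = samples[1:]
    if not warm:
        raise ValueError("need at least two runs")
    return tuple(sum(col) / len(warm) for col in zip(*warm))


def benchmark(prepare_cmd, build_cmd, runs=5, cwd=None):
    """Run prepare_cmd + build_cmd `runs` times and average the warm runs."""
    samples = []
    for _ in range(runs):
        subprocess.run(prepare_cmd, cwd=cwd, check=True)
        samples.append(timed_run(build_cmd, cwd=cwd))
    return average_warm_runs(samples)
```

It would be called from inside a kernel tree as, e.g.,
benchmark(["make", "clean"], ["make", "-j", "120"]).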

Here are the results (everything in seconds):

                    elapsed   user   system
bare metal:            100    5770      805
"random" topology:     283    6740    20700
"simple" topology:     290    6740    22200
"real" topology:       185    7800     8040

As expected, bare metal is the best. Next is "real" topology with pinned
vcpus (expected again, but system time is already up by a factor of 10!).
What I didn't expect is: "random" is better than "simple" topology. I
could test some other topologies (e.g. everything on one socket, or even
on one core), but I'm not sure this makes sense. I didn't check the
exact topology result of the "random" case, maybe I'll do that tomorrow
with another measurement.
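
To make the comparison easier to read, the numbers from the table can be
expressed as factors relative to bare metal. A small sketch (the dictionary
just restates the table; the function name is made up):

```python
# (elapsed, user, system) in seconds, copied from the table in this mail.
RESULTS = {
    "bare metal": (100, 5770, 805),
    '"random" topology': (283, 6740, 20700),
    '"simple" topology': (290, 6740, 22200),
    '"real" topology': (185, 7800, 8040),
}


def slowdown(results, baseline="bare metal"):
    """Return each row divided by the baseline row, rounded to 2 decimals."""
    base = results[baseline]
    return {
        name: tuple(round(v / b, 2) for v, b in zip(values, base))
        for name, values in results.items()
    }


for name, factors in slowdown(RESULTS).items():
    print(f"{name:20s} elapsed x{factors[0]}  user x{factors[1]}  system x{factors[2]}")
```

This gives roughly x2.8 elapsed and x26 system time for "random", x2.9 and
x28 for "simple", and x1.85 and x10 for "real" topology.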

BTW: the topology hack is working, as each cpu is shown to have a
sibling count of 1 in /proc/cpuinfo.
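
The sibling check can be automated along these lines. It is a sketch that
assumes the usual x86 /proc/cpuinfo layout with "processor" and "siblings"
fields; the function names are made up for illustration.

```python
def sibling_counts(cpuinfo_text):
    """Map each 'processor' id to its 'siblings' value.

    Expects text in the format of /proc/cpuinfo on x86 Linux.
    """
    counts = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between cpu entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "siblings" and current is not None:
            counts[current] = int(value)
    return counts


def all_single_sibling(cpuinfo_text):
    """True if every listed cpu reports a sibling count of 1."""
    counts = sibling_counts(cpuinfo_text)
    return bool(counts) and all(n == 1 for n in counts.values())
```

On the guest this would be used as
all_single_sibling(open("/proc/cpuinfo").read()).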


Juergen

Attachment: topo.patch
Description: Text Data

