Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest

On Mon, 2015-07-27 at 12:11 +0100, George Dunlap wrote:

> 1. Userspace applications are in the habit of reading CPUID to determine
> the topology of the system they're running on
I'd add this item here:

1b. The Linux kernel uses CPUID to configure some bits of its scheduler.
    The result of that will affect *all* in-guest applications, even
    those that do not use CPUID themselves for scheduling/performance
    purposes

Yes, I saw you mention this afterwards, but I think it really should be
here, in the analysis part.

> 2. Many use the topology information to help themselves make better
> scheduling decisions.  Because a vcpu is not typically pinned to a
> specific pcpu, we may need to lie here slightly (e.g., not mention
> threads) to get the optimal behavior overall.
> 3. Others use the topology information to implement licensing
> restrictions.  Because threads are treated differently to cores, we want
> to tell the truth here (i.e., make sure we mention that some of these
> are threads) to get the optimal behavior overall.
Define truth. I mean, is 'telling the truth' always good for this case?
E.g., in a 4-vcpu guest, on a 4-socket box, without any pinning, it may
be that when app X samples CPUID to check the license, each vcpu is
running on a different socket, so the truth means we need a license for
4 sockets... I don't think this is ideal, is it?
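
To put a rough number on this, here is a minimal sketch (Python, purely
illustrative, not Xen code). It assumes each unpinned vcpu sits on a
uniformly random socket at the instant of sampling, which is a
simplification of what the hypervisor scheduler really does. The names
sockets_seen and sample_distribution, and their parameters, are
hypothetical.

```python
import random


def sockets_seen(n_vcpus, n_sockets, rng):
    """Number of distinct sockets a license check would observe in one sample.

    ASSUMPTION: each unpinned vcpu is on a uniformly random socket at the
    moment of sampling. Real placement depends on the hypervisor scheduler.
    """
    return len({rng.randrange(n_sockets) for _ in range(n_vcpus)})


def sample_distribution(n_vcpus=4, n_sockets=4, trials=10000, seed=0):
    """Histogram: distinct-socket count -> number of samples that saw it."""
    rng = random.Random(seed)
    hist = {}
    for _ in range(trials):
        k = sockets_seen(n_vcpus, n_sockets, rng)
        hist[k] = hist.get(k, 0) + 1
    return hist


if __name__ == "__main__":
    for k, count in sorted(sample_distribution().items()):
        print(f"{k} socket(s) seen in {count} samples")
```

Under that assumption, the very same guest could be counted as using
anything between 1 and 4 sockets, depending only on when the check
happens to run.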

> Numbers #2 and #3 lead to contradictory courses of action; we cannot
> optimize for both at the same time.
...that's certainly true, IMO.

> I think at some level we need to just try to accommodate both -- if the
> user doesn't have licensing issues, or prefers performance over
> licensing, then present a unified topology in PVH / HVM using CPUID,
> ACPI, &c.  I think this should be the default.
> If the user has licensing issues, and doesn't mind having wonky or
> unreliable topology to its guests, then let the raw CPUID through.  But
> it would, in this case, be good to try to give the guest OS scheduler a
> hint that it shouldn't really bother trying to read the topology or do
> placement as a result, as any decisions will be unreliable.
This last sentence is basically Juergen's proposal, AFAICT, and I agree
that it would be good in case #3. But, thinking more about it, would it
do any harm in case #2? I don't think it would.

After all, it'd boil down to making peace with the fact that something
like CPUID, in a (not only PV!) VM, is just not reliable enough to be
used for building in-kernel scheduling-related data structures, like
Linux's scheduling domains. It is unreliable because its content may
conflict with vNUMA but, really, even with no vNUMA, it is unreliable
because it depends on whether the VM's vcpus are pinned or not and, if
not, on where they actually happen to run, which is pure randomness
from the guest's point of view.
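
The same point as a sketch (Python again, illustrative only): the guest
records which vcpus look like hyperthread siblings at boot, and we then
check how often that view still holds later on. The host layout
(N_CORES, THREADS_PER_CORE), the uniformly random placement and all the
function names are assumptions made up for the example.

```python
import random

THREADS_PER_CORE = 2   # ASSUMPTION: host has 2 hyperthreads per core
N_CORES = 8            # ASSUMPTION: hypothetical host size


def place(n_vcpus, rng):
    """Place vcpus on distinct pcpus; return the core each vcpu runs on."""
    pcpus = rng.sample(range(N_CORES * THREADS_PER_CORE), n_vcpus)
    return [p // THREADS_PER_CORE for p in pcpus]


def sibling_pairs(cores):
    """Set of (vcpu_a, vcpu_b) pairs that currently share a physical core."""
    n = len(cores)
    return {(a, b)
            for a in range(n)
            for b in range(a + 1, n)
            if cores[a] == cores[b]}


def stale_fraction(n_vcpus=4, later_samples=1000, seed=1):
    """Fraction of later samples where the boot-time sibling view is wrong."""
    rng = random.Random(seed)
    boot_view = sibling_pairs(place(n_vcpus, rng))
    wrong = sum(sibling_pairs(place(n_vcpus, rng)) != boot_view
                for _ in range(later_samples))
    return wrong / later_samples


if __name__ == "__main__":
    print(f"boot-time sibling view stale in {stale_fraction():.0%} of samples")
```

Whatever the guest kernel builds out of the boot-time snapshot, it has
no way to notice when the placement underneath it changes.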

So, I'm really starting to think that a patch stopping the Linux kernel
from relying on CPUID, whether original or mangled, and for all kinds
of guests, would really be a nice one to have!


<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
