Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On Fri, Jul 24, 2015 at 05:58:29PM +0200, Dario Faggioli wrote:
> On Fri, 2015-07-24 at 17:24 +0200, Juergen Gross wrote:
> > On 07/24/2015 05:14 PM, Juergen Gross wrote:
> > > On 07/24/2015 04:44 PM, Dario Faggioli wrote:
> > >> In fact, I think that it is the topology, i.e., what comes from MSRs,
> > >> that needs to adapt, and follow vNUMA, as much as possible. Do we agree
> > >> on this?
> > >
> > > I think we have to be very careful here. I see two possible scenarios:
> > >
> > > 1) The vcpus are not pinned 1:1 on physical cpus. The hypervisor will
> > >    try to schedule the vcpus according to their numa affinity. So they
> > >    can change pcpus at any time in case of very busy guests. I don't
> > >    think the linux kernel should treat the cpus differently in this
> > >    case, as it will be in vain regarding the Xen scheduler's activity.
> > >    So we should use the "null" topology in this case.
> >
> > Sorry, the topology should reflect the vcpu<->numa-node relations, of
> > course, but nothing else (so flat topology in each numa node).
>
> Yeah, I was replying to this point saying something like this right
> now... Luckily, I've seen this email! :-P
>
> With this semantic, I fully agree with this.
>
> > > 2) The vcpus of the guest are all pinned 1:1 to physical cpus. The Xen
> > >    scheduler can't move vcpus between pcpus, so the linux kernel should
> > >    see the real topology of the used pcpus in order to optimize for this
> > >    picture.
>
> Mmm... I did think about this too, but I'm not sure. I see the value of
> this of course, and the reason why it makes sense. However, pinning can
> change on-line, via `xl vcpu-pin' and stuff. Also, migration could make
> things less certain, I think. What happens if we build on top of the
> initial pinning, and then things change?
>
> To be fair, there is stuff building on top of the initial pinning
> already, e.g., from which physical NUMA node we allocate the memory
> depends exactly on that. That being said, I'm not sure I'm
> comfortable with adding more of this...
>
> Perhaps introduce an 'immutable_pinning' flag, which will prevent the
> affinity from being changed, and then bind the topology to pinning only
> if that one is set?
>
> > >> Maybe, there is room for "fixing" this at this level, hooking up inside
> > >> the scheduler code... but I'm shooting in the dark, without having checked
> > >> whether and how this could really be feasible, should I?
> > >
> > > Uuh, I don't think a change of the scheduler on behalf of Xen is really
> > > appreciated. :-)
>
> I'm sure it would (have been! :-)) a true and giant nightmare!! :-D
>
> > >> One thing I don't like about this approach is that it would potentially
> > >> solve vNUMA and other scheduling anomalies, but...
> > >>
> > >>> cpuid instruction is available for user mode as well.
> > >>>
> > >> ...it would not do any good for other subsystems, and user level code
> > >> and apps.
> > >
> > > Indeed. I think the optimal solution would be two-fold: give the
> > > scheduler the information it needs to react correctly via a
> > > kernel patch not relying on cpuid values, and fiddle with the cpuid
> > > values from xen tools according to any needs of other subsystems and/or
> > > user code (e.g. licensing).
>
> So, just to check if my understanding is correct: you'd like to add an
> abstraction layer, in Linux, like in generic (or, perhaps, scheduling)
> code, to hide the direct interaction with CPUID.
> Such layer, on baremetal, would just read CPUID while, on PV-ops, it'd
> check with Xen/match vNUMA/whatever... Is this what you are saying?
>
> If yes, I think I like it...

I don't think this is workable. For example, there are applications which
use 'cpuid' to figure out the core/thread and use it for their own
scheduling purposes (a minimal sketch of such a check is shown below).

> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
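To illustrate that last point: the sketch below is not from the thread, it
is a minimal, illustrative example of how a user-space program can derive
its thread/core position straight from CPUID leaf 0xB (using the cpuid.h
helper shipped with GCC/clang), completely bypassing whatever topology the
kernel decides to present. Anything built this way sees the raw CPUID
leaves, so a kernel-internal abstraction layer alone cannot help it.

    /* Illustrative sketch only: user-space topology detection via CPUID
     * leaf 0xB, as some applications do for their own scheduling.
     * Build with: gcc -o topo topo.c */
    #include <stdio.h>
    #include <cpuid.h>                     /* __get_cpuid_count() helper */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 0xB, subleaf 0 describes the SMT level: EAX[4:0] is the
         * number of bits to shift the x2APIC ID right to obtain the core
         * ID, and EDX is the x2APIC ID of the logical CPU we run on. */
        if (!__get_cpuid_count(0x0b, 0, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 0xB not available\n");
            return 1;
        }

        unsigned int smt_shift = eax & 0x1f;
        unsigned int x2apic_id = edx;

        printf("x2APIC id %u -> thread %u of core %u\n",
               x2apic_id,
               x2apic_id & ((1u << smt_shift) - 1),
               x2apic_id >> smt_shift);
        return 0;
    }

Whatever such a program computes has no relation to the vNUMA layout unless
the CPUID values themselves are adjusted from the toolstack, which is the
second half of the two-fold proposal quoted above.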
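On the pinning point raised earlier in the thread: hard affinity is not
fixed at domain creation and can be changed from dom0 at any time, for
example (domain name and IDs invented for illustration):

    xl vcpu-pin guest0 2 6      # pin vcpu 2 of "guest0" to pcpu 6
    xl vcpu-pin guest0 2 all    # later, allow that vcpu on any pcpu again
    xl vcpu-list guest0         # show the current placement and affinity

This is why deriving the guest-visible topology from whatever pinning
happened to be in effect at boot is fragile unless something like the
'immutable_pinning' flag suggested above is introduced.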