
RE: [Xen-devel] "cpus" config parameter broken?

Sorry to belabo(u)r the point, but I beg to differ: the current
hypervisor interface is a strange mixture of flexibility and
restriction (and of policy and mechanism). Some mask parameters
are left alone by vcpu_set_affinity, others are rejected entirely,
and still others are silently modified. The advantage of the
existing interface is of course that it preserves the downward
interface to the schedulers, e.g. schedulers can assume that
any set bit represents a schedulable processor.

So if the toolstack knows what it is doing, why does
vcpu_set_affinity even look at the mask? IMHO either:

1) the policy belongs in the tools, in which case the AND'ing
   of the mask should be done only by the scheduler whenever a
   vcpu is scheduled (thus allowing maximal flexibility for
   future highly dynamic hot-plug, while ensuring a vcpu never
   gets scheduled on an offline or non-existent pcpu); or
2) the policy belongs in the hypervisor, in which case any
   attempt by the tools to allow scheduling (e.g. set affinity)
   on an offline or non-existent processor should be rejected
   (in which case the toolstack is immediately notified that
   its understanding of the current online set is faulty).

Though it could be argued academically that "policy" doesn't
belong in the hypervisor, rejecting an attempt by the tools
to use an unavailable processor isn't much different from
rejecting an SSE3 instruction on a non-SSE3 processor.
(In other words, it's really processor-enforcement mechanism.)
So I like #2; #1 would be OK too.  I just don't like the
current muddle, which has already led to misunderstandings
and inconsistent implementations in the current toolchain.


> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
> Sent: Thursday, January 10, 2008 4:53 PM
> To: dan.magenheimer@xxxxxxxxxx; Ian Pratt; 
> xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] "cpus" config parameter broken?
> The current hypervisor interface has the advantage of flexibility.
> You can easily enforce various policies (including strict checking,
> or modulo arithmetic) in the toolstack on top of the current
> interface. But you can't (easily) implement the current hypervisor
> policy in the toolstack on top of strict checking or modulo
> arithmetic (if one of those policies becomes hardcoded into the
> hypervisor).
> The current interface assumes the lowest levels of the toolstack
> know what they are doing, and presents a policy that is as
> permissive as possible.
>  -- Keir
> On 10/1/08 23:46, "Dan Magenheimer" 
> <dan.magenheimer@xxxxxxxxxx> wrote:
> >> You mean CPUs beyond NR_CPUS? All the cpumask iterators are
> >> careful not to return values beyond NR_CPUS, regardless of what
> >> stray bits lie beyond that range in the longword bitmap.
> >
> > I see... you are allowing for any future box to grow to NR_CPUS
> > and I am assuming that, even with future hot-add processors,
> > Xen will be told by the box the maximum number of processors
> > that will ever be online (call this max_pcpu), and that max_pcpu
> > is probably less than NR_CPUS.  So for these NR_CPUS-max_pcpu
> > processors that are "non-existent" (and especially for the
> > foreseeable future on the vast majority of machines for which
> > max_pcpu=npcpu=constant and npcpu << NR_CPUS), trying to set
> > bits for non-existent processors should not be silently ignored
> > and discarded, but should either be entirely
> > disallowed or, at least, should be retained and ignored.
> > I would propose "disallowed" for n > max_pcpu and retained
> > and ignored for online_pcpu < n < max_pcpu.
> >
> > A related aside, for either model for hot-add (yours or mine),
> > the current modulo mechanism in xm_vcpu_pin is not scalable
> > and imho should be removed now as well before anybody comes to
> > depend on it.
> >
> > And lastly, this hot-add discussion reinforces in my mind the
> > difference between affinity and restriction (and pinning) which
> > are all muddled in the current hypervisor and tools.
