
Re: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding



On Wednesday 08 June 2005 13:40, Bryan S Rosenburg wrote:
> "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx> wrote on 06/08/2005 02:25:56 PM:
> > > The key point is that with
> > > kernel-level preemption notification, VCPUs are always in
> > > kernel mode when suspended, never in user mode.  Application
> > > state is always saved in Linux, not in Xen, and is available
> > > to be resumed on another VCPU if Linux so chooses.
> >
> > In principle, but...
> >
> > Do you believe this is going to interact well with Linux's work
> > stealing CPU migration? I haven't looked closely at the current
> > code, but from Linux's scheduler's POV the de-scheduled (yielded)
> > CPU looks like a perfectly healthy CPU, so there's no particular
> > reason that another CPU would steal work from it (without hacking
> > the algorithm, which I suppose we could do). Also, do you have to
> > do something special in your yield routine to ensure that no real
> > process is currently running on the yielded processor so that all
> > processes on the run queue are available for stealing?
> >
> > Ian
>
> In our original posting, we proposed that the Linux interrupt handler
> for preemption notifications would create (or unblock) a
> high-priority kernel thread which would then yield back to the
> hypervisor.  To Linux on other CPUs, the de-scheduled CPU would
> appear to be busy running the high-priority thread, and all real work
> that that CPU had been doing would be eligible for stealing.

I don't think this alone is enough to encourage task migration.  
The primary motivator to steal is a load imbalance of 25% or more, and 
one extra fake kernel thread will probably not be enough to trigger that.
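
To make the arithmetic concrete, here is a minimal sketch of a 
percentage-based imbalance test.  It is a simplification for 
illustration, not the kernel's actual balancing code; IMBALANCE_PCT and 
should_steal() are hypothetical names, and the 125 simply encodes the 25% 
figure above.

```c
/* Hypothetical threshold: steal only when the busiest cpu carries at
 * least 125% of the local cpu's load (the 25% imbalance above). */
#define IMBALANCE_PCT 125UL

/* Returns 1 if the local cpu should pull work from the busiest cpu.
 * Loads are plain runnable-task counts in this sketch. */
static int should_steal(unsigned long local_load, unsigned long busiest_load)
{
    /* Compare in integers: busiest / local >= IMBALANCE_PCT / 100 */
    return 100UL * busiest_load >= IMBALANCE_PCT * local_load;
}
```

With five runnable tasks on every cpu, the fake thread raises the 
de-scheduled cpu to six, and should_steal(5, 6) is 0 because 600 < 625.  
Under this test the extra thread only tips the balance when each cpu has 
four or fewer tasks.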

To solve this and other issues, I believe we need an extra modifier to 
the Linux kernel's per-cpu load value, which Xen could adjust to hint to 
the kernel what each cpu's relative processing power is.  The Linux 
kernel scheduler's per-cpu load values would be something like 
(max_cpu_power / cpu_power * nr_running).  Xen could update cpu_power in 
a number of situations: for a "long" preemption, as a much faster 
alternative to a vcpu hot-unplug (don't unplug, just set cpu_power to 
0), and to normalize load values for vcpus which have different 
time-slice lengths on the physical cpus.  
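
A rough sketch of how such a scaled load value might be computed is 
below.  Everything here is an assumption for illustration: MAX_CPU_POWER, 
its value of 128, and scaled_load() are made-up names, not an existing 
Linux or Xen interface.  Treating cpu_power == 0 as "maximally loaded" is 
one possible reading of the hot-unplug case.

```c
#include <limits.h>

/* Hypothetical scale: cpu_power of a vcpu that gets a full physical cpu. */
#define MAX_CPU_POWER 128UL

/* Per-cpu load scaled by relative cpu power, following
 * (max_cpu_power / cpu_power * nr_running).  The multiplication is done
 * before the division so integer truncation does not discard the ratio. */
static unsigned long scaled_load(unsigned long cpu_power,
                                 unsigned long nr_running)
{
    if (nr_running == 0)
        return 0;           /* nothing to steal from an empty run queue */
    if (cpu_power == 0)
        return ULONG_MAX;   /* "unplugged" cpu: any work on it looks overloaded */
    return MAX_CPU_POWER * nr_running / cpu_power;
}
```

For example, a vcpu with two runnable tasks at full power reports a load 
of 2, while the same two tasks on a vcpu at half power (cpu_power 64) 
report a load of 4, which is a large enough imbalance for other cpus to 
start stealing.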

I would hope something like this could also be used on Linux without Xen 
so it has wider appeal.  One thing that comes to mind is normalizing 
cpus' load when some cpus are "speed stepped" down for power 
management or heat reasons.

-Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

