
[Xen-devel] Re: new load balancer scheduler

Hi Alan.

Great questions. I've received similar enquiries from others on
the list, so I'm taking the liberty of CCing xen-devel.

On Mon, Jun 05, 2006 at 03:52:25PM -0400, Alan Greenspan wrote:
> 1.  Can you set the weight/cap for a domU in the domU configuration file or 
> only using xm after the domU is built?

For now, you can only do this using the xm command after the domU
is built. There is no reason why we couldn't support this in the
config file as well though. Actually, we should, so that these
settings are preserved across guest reboots. Someone needs to add
support for this.
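
For anyone scripting this in the meantime, here is a rough sketch of
the current workflow. It only builds and runs the `xm sched-credit`
command line from dom0. The helper names and the domain name "guest1"
are made up for this example, and I'm assuming that -w and -c can be
passed in the same invocation.

```python
import subprocess


def sched_credit_argv(domain, weight=None, cap=None):
    """Build an `xm sched-credit` command line (hypothetical helper)."""
    argv = ["xm", "sched-credit", "-d", str(domain)]
    if weight is not None:
        argv += ["-w", str(weight)]
    if cap is not None:
        # Assumption: -c may be combined with -w in one invocation.
        argv += ["-c", str(cap)]
    return argv


def apply_sched_credit(domain, weight=None, cap=None):
    """Run the command; only meaningful in dom0 once the domU is built."""
    subprocess.check_call(sched_credit_argv(domain, weight, cap))


# Example (not executed here): apply_sched_credit("guest1", weight=256, cap=100)
```

Since nothing here is stored in the config file, this has to be re-run
after every guest reboot.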

> 2.  Clarification on SMP and caps - if a domU has say 4 VCPUs, but its cap 
> is set at 100, does that mean the all 4 VCPUs will share a single real CPU?

The cap doesn't say anything about which physical CPUs a particular
VCPU will run on. It just defines a total amount of CPU time that a
domain will be able to consume.

> 3.  Is there any way to set a cap so that an SMP domU running on an SMP 
> host can be capped but still run its VCPUs concurrently.   In other words, 
> for example, suppose I have a 4-way host and build a domU with 4 VCPUs.   I 
> might want to cap its use on any one real CPU to say %25 but when the domU 
> runs, still allow it to run on all 4 real CPUs simultaneously.    Not sure 
> the cap semantics allow for this.

No. Setting the cap is not a way to set up strict gang scheduling.
The new scheduler will run all VCPUs of a domain at the same speed,
meaning that they will each receive the same amount of CPU time.
However, there is no guarantee that they will all be run in a
strict gang.
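
To put numbers on the two answers above, here is a small sketch. It
assumes the cap is expressed as a percentage of one physical CPU (so
100 means one CPU's worth of time in total) and that a cap of 0 means
uncapped. The function name is made up for illustration.

```python
def per_vcpu_share(cap_percent, num_vcpus):
    """CPU time each VCPU may consume, as a percentage of one physical CPU.

    Returns None when the domain is uncapped (cap of 0, by assumption).
    """
    if num_vcpus <= 0:
        raise ValueError("a domain needs at least one VCPU")
    if cap_percent == 0:
        return None
    # The cap is a domain-wide budget, split evenly across the VCPUs.
    return cap_percent / num_vcpus


print(per_vcpu_share(100, 4))  # 25.0
print(per_vcpu_share(400, 4))  # 100.0
```

So a 4-VCPU domU capped at 100 gets 25% of a physical CPU per VCPU.
Those four 25% slices may land on four different physical CPUs, but
nothing guarantees that they run at the same instant.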

> 4.  When running an SMP domU on a real SMP, does the scheduler attempt to 
> run all VCPUs concurrently (gang scheduling?).   Not sure what the 
> performance would be, particularly if the domU OS uses spin locks if VPUs 
> are not scheduled concurrently.

Again, no. While scheduling a guest's VCPUs concurrently is in
theory preferable for reasons such as spinlocks, experience shows
that normal SMP guest workloads do quite well when their VCPUs
are not strictly gang scheduled. In fact, enforcing strict gang
scheduling is inefficient.

A better alternative to gang scheduling is to "fold" a guest. For
example, instead of gang scheduling 4 VCPUs 25% of the time on 4
physical CPUs, you'd be better off running the guest 100% of the
time on 1 physical CPU. This way, you avoid cache thrashing and
context switch overheads. To make this work well though, you need
to avoid potential spinlock and other concurrency costs. This can
be done by offlining 3 VCPUs of your guest using CPU hotplug.
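
As a rough sketch of the folding step, assuming a Linux guest built
with CPU hotplug support that exposes the usual sysfs "online" files.
The function name is made up for this example, and it has to run as
root inside the guest.

```python
import os


def offline_cpus(cpus, sysfs_root="/sys/devices/system/cpu"):
    """Take the given CPUs offline inside a Linux guest via sysfs hotplug."""
    for cpu in cpus:
        path = os.path.join(sysfs_root, "cpu%d" % cpu, "online")
        # Writing "0" asks the kernel to take this CPU offline.
        with open(path, "w") as f:
            f.write("0")


# Fold a 4-VCPU guest down to VCPU 0 (not executed here):
# offline_cpus([1, 2, 3])
```

Writing "1" to the same files brings the VCPUs back online when the
guest is allowed to spread out again.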

There are very few cases where strict gang scheduling is necessary.
The only case I've seen is in real time applications where you need
to frequently poll the real world and quickly compute an adjustment
(like a shuttle docking with a space station for example). Here,
there is a latency requirement between the poll and adjustment
operations. Computing the adjustment in parallel on multiple CPUs
increases its precision.

Arguably, such cases are rare and are better solved by dedicating
actual physical resources to applications. If we're going to tackle
these types of problems in Xen, we should design a reservation-based
system that works side by side with the work-conserving CPU scheduler.
