Re: [Xen-devel] New CPU scheduler w/ SMP load balancer



Hi Anthony.

Thanks for your feedback. I'll take a look at your comments
regarding the Xend Python code in the patch this weekend.

On Fri, May 26, 2006 at 10:42:39AM -0500, Anthony Liguori wrote:
> Can you provide some more details on any results you may have seen with 
> the new scheduler?  How does it affect common benchmarks?  How does the 
> "load balancer" scale?  How much penalty do you pay (if any at all) on UP?

It is not simple to define a set of performance benchmarks for
a VCPU scheduler. On an SMP host, the credit scheduler is a lot
better at enforcing fairness across multiple guests, some SMP
and some UP. Certainly, the VCPU scheduler has an effect on I/O
benchmarks because of the interaction between domUs and dom0.

I found that on a uni-processor, running ttcp in a domU yielded
almost twice the network bandwidth with the credit scheduler
compared with SEDF. This probably has less to do with scheduling
algorithms than with implementation problems though.

For SMP guests, the credit scheduler enforces that all VCPUs
make equal progress. This solves a number of serious performance
problems when you are time slicing some of your physical CPUs
between multiple SMP guests.
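
To give a rough idea of why equal progress falls out of the
design, here is a sketch in C. All of the names (vcpu_credit_share
and friends) are invented for illustration and are not the code
in the patch; it only shows the shape of the idea:

    /* Illustration only: hypothetical names, not the patch code.
     * Each accounting period, a domain earns credit in proportion
     * to its weight, and that credit is split evenly among its
     * VCPUs, so no single VCPU of a guest can race ahead. */
    struct domain {
        int weight;     /* administrator-assigned share */
        int nr_vcpus;   /* online VCPUs in this guest */
    };

    static int vcpu_credit_share(const struct domain *d,
                                 int total_credit, int total_weight)
    {
        /* the domain's slice of the total, divided per VCPU */
        return (total_credit * d->weight)
             / (total_weight * d->nr_vcpus);
    }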

In terms of consolidating multiple guests on one SMP host, we
are now playing in a different ballpark with the credit scheduler:
When a CPU goes idle, it immediately picks up a runnable VCPU
waiting on the runqueue on another CPU. With SEDF and BVT, you
have to manually place all the VCPUs in the system and there are
no dynamic adjustments when VCPUs go to sleep waiting for I/O.
The credit scheduler is work conserving: whenever there is
runnable work anywhere in the system, no CPU sits idle, and a
CPU picks up new work the moment it runs out. This is in
contrast with load balancing algorithms that work in the
background and move things around on some clock tick. Being
work conserving
on SMP hosts is a huge improvement over the previous scheduler
implementations.
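
In rough C, the idle path looks something like the sketch below.
This is the concept with invented names (runq_pop, pick_next,
NR_CPUS), not the actual code in the patch:

    #include <stddef.h>

    #define NR_CPUS 8

    struct vcpu { struct vcpu *next; };
    struct runq { struct vcpu *head; };

    static struct runq runqs[NR_CPUS];

    static struct vcpu *runq_pop(struct runq *rq)
    {
        struct vcpu *v = rq->head;
        if (v != NULL)
            rq->head = v->next;
        return v;
    }

    /* A CPU with no local work immediately scans its peers'
     * runqueues for a runnable VCPU instead of going idle. */
    struct vcpu *pick_next(int cpu)
    {
        struct vcpu *v = runq_pop(&runqs[cpu]);
        int peer;

        if (v != NULL)
            return v;
        for (peer = 0; peer < NR_CPUS; peer++) {
            if (peer != cpu
                && (v = runq_pop(&runqs[peer])) != NULL)
                return v;   /* stolen: stay work conserving */
        }
        return NULL;        /* nothing runnable anywhere: idle */
    }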

In terms of scaling, I have taken profiles on an 8-way system
and found lock contention to be reasonable. We'll need to do
some performance work and perhaps pad some cachelines or change
a few things to run well on very large NUMA-type systems, but
the credit scheduler is designed to scale to very large systems.
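
As an example of the kind of padding I mean (a hypothetical
layout, not what is in the tree today):

    #define CACHELINE_BYTES 128
    #define NR_CPUS 8

    struct vcpu;
    struct runq { struct vcpu *head; };

    /* Give each physical CPU its own lock and runqueue on a
     * private cacheline, so CPUs scheduling concurrently do
     * not bounce a shared line between them (false sharing). */
    struct pcpu_sched {
        int         lock;   /* stand-in for a real spinlock */
        struct runq runq;
    } __attribute__((aligned(CACHELINE_BYTES)));

    static struct pcpu_sched pcpu_scheds[NR_CPUS];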

The common code path (do_schedule) is designed to be extremely
fast on both UP and MP systems. Using the scientific method of
code inspection :-), these code paths are a lot shorter and faster
than the SEDF ones. The accounting work in the credit scheduler is
done every 30 milliseconds outside the common path and its
complexity is linear in the number of running VCPUs in the
system. Making the accounting overhead independent of the
number of scheduling operations is good on I/O workloads where
lots of context switches occur.
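
Concretely, the shape of the accounting pass is something like
the following sketch. The names (account_credits,
CREDITS_PER_ACCT) are invented and the logic is simplified to
the point of caricature:

    #define CREDITS_PER_ACCT 300   /* debited per 30ms period */

    struct vcpu {
        int credit;
        int is_running;   /* currently on a physical CPU? */
    };

    /* Runs once per accounting period: a single O(n) walk over
     * the VCPUs, regardless of how many context switches
     * happened since the last pass. */
    static void account_credits(struct vcpu *vcpus, int nr_vcpus)
    {
        int i;

        for (i = 0; i < nr_vcpus; i++) {
            if (!vcpus[i].is_running)
                continue;
            vcpus[i].credit -= CREDITS_PER_ACCT;
            /* a VCPU that has burned through its credit gets
             * deprioritized until the next period tops it up */
        }
    }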

> Better yet, if you have a paper you could share, that would be even 
> better :-)  If you cannot share because of conference restrictions, it 
> would be nice if you could share a condensed version (similar to what the L4ka 
> group did for their afterburning work).

Writing a paper is something I'd like to do at some point once
we've had more experience in the field.

> Based on your description though, the new scheduler looks very promising!

I am eager to hear people's experiences with the new scheduler,
especially on SMP hosts.


Cheers,
Emmanuel.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel