
Sketch of an idea for handling the "mixed workload" problem



The basic credit2 algorithm goes something like this:

1. All vcpus start with the same number of credits; about 10ms worth
if everyone has the same weight

2. vcpus burn credits as they consume cpu, based on the relative
weights: higher weights burn slower, lower weights burn faster

3. At any given point in time, the runnable vcpu with the highest
credit is allowed to run

4. When the "next runnable vcpu" on a runqueue has negative credit,
credit is reset: everyone gets another 10ms, and can carry at most 2ms
of credit over the reset.
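
To make the four steps concrete, here is a rough Python sketch.  It is
not the actual Xen code: the names (Vcpu, burn, reset_credit,
pick_next) are made up for illustration, the burn rate is assumed to
scale as max_weight / weight, and debt below zero is assumed to be
discarded at reset.

```python
from dataclasses import dataclass

CREDIT_INIT_US = 10_000   # about 10ms of credit per reset (step 1)
CARRYOVER_MAX_US = 2_000  # at most 2ms survives a reset (step 4)


@dataclass
class Vcpu:
    name: str
    weight: int
    credit_us: float = CREDIT_INIT_US
    runnable: bool = True


def burn(v, ran_us, max_weight):
    # Step 2: higher weights burn slower.  Assumption: the heaviest
    # vcpu pays 1us of credit per 1us of cpu; others pay proportionally more.
    v.credit_us -= ran_us * max_weight / v.weight


def reset_credit(vcpus):
    # Step 4: everyone gets another 10ms plus at most 2ms of leftover credit.
    # Assumption: negative credit is simply dropped rather than carried.
    for v in vcpus:
        carry = min(max(v.credit_us, 0), CARRYOVER_MAX_US)
        v.credit_us = CREDIT_INIT_US + carry


def pick_next(vcpus):
    # Step 3: the runnable vcpu with the highest credit runs next.
    runnable = [v for v in vcpus if v.runnable]
    if not runnable:
        return None
    best = max(runnable, key=lambda v: v.credit_us)
    if best.credit_us < 0:
        # Even the best candidate is in debt, so reset and pick again.
        reset_credit(vcpus)
        best = max(runnable, key=lambda v: v.credit_us)
    return best
```

A vcpu that mostly sleeps keeps its credit near the top of this
ordering, which is why it gets picked as soon as it wakes up.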

Generally speaking, vcpus that use less than their quota and have lots
of interrupts are scheduled immediately, since when they wake up they
always have more credit than the vcpus who are burning through their
slices.

But what about a situation as described recently on Matrix, where a VM
uses a non-negligible amount of cpu doing un-accelerated encryption
and decryption, which can be delayed by a few ms, as well as handling
audio events?  How can we make sure that:

1. We can run whenever interrupts happen
2. We get no more than our fair share of the cpu?

The counter-intuitive key here is that in order to achieve the above,
you need to *deschedule or preempt early*, so that when the interrupt
comes, you have spare credit to run the interrupt handler.  How do we
manage that?

The idea I'm working out comes from a phrase I used in the Matrix
discussion, about a vcpu that "foolishly burned all its credits".
Naturally the thing you want to do to have credits available is to
save them up.

So the idea would be this.  Each vcpu would have a "boost credit
ratio" and a "default boost interval"; there would be sensible
defaults based on typical workloads, but these could be tweaked for
individual VMs.

When credit is assigned, all vcpus would get the same amount of credit,
but divided into two "buckets", according to the boost credit ratio.
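
As a sketch of the split (hypothetical names, and assuming the 10ms
allocation from the basic algorithm and a ratio expressed as a
fraction between 0 and 1):

```python
CREDIT_INIT_US = 10_000  # 10ms per reset, as in the basic algorithm


def split_credit(total_us, boost_ratio):
    """Return (boost_us, normal_us) for one credit assignment."""
    if not 0.0 <= boost_ratio <= 1.0:
        raise ValueError("boost_ratio must be between 0 and 1")
    boost_us = round(total_us * boost_ratio)
    # The total is unchanged; only its division between buckets varies.
    return boost_us, total_us - boost_us
```

With a ratio of 0.25 this gives 2.5ms of boost credit and 7.5ms of
ordinary credit out of the same 10ms.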

Under certain conditions, a vcpu would be considered "boosted"; this
state would last either until the default boost interval expires, or
until some other event (such as a de-boost yield).
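
The lifetime of the boosted state could be tracked with something like
the following.  This is only a sketch: BoostState and its methods are
invented names, time is a plain microsecond counter, and what actually
triggers enter() is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoostState:
    interval_us: int                      # the "default boost interval"
    started_at_us: Optional[int] = None   # None means not boosted

    def enter(self, now_us):
        # Called when one of the boost conditions is met.
        self.started_at_us = now_us

    def deboost(self):
        # Called on an explicit event such as a de-boost yield.
        self.started_at_us = None

    def is_boosted(self, now_us):
        if self.started_at_us is None:
            return False
        if now_us - self.started_at_us >= self.interval_us:
            # The boost interval has run out.
            self.started_at_us = None
            return False
        return True
```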

The queue would be sorted thus:

* Boosted vcpus, by boost credit available
* Non-boosted vcpus, by non-boost credit available

Getting more boost credit means having lower priority when not
boosted; and burning through your boost credit means not being
scheduled when you need to be.
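
The ordering could be expressed as a sort key, roughly like this (a
sketch with made-up names, not Xen code; how ties are broken is not
something I've thought about):

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    name: str
    boost_credit_us: int
    normal_credit_us: int
    boosted: bool = False


def queue_key(v):
    # Boosted vcpus sort ahead of all non-boosted ones.  Within each
    # class, the vcpu with more credit in the relevant bucket goes first.
    if v.boosted:
        return (0, -v.boost_credit_us)
    return (1, -v.normal_credit_us)


def sort_runqueue(vcpus):
    return sorted(vcpus, key=queue_key)
```

For example, a non-boosted vcpu holding 7.5ms of ordinary credit
(because 2.5ms went into its boost bucket) sorts behind a non-boosted
vcpu that kept all 10ms as ordinary credit, but ahead of it once
boosted.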

Other ways we could consider putting a vcpu into a boosted state (some
discussed on Matrix or emails linked from Matrix):
* Xen is about to preempt, but finds that the vcpu's interrupts are
blocked (this sort of overlaps with the "when we deliver an interrupt"
one)
* Xen is about to preempt, but finds that the (currently out-of-tree)
"dont_desched" bit has been set in the shared memory area

Other ways to consider de-boosting:
* There's a way to trigger a VMEXIT when interrupts have been
re-enabled; we could set this up when the VM is in the boost state

Getting the defaults right might take some thinking.  If you set the
default "boost credit ratio" to 25% and the "default boost interval"
to 500us, then you'd basically have five "boosts" per scheduling
window (25% of 10ms is 2.5ms of boost credit, which is five 500us
intervals).  The window depends on how active other vcpus are, but if
it's longer than 20ms your system is too overloaded.
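
The arithmetic behind "five boosts" can be checked with a few lines of
Python.  This assumes the 10ms allocation per reset, a boost interval
of 500 microseconds (the interval that yields five boosts from a 25%
share), and that a boosted vcpu burns 1us of boost credit per 1us it
runs; the function name is made up.

```python
def boosts_per_window(credit_us, boost_ratio, boost_interval_us):
    # Boost credit available after one reset.
    boost_credit_us = credit_us * boost_ratio
    # Number of full boost intervals that credit can pay for.
    return int(boost_credit_us // boost_interval_us)


n = boosts_per_window(10_000, 0.25, 500)
```

Raising the ratio or shortening the interval both increase the number
of boosts, at the cost of lower priority when not boosted.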

Thoughts?  Demi, what kinds of interrupt counts are you getting for your VM?

 -George



 

