Xen project Mailing List

Sketch of an idea for handling the "mixed workload" problem

To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: George Dunlap <george.dunlap@xxxxxxxxx>

Date: Fri, 29 Sep 2023 17:42:16 +0100

Cc: Juergen Gross <jgross@xxxxxxxx>, Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 29 Sep 2023 16:42:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

The basic credit2 algorithm goes something like this:

1. All vcpus start with the same number of credits; about 10ms worth if everyone has the same weight.
2. vcpus burn credits as they consume cpu, based on their relative weights: higher weights burn slower, lower weights burn faster.
3. At any given point in time, the runnable vcpu with the highest credit is allowed to run.
4. When the "next runnable vcpu" on a runqueue has negative credit, credit is reset: everyone gets another 10ms, and can carry at most 2ms of credit over the reset.

Generally speaking, vcpus that use less than their quota and have lots of interrupts are scheduled immediately, since when they wake up they always have more credit than the vcpus that are burning through their slices.

But what about a situation like the one described recently on Matrix, where a VM uses a non-negligible amount of cpu doing un-accelerated encryption and decryption (which can be delayed by a few ms), as well as handling audio events? How can we make sure that:

1. We can run whenever interrupts happen, and
2. We get no more than our fair share of the cpu?

The counter-intuitive key here is that in order to achieve the above, you need to *deschedule or preempt early*, so that when the interrupt comes, you have spare credit to run the interrupt handler. How do we manage that?

The idea I'm working out comes from a phrase I used in the Matrix discussion, about a vcpu that "foolishly burned all its credits". Naturally, if you want to have credits available when you need them, you have to save them up.

So the idea would be this. Each vcpu would have a "boost credit ratio" and a "default boost interval"; there would be sensible defaults based on typical workloads, but these could be tweaked for individual VMs. When credit is assigned, all vcpus would get the same amount of credit, but divided into two "buckets" according to the boost credit ratio.
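As a rough illustration, the four basic credit2 rules listed at the top of this message can be modelled in a few lines of C. This is a toy sketch, not Xen's sched_credit2.c: the struct, the function names and the constants (CREDIT_INIT_US, CARRYOVER_MAX_US, WEIGHT_DEFAULT) are hypothetical, credits are in microseconds, and carrying a negative balance over the reset as debt is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the four credit2 rules. Credits are in microseconds.
 * All names and constants are hypothetical, not Xen's identifiers.
 */
#define CREDIT_INIT_US    10000  /* ~10ms handed out at each reset      */
#define CARRYOVER_MAX_US   2000  /* at most 2ms of credit survives      */
#define WEIGHT_DEFAULT      256  /* at this weight, 1us run = 1us burnt */

struct toy_vcpu {
    int weight;      /* relative weight, must be > 0 */
    int64_t credit;  /* may go negative              */
    int runnable;
};

/* Rule 2: higher weights burn slower, lower weights burn faster. */
static void toy_burn(struct toy_vcpu *v, int64_t ran_us)
{
    assert(v->weight > 0);
    v->credit -= ran_us * WEIGHT_DEFAULT / v->weight;
}

/* Rule 3: the runnable vcpu with the highest credit runs next. */
static struct toy_vcpu *toy_pick_next(struct toy_vcpu *vs, size_t n)
{
    struct toy_vcpu *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (vs[i].runnable && (best == NULL || vs[i].credit > best->credit))
            best = &vs[i];
    return best;
}

/*
 * Rule 4: if the best runnable vcpu has negative credit, everyone gets
 * another CREDIT_INIT_US. A positive balance is capped at
 * CARRYOVER_MAX_US first; a negative balance is kept as debt (an
 * assumption of this sketch). Returns 1 if a reset happened, else 0.
 */
static int toy_maybe_reset(struct toy_vcpu *vs, size_t n)
{
    struct toy_vcpu *next = toy_pick_next(vs, n);

    if (next == NULL || next->credit >= 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        int64_t carry = vs[i].credit;

        if (carry > CARRYOVER_MAX_US)
            carry = CARRYOVER_MAX_US;
        vs[i].credit = CREDIT_INIT_US + carry;
    }
    return 1;
}
```

For example, with two runnable vcpus starting at 10000us, one of weight 256 and one of weight 512, running each for 4ms leaves them at 6000 and 8000 respectively, so the heavier vcpu is picked next. No reset happens until the best runnable vcpu itself drops below zero.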
Under certain conditions, such as when we deliver an interrupt to it, a vcpu would be considered "boosted"; this state would last either until the default boost interval expires, or until some other event (such as a de-boost yield). The queue would be sorted thus:

* Boosted vcpus, by boost credit available
* Non-boosted vcpus, by non-boost credit available

So there is a trade-off: getting more boost credit means having lower priority when not boosted, and burning through your boost credit means not being scheduled when you need to be.

Other ways we could consider putting a vcpu into a boosted state (some discussed on Matrix or in emails linked from Matrix):

* Xen is about to preempt, but finds that the vcpu has interrupts blocked (this somewhat overlaps with the "when we deliver an interrupt" one)
* Xen is about to preempt, but finds that the (currently out-of-tree) "dont_desched" bit has been set in the shared memory area

Other ways to consider de-boosting:

* There's a way to trigger a VMEXIT when interrupts have been re-enabled; we could set this up when the vcpu enters the boost state, and de-boost it when that exit fires.

Getting the defaults right might take some thinking. If you set the default "boost credit ratio" to 25% and the "default boost interval" to 500µs, then you'd basically have five "boosts" per scheduling window (25% of ~10ms is 2.5ms, and 2.5ms / 500µs = 5). The window depends on how active other vcpus are, but if it's longer than 20ms, your system is overloaded.

Thoughts? Demi, what kinds of interrupt counts are you getting for your VM?

-George
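The proposed two-bucket split and queue order can likewise be sketched in C. Every name here (struct boost_vcpu, assign_credit, runq_cmp, charge, boosts_per_window) is hypothetical; the sketch ignores weights, represents the boost credit ratio as an integer percentage, takes the per-reset allocation as roughly 10ms, and models only bucket exhaustion and interval expiry as de-boost events.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-vcpu state for the boost proposal (credits in us). */
struct boost_vcpu {
    int64_t boost_credit;      /* spent only while boosted         */
    int64_t normal_credit;     /* spent while not boosted          */
    int boosted;               /* 1 while in the boosted state     */
    int boost_ratio_pct;       /* "boost credit ratio", e.g. 25    */
    int64_t boost_interval_us; /* "default boost interval"         */
};

/* Everyone gets the same total, split by the vcpu's own ratio. */
static void assign_credit(struct boost_vcpu *v, int64_t total_us)
{
    assert(v->boost_ratio_pct >= 0 && v->boost_ratio_pct <= 100);
    v->boost_credit = total_us * v->boost_ratio_pct / 100;
    v->normal_credit = total_us - v->boost_credit;
}

/* The bucket that decides this vcpu's place in the queue. */
static int64_t sort_key(const struct boost_vcpu *v)
{
    return v->boosted ? v->boost_credit : v->normal_credit;
}

/*
 * qsort comparator: boosted vcpus first, by boost credit (highest
 * first); then non-boosted vcpus, by non-boost credit (highest first).
 */
static int runq_cmp(const void *a, const void *b)
{
    const struct boost_vcpu *x = a, *y = b;
    int64_t kx, ky;

    if (x->boosted != y->boosted)
        return x->boosted ? -1 : 1;
    kx = sort_key(x);
    ky = sort_key(y);
    return (kx < ky) - (kx > ky); /* descending by credit */
}

/*
 * Charge run time to the active bucket. A boosted vcpu drops out of
 * the boosted state once its boost bucket is empty or it has been
 * boosted for at least boost_interval_us. Other de-boost events (e.g.
 * a yield) are not modelled.
 */
static void charge(struct boost_vcpu *v, int64_t ran_us,
                   int64_t boosted_for_us)
{
    if (!v->boosted) {
        v->normal_credit -= ran_us;
        return;
    }
    v->boost_credit -= ran_us;
    if (v->boost_credit <= 0 || boosted_for_us >= v->boost_interval_us)
        v->boosted = 0;
}

/* How many full boost intervals fit in one allocation's boost bucket. */
static int64_t boosts_per_window(int64_t total_us, int ratio_pct,
                                 int64_t interval_us)
{
    return total_us * ratio_pct / 100 / interval_us;
}
```

With a 10000us allocation, `boosts_per_window(10000, 25, 500)` returns 5: five 500us boosts fit in a 2.5ms boost bucket. Sorting shows the trade-off too: when neither is boosted, a vcpu with a 25% ratio (7500us of normal credit) sorts ahead of one with a 50% ratio (5000us).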
