
[Xen-devel] [BUG] Bugs in Xen's credit scheduler cause long tail latency issues



Hi all,

While running latency-sensitive applications in VMs on Xen, I found
some bugs in the credit scheduler that cause long tail latencies in
I/O-intensive VMs.


(1) Problem description

------------Description------------
My test environment is as follows: Hypervisor (Xen 4.5.0), Dom 0
(Linux 3.18.21), Dom U (Linux 3.18.21).

Environment setup:
We created two 1-vCPU, 4GB-memory VMs and pinned both onto one
physical CPU core. One VM (denoted as the I/O-VM) ran the Sockperf
server program; the other VM (denoted as the CPU-VM) ran a
compute-bound task, e.g., SPEC CPU2006 or simply a busy loop. A client
on another physical machine sent UDP requests to the I/O-VM. A sketch
of the commands is shown below.
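
For reference, the setup can be reproduced roughly with commands along
these lines. The VM names, IP address, and port are placeholders, and
exact Sockperf flags may vary across versions:

    # pin each 1-vCPU guest onto the same physical core (pCPU 2 here)
    xl vcpu-pin io-vm 0 2
    xl vcpu-pin cpu-vm 0 2

    # inside the I/O-VM: start the Sockperf server (UDP is the default)
    sockperf server -i <io-vm-ip> -p 11111

    # on the external client: send UDP requests and record latency
    sockperf under-load -i <io-vm-ip> -p 11111 -t 60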

Here are my tail-latency results (in microseconds):

Case    Avg     90%     99%     99.9%   99.99%
#1      108     114     128     129     130
#2      7811    13892   14874   15315   16383
#3      943     131     21755   26453   26553
#4      116     96      105     8217    13472
#5      116     117     129     131     132

Bugs 1, 2, and 3 are discussed below.

Case #1:
The I/O-VM was processing Sockperf requests from the client; the
CPU-VM was idle (no processes running). The hypervisor was native Xen
4.5.0.

Case #2:
The I/O-VM was processing Sockperf requests from the client; the
CPU-VM was running a compute-bound task. The hypervisor was native Xen
4.5.0.

Case #3:
Same workloads as Case #2; the hypervisor was Xen 4.5.0 with Bug 1
fixed.

Case #4:
Same workloads as Case #2; the hypervisor was Xen 4.5.0 with Bugs 1
and 2 fixed.

Case #5:
Same workloads as Case #2; the hypervisor was Xen 4.5.0 with Bugs 1,
2, and 3 fixed.

---------------------------------------


(2) Problem analysis

------------Analysis----------------

[Bug 1]: A VCPU running a CPU-intensive workload could be mistakenly
boosted due to CPU affinity.

http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html

We have already discussed this bug and a potential patch in the thread
above. Although that patch improved the tail latency, i.e., it reduced
the 90th-percentile latency, the long tail is still not bounded. Below
we describe two new bugs that inflate latency at the very far end of
the tail.



[Bug 2]: In csched_acct() (run every 30ms by default), a VCPU stops
earning credits and is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked()) once its credit exceeds the upper
bound. Because each of our domains has only one VCPU, the domain is
then also removed from the active domain list.

Every 10ms, csched_tick() --> csched_vcpu_acct() -->
__csched_vcpu_acct_start() runs and puts inactive VCPUs back on the
active list. However, __csched_vcpu_acct_start() only reactivates the
*current* VCPU. If an I/O-bound VCPU is not the current VCPU when
csched_tick() fires, it is not put back on the active VCPU list. It
will then likely miss the next credit refill in csched_acct() and can
easily enter the OVER state. Once in OVER, the I/O-bound VM cannot be
boosted and suffers very long latency: it takes at least one time
slice (e.g., 30ms) before the I/O-VM is reactivated and starts
receiving credits again.

[Possible Solution] Before the next credit refill, put every inactive
VCPU back on the active list, instead of just the current VCPU. A
rough sketch follows.
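
For illustration only, here is a minimal, untested sketch against Xen
4.5's sched_credit.c. The helper name csched_acct_start_queued() is
hypothetical; RUNQ(), __runq_elem(), and __csched_vcpu_acct_start()
are the existing helpers in sched_credit.c, but details (especially
locking) would need checking:

    /* Hypothetical helper, called from csched_vcpu_acct() on each tick:
     * walk this pCPU's run queue and reactivate every queued vCPU, not
     * just the one currently running.  The caller is assumed to hold
     * this pCPU's schedule lock so the run queue is stable. */
    static void
    csched_acct_start_queued(struct csched_private *prv, unsigned int cpu)
    {
        struct list_head * const runq = RUNQ(cpu); /* per-pCPU run queue */
        struct list_head *iter;

        list_for_each( iter, runq )
        {
            struct csched_vcpu * const svc = __runq_elem(iter);

            /* The idle vCPU never takes part in credit accounting. */
            if ( is_idle_vcpu(svc->vcpu) )
                continue;

            /* If this vCPU was dropped from the active list when its
             * credit hit the cap, put it back so the next csched_acct()
             * pass refills its credits. */
            if ( list_empty(&svc->active_vcpu_elem) )
                __csched_vcpu_acct_start(prv, svc);
        }
    }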



[Bug 3]: The BOOST priority can be reverted to UNDER before the
boosted VCPU has had a chance to preempt the currently running VCPU.
When that happens, the boost has no effect.

When a VCPU in the UNDER state wakes up from sleep, it is boosted in
csched_vcpu_wake(). However, the boost only takes effect if
__runq_tickle() then preempts the current VCPU. It is possible for
csched_acct() to run between csched_vcpu_wake() and __runq_tickle(),
and if the woken VCPU's credit is > 0 it resets the priority from
BOOST back to UNDER. __runq_tickle() then fails to preempt, because an
UNDER VCPU cannot preempt another UNDER VCPU. This also contributes to
the far end of the long tail latency.

[Possible Solution]
1. Add a lock to prevent csched_acct() from interleaving with
csched_vcpu_wake(); or
2. Separate the BOOST state from the UNDER and OVER states (a sketch
of this option follows).
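
As an illustration of option 2, here is a rough, untested sketch:
track BOOST as a flag orthogonal to the UNDER/OVER credit priority, so
that csched_acct() rewriting svc->pri cannot silently cancel a boost
granted by csched_vcpu_wake(). CSCHED_FLAG_VCPU_BOOSTED and
csched_effective_pri() are hypothetical names; the other flag and
priority constants follow sched_credit.c:

    /* Hypothetical new flag bit, alongside CSCHED_FLAG_VCPU_PARKED
     * and CSCHED_FLAG_VCPU_YIELD in sched_credit.c. */
    #define CSCHED_FLAG_VCPU_BOOSTED 2

    /* In csched_vcpu_wake(): grant the boost via the flag instead of
     * rewriting svc->pri, so the periodic accounting pass cannot
     * revoke it in the wake -> tickle window. */
    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
        set_bit(CSCHED_FLAG_VCPU_BOOSTED, &svc->flags);

    /* Wherever priorities are compared (__runq_tickle(),
     * csched_schedule()), use the effective priority: a vCPU holding
     * the boost flag outranks plain UNDER even if csched_acct() has
     * meanwhile reset svc->pri. */
    static inline int16_t
    csched_effective_pri(const struct csched_vcpu *svc)
    {
        if ( test_bit(CSCHED_FLAG_VCPU_BOOSTED, &svc->flags) )
            return CSCHED_PRI_TS_BOOST;
        return svc->pri;
    }

    /* The flag would be cleared when the boosted vCPU is next
     * scheduled out, not by the csched_acct() accounting pass. */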
---------------------------------------


Please confirm these bugs.
Thanks.

--
Tony. S
Ph.D. student, University of Colorado, Colorado Springs

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
