Re: [Xen-devel] [BUG] Bugs in Xen's credit scheduler cause long tail latency issues
On Tue, May 17, 2016 at 3:27 AM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
> On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@xxxxxxxxx> wrote:
>> Hi all,
>>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler which cause long tail
>> latency in I/O-intensive VMs.
>>
>> (1) Problem description
>>
>> ------------Description------------
>> My test environment is as follows: hypervisor Xen 4.5.0, Dom0 Linux
>> 3.18.21, DomU Linux 3.18.21.
>>
>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. One VM (denoted as I/O-VM) ran the Sockperf
>> server program; the other (denoted as CPU-VM) ran a compute-bound
>> task, e.g., SPEC CPU 2006 or simply a loop. A client on another
>> physical machine sent UDP requests to the I/O-VM.
>>
>> Here are my tail latency results (microseconds):
>>
>> Case    Avg     90%     99%     99.9%   99.99%
>> #1      108     114     128     129     130
>> #2      7811    13892   14874   15315   16383
>> #3      943     131     21755   26453   26553
>> #4      116     96      105     8217    13472
>> #5      116     117     129     131     132
>>
>> Bugs 1, 2, and 3 are discussed below.
>>
>> Case #1: I/O-VM was processing Sockperf requests from clients;
>> CPU-VM was idle (no processes running).
>>
>> Case #2: I/O-VM was processing Sockperf requests from clients;
>> CPU-VM was running a compute-bound task. Hypervisor: native Xen
>> 4.5.0.
>>
>> Case #3: same workload as case #2; native Xen 4.5.0 with bug 1
>> fixed.
>>
>> Case #4: same workload as case #2; native Xen 4.5.0 with bugs 1 & 2
>> fixed.
>>
>> Case #5: same workload as case #2; native Xen 4.5.0 with bugs 1, 2
>> & 3 fixed.
>> ---------------------------------------
>>
>> (2) Problem analysis
>
> Hey Tony,
>
> Thanks for looking at this. These issues in the credit1 algorithm
> are essentially the reason I started work on the credit2 scheduler
> several years ago. We meant credit2 to have replaced credit1 by now,
> but we ran out of time to test it properly; we're in the process of
> doing that right now, and are hoping it will be the default
> scheduler for the 4.8 release.
>
> So let me make two suggestions that would make your effort more
> helpful to us:
>
> 1. Use cpupools for testing rather than pinning. A lot of the
> algorithms are designed with the assumption that they have all the
> cpus to run on, and the credit allocation / priority algorithms fail
> to work properly when vcpus are merely pinned. Cpupools were
> specifically designed to allow the scheduler algorithms to work as
> intended with a smaller number of cpus than the system has.
>
> 2. Test credit2. :-)

Hi George,

Thank you for your reply. I will try cpupools and credit2 later. :-)
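For the record, here is the sort of setup I assume you mean; the pool
name, scheduler choice, CPU number, and domain names below are just
placeholders, and I may not have the xl syntax exactly right:

    # testpool.cfg -- minimal cpupool config
    name  = "testpool"
    sched = "credit2"
    cpus  = ["3"]

    # Free CPU 3 from the default pool, create the pool, move the VMs:
    xl cpupool-cpu-remove Pool-0 3
    xl cpupool-create testpool.cfg
    xl cpupool-migrate io-vm testpool
    xl cpupool-migrate cpu-vm testpool

That way the scheduler only ever sees the CPUs its pool actually owns,
instead of vCPUs pinned inside a larger pool, and the pool can run a
different scheduler (e.g., credit2) than the rest of the host.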
> One comment about your analysis here...
>
>> [Bug 2]: In csched_acct() (run by default every 30ms), a VCPU stops
>> earning credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked()) if its credit is larger than the
>> upper bound. Because the domain has only one VCPU, the domain will
>> also be removed from the active domain list.
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() is executed and tries to put inactive
>> VCPUs back on the active list. However, __csched_vcpu_acct_start()
>> only puts the *current* VCPU back on the active list. If an
>> I/O-bound VCPU is not the current VCPU at the csched_tick(), it
>> will not be put back on the active VCPU list. In that case, the
>> I/O-bound VCPU will likely miss the next credit refill in
>> csched_acct() and can easily enter the OVER state. As a result, the
>> I/O-bound VM cannot be boosted and suffers very long latency. It
>> takes at least one time slice (e.g., 30ms) before the I/O VM is
>> activated and starts to receive credits again.
>>
>> [Possible solution] Try to put all inactive VCPUs back on the
>> active list before the next credit refill, instead of just the
>> current VCPU.
>
> When we stop accounting, we divide the credits in half, so that when
> it starts out, it should have a reasonable amount of credit (15ms
> worth). Is this not taking effect for some reason?

Actually, for bug 2, dividing the credits in half so the VCPU starts
out with a reasonable amount of credit is not the issue. The problem
is that the VCPU is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked()) and sometimes is not put back on the
active list in time (as I explained in the first thread). While the
VCPU is inactive, the next csched_acct() will not allocate new credits
to it. After many such rounds, the VCPU's credit becomes a small
negative number (e.g., -1000) and the VCPU will not be scheduled. The
I/O-intensive applications on it, especially latency-sensitive
workloads, then suffer long tail latency. A rough sketch of the kind
of fix I have in mind is below.
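To make the idea concrete, here is a minimal sketch of the proposed
reactivation pass, which would run shortly before the refill loop in
csched_acct(). It is not a real patch: the prv->all_vcpus list and its
all_vcpu_elem field are hypothetical (credit1 keeps no such list
today), and locking is elided:

    /* Sketch only: put every inactive but runnable vCPU back on the
     * active list before credits are refilled, instead of relying on
     * csched_tick() happening to see it as the currently running
     * vCPU.  NOTE: prv->all_vcpus / all_vcpu_elem do not exist in
     * today's sched_credit.c; a real patch would need to add such a
     * list and take the appropriate locks. */
    static void csched_reactivate_idle_vcpus(struct csched_private *prv)
    {
        struct csched_vcpu *svc;

        list_for_each_entry( svc, &prv->all_vcpus, all_vcpu_elem )
        {
            /* Inactive vCPUs are linked on no active list. */
            if ( list_empty(&svc->active_vcpu_elem) &&
                 vcpu_runnable(svc->vcpu) )
                __csched_vcpu_acct_start(prv, svc);
        }
    }

With something like this in place, an I/O-bound VCPU would be back on
the active list by the time csched_acct() hands out credits, even if
it happened to be blocked at every tick.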
> -George

--
Tony