
Re: [Xen-devel] [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues



[Adding George, and avoiding trimming, for his benefit]

On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
> Hi all,
> 
Hi Tony,

> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler which will cause long tail
> latency in I/O-intensive VMs.
> 
Ok, first of all, thanks for looking into and reporting this.

This is certainly something we need to think about... For now, just a
couple of questions.

> (1) Problem description
> 
> ------------Description------------
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21).
> 
> Environment setup:
> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
> simply a loop (denoted as CPU-VM). A client on another physical
> machine sent UDP requests to the I/O-VM.
> 
So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
*both* pinned on the *same* pCPU, is this the case?

> Here are my tail latency results (microseconds):
> Case   Avg     90%     99%     99.9%   99.99%
> #1     108     114     128     129     130
> #2     7811    13892   14874   15315   16383
> #3     943     131     21755   26453   26553
> #4     116     96      105     8217    13472
> #5     116     117     129     131     132
> 
> Bug 1, 2, and 3 will be discussed below.
> 
> Case #1:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> idling (no processes running).
> 
> Case #2:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0
> 
> Case #3:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
> 
> Case #4:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
> 
> Case #5:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
> 
> ---------------------------------------
> 
> 
> (2) Problem analysis
> 
> ------------Analysis----------------
> 
> [Bug 1]: The VCPU running the CPU-intensive workload could be
> mistakenly boosted due to CPU affinity.
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html
> 
> We have already discussed this bug and a potential patch in the above
> link. Although the discussed patch improved the tail latency, i.e.,
> reduced the 90th percentile latency, the long tail latency is still
> not bounded. Next, we discuss two new bugs that cause latency spikes
> at the very far end of the tail.
> 
Right, and there is a fix upstream for this. It's not the patch you
proposed in the thread linked above, but it should have had the same
effect.

Can you perhaps try something more recent than 4.5 (4.7-rc would be
great) and confirm that the numbers still look similar?

About this below here, I'll read carefully and think about it. Thanks
again.

> [Bug 2]: In csched_acct() (by default every 30ms), a VCPU stops
> earning credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked) if its credit is larger than the
> upper bound. Because the domain has only one VCPU, the VM will also
> be removed from the active domain list.
> 
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() will be executed and tries to put inactive
> VCPUs back to the active list. However, __csched_vcpu_acct_start()
> will only put the current VCPU back to the active list. If an
> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
> not be put back to the active VCPU list. If so, the I/O-bound VCPU
> will likely miss the next credit refill in csched_acct() and can
> easily enter the OVER state. As such, the I/O-bound VM will be unable
> to be boosted and have very long latency. It takes at least one time
> slice (e.g., 30ms) before the I/O VM is activated and starts to
> receive credits.
> 
> [Possible Solution] Try to activate any inactive VCPUs back to active
> before next credit refill, instead of just the current VCPU.
> 
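
For anyone who wants to reason about the sequence described above
without booting a hypervisor, here is a minimal toy model of it. It is
a sketch, not Xen code: the names (VCpu, tick, acct, simulate), the
credit cap, the per-period credit amount and the burn rates are all
made up for illustration, and the real accounting in csched_acct() is
more involved. It only shows that a parked vCPU which is never
"current" at tick time misses every refill, and that re-activating all
parked vCPUs before the refill (the proposed solution) avoids that.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not Xen's real values.
CREDIT_CAP = 300        # hypothetical upper bound on a vCPU's credit
CREDITS_PER_ACCT = 300  # hypothetical credits handed out per accounting run
TICKS_PER_ACCT = 3      # 10 ms tick, 30 ms accounting period


@dataclass
class VCpu:
    name: str
    credit: int = 0
    active: bool = True  # "on the active list", i.e. eligible for refills


def tick(current, vcpus, reactivate_all=False):
    """Toy 10 ms tick: re-activate only the running vCPU (or all, if fixed)."""
    targets = vcpus if reactivate_all else [current]
    for v in targets:
        v.active = True


def acct(vcpus):
    """Toy 30 ms accounting: refill active vCPUs, park those above the cap."""
    active = [v for v in vcpus if v.active]
    if not active:
        return
    share = CREDITS_PER_ACCT // len(active)
    for v in active:
        v.credit += share
        if v.credit > CREDIT_CAP:
            v.credit = CREDIT_CAP
            v.active = False  # stops earning credits until re-activated


def simulate(reactivate_all, periods=4):
    """Return the I/O vCPU's credit after a few accounting periods."""
    io = VCpu("io", credit=30, active=False)  # parked earlier for exceeding the cap
    cpu = VCpu("cpu", credit=0, active=True)
    vcpus = [io, cpu]
    for _ in range(periods):
        for _ in range(TICKS_PER_ACCT):
            cpu.credit -= 100  # the CPU hog runs for most of each tick
            io.credit -= 10    # the I/O vCPU runs briefly between ticks
            # The hog is always the one running when the tick fires.
            tick(cpu, vcpus, reactivate_all)
        acct(vcpus)
    return io.credit


if __name__ == "__main__":
    print("current behaviour, io credit:", simulate(reactivate_all=False))
    print("re-activate all,   io credit:", simulate(reactivate_all=True))
```

In this model the unfixed run leaves the I/O vCPU with negative credit
(which would correspond to the OVER state), while the variant that
re-activates every parked vCPU keeps its credit positive.
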
> 
> 
> [Bug 3]: The BOOST priority might be changed to UNDER before the
> boosted VCPU preempts the current running VCPU. If so, VCPU boosting
> can not take effect.
> 
> If a VCPU is in UNDER state and wakes up from sleep, it will be
> boosted in csched_vcpu_wake(). However, the boosting is successful
> only when __runq_tickle() preempts the current VCPU. It is possible
> that csched_acct() can run between csched_vcpu_wake() and
> __runq_tickle(), which will sometimes change the BOOST state back to
> UNDER if credit >0. If so, __runq_tickle() can fail as VCPUs in UNDER
> cannot preempt another UNDER VCPU. This also contributes to the far
> end of the long tail latency.
> 
> [Possible Solution]
> 1. add a lock to prevent csched_acct() from interleaving with
> csched_vcpu_wake();
> 2. separate the BOOST state from UNDER and OVER states.
> ---------------------------------------
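
The interleaving described for Bug 3 can also be sketched as a toy
model. Again, this is an illustration under assumptions and not Xen
code: the class and function names are hypothetical, the priority
ordering (BOOST above UNDER above OVER) is the only thing taken from
the description, and the "separate flag" variant is just one possible
reading of proposed solution 2.

```python
from dataclasses import dataclass
from enum import IntEnum


class Pri(IntEnum):
    # Only the ordering matters for this sketch: BOOST > UNDER > OVER.
    OVER = -2
    UNDER = -1
    BOOST = 0


@dataclass
class VCpu:
    credit: int
    pri: Pri = Pri.UNDER
    boosted: bool = False  # used only by the hypothetical "separate flag" variant


def wake(v, separate_flag):
    """Toy wake-up path: an UNDER vCPU gets boosted."""
    if v.pri == Pri.UNDER:
        if separate_flag:
            v.boosted = True    # boost kept apart from the credit-derived state
        else:
            v.pri = Pri.BOOST   # boost stored in the same field accounting rewrites


def acct(v):
    """Toy accounting: recompute the credit-derived priority, clobbering BOOST."""
    v.pri = Pri.UNDER if v.credit > 0 else Pri.OVER


def effective_pri(v):
    return Pri.BOOST if v.boosted else v.pri


def tickle_preempts(woken, current):
    """Toy tickle: preempt only if the woken vCPU has strictly higher priority."""
    return effective_pri(woken) > effective_pri(current)


def race(separate_flag):
    """Wake-up, then accounting slips in, then the tickle decision."""
    io = VCpu(credit=100)
    hog = VCpu(credit=50)
    wake(io, separate_flag)
    acct(io)  # runs between the wake-up and the tickle
    return tickle_preempts(io, hog)


if __name__ == "__main__":
    print("shared field, preempts:  ", race(separate_flag=False))
    print("separate flag, preempts: ", race(separate_flag=True))
```

With the boost stored in the same field that accounting rewrites, the
woken vCPU is back to UNDER by the time the tickle decision is made
and does not preempt the other UNDER vCPU. With the boost tracked
separately, the accounting pass no longer erases it.
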
> 
> 
> Please confirm these bugs.
> Thanks.
> 
> --
> Tony. S
> Ph. D student of University of Colorado, Colorado Springs
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
