
Re: [Xen-devel] [BUG] Bugs in Xen's credit scheduler cause long tail latency issues



On Mon, May 16, 2016 at 5:30 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> [Adding George, and avoiding trimming, for his benefit]
>
> On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
>> Hi all,
>>
> Hi Tony,
>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler which will cause long tail
>> latency in I/O-intensive VMs.
>>
> Ok, first of all, thanks for looking into and reporting this.
>
> This is certainly something we need to think about... For now, just a
> couple of questions.

Hi Dario,

Thank you for your reply. :-)

>
>> (1) Problem description
>>
>> ------------Description------------
>> My test environment is as follows: Hypervisor (Xen 4.5.0), Dom 0
>> (Linux 3.18.21), Dom U (Linux 3.18.21).
>>
>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. One VM (denoted as I/O-VM) ran the Sockperf server
>> program; the other VM (denoted as CPU-VM) ran a compute-bound task,
>> e.g., SPECCPU 2006 or simply a loop. A client on another physical
>> machine sent UDP requests to the I/O-VM.
>>
> So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
> *both* pinned on the *same* pCPU, is this the case?
>

Yes.
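
In case it is useful for reproducing this, here is a minimal sketch of
the guest configuration we used (illustrative only: the memory size is
ours, but the pCPU number is just an example; both VMs point at the
same physical core):

# xl domain config, identical for the I/O-VM and the CPU-VM
vcpus  = 1
memory = 4096
cpus   = "3"   # pin the single vCPU of both VMs to one shared pCPU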

>> Here are my tail latency results (in microseconds):
>> Case     Avg      90%      99%      99.9%    99.99%
>> #1       108      114      128      129      130
>> #2       7811     13892    14874    15315    16383
>> #3       943      131      21755    26453    26553
>> #4       116      96       105      8217     13472
>> #5       116      117      129      131      132
>>
>> Bug 1, 2, and 3 will be discussed below.
>>
>> Case #1:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> idling (no processes running).
>>
>> Case #2:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0
>>
>> Case #3:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>>
>> Case #4:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>>
>> Case #5:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>>
>> ---------------------------------------
>>
>>
>> (2) Problem analysis
>>
>> ------------Analysis----------------
>>
>> [Bug1]: The VCPU that ran CPU-intensive workload could be mistakenly
>> boosted due to CPU affinity.
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.
>> html
>>
>> We have already discussed this bug and a potential patch in the above
>> link. Although the discussed patch improved the tail latency, i.e.,
>> it reduced the 90th percentile latency, the long tail latency is
>> still not bounded. Below, we discuss two new bugs that inflate the
>> latency at the very far end of the tail.
>>
> Right, and there is a fix upstream for this. It's not the patch you
> proposed in the thread linked above, but it should have had the same
> effect.
>
> Can you perhaps try something more recent than 4.5 (4.7-rc would be
> great) and confirm that the numbers still look similar?

I have tried the latest stable version, Xen 4.6, today. Here are my
results (in microseconds):

Case     Avg      90%      99%      99.9%    99.99%
#1       91       93       101      105      110
#2       22506    43011    231946   259501   265561
#3       917      95       25257    30048    30756
#4       110      95       102      12448    13255
#5       114      118      130      134      136

It seems that case #2 is much worse. The other cases are similar. My
raw latency data is pasted below.

For Xen 4.7-rc, I ran into some installation issues on my machine, so
I have not tried it yet.


The raw data is as follows. I hope this helps you understand the
issues better. :-)
# case 1:
sockperf: ====> avg-lat= 91.688 (std-dev=2.950)
sockperf: ---> <MAX> observation =  110.647
sockperf: ---> percentile  99.99 =  110.647
sockperf: ---> percentile  99.90 =  105.242
sockperf: ---> percentile  99.50 =  101.531
sockperf: ---> percentile  99.00 =  101.066
sockperf: ---> percentile  95.00 =   97.016
sockperf: ---> percentile  90.00 =   93.294
sockperf: ---> percentile  75.00 =   92.157
sockperf: ---> percentile  50.00 =   91.437
sockperf: ---> percentile  25.00 =   90.786
sockperf: ---> <MIN> observation =   73.071


# case 2:
sockperf: ====> avg-lat=90019.931 (std-dev=136620.722)
sockperf: ---> <MAX> observation = 637712.152
sockperf: ---> percentile  99.99 = 637712.152
sockperf: ---> percentile  99.90 = 632901.547
sockperf: ---> percentile  99.50 = 615972.778
sockperf: ---> percentile  99.00 = 599698.318
sockperf: ---> percentile  95.00 = 428857.020
sockperf: ---> percentile  90.00 = 259316.760
sockperf: ---> percentile  75.00 = 114029.044
sockperf: ---> percentile  50.00 = 24629.429
sockperf: ---> percentile  25.00 = 10368.731
sockperf: ---> <MIN> observation =   81.046


#case 3:
sockperf: ====> avg-lat=917.394 (std-dev=3943.142)
sockperf: ---> <MAX> observation = 30756.289
sockperf: ---> percentile  99.99 = 30756.289
sockperf: ---> percentile  99.90 = 30048.372
sockperf: ---> percentile  99.50 = 25962.687
sockperf: ---> percentile  99.00 = 25257.746
sockperf: ---> percentile  95.00 = 5615.028
sockperf: ---> percentile  90.00 =   95.726
sockperf: ---> percentile  75.00 =   92.916
sockperf: ---> percentile  50.00 =   90.387
sockperf: ---> percentile  25.00 =   89.162
sockperf: ---> <MIN> observation =   67.762


#case 4:
sockperf: ====> avg-lat=110.159 (std-dev=555.153)
sockperf: ---> <MAX> observation = 13255.732
sockperf: ---> percentile  99.99 = 13255.732
sockperf: ---> percentile  99.90 = 12448.629
sockperf: ---> percentile  99.50 =  104.799
sockperf: ---> percentile  99.00 =  101.954
sockperf: ---> percentile  95.00 =   97.295
sockperf: ---> percentile  90.00 =   95.995
sockperf: ---> percentile  75.00 =   91.866
sockperf: ---> percentile  50.00 =   88.803
sockperf: ---> percentile  25.00 =   71.088
sockperf: ---> <MIN> observation =   65.826


#case 5:
sockperf: ====> avg-lat=114.984 (std-dev=3.782)
sockperf: ---> <MAX> observation =  136.748
sockperf: ---> percentile  99.99 =  136.748
sockperf: ---> percentile  99.90 =  134.192
sockperf: ---> percentile  99.50 =  131.467
sockperf: ---> percentile  99.00 =  130.200
sockperf: ---> percentile  95.00 =  121.575
sockperf: ---> percentile  90.00 =  118.518
sockperf: ---> percentile  75.00 =  116.343
sockperf: ---> percentile  50.00 =  114.356
sockperf: ---> percentile  25.00 =  112.479
sockperf: ---> <MIN> observation =   94.932


>
> About this below here, I'll read carefully and think about it. Thanks
> again.

Thank you, Dario.

As for bugs 2 and 3: although they do not affect throughput, they are
a big problem for latency, and for long tail latency in particular.

>
>> [Bug 2]: In csched_acct() (by default every 30ms), a VCPU stops
>> earning credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked()) if its credit is larger than the
>> upper bound. Because the domain has only one VCPU, the domain is
>> also removed from the active domain list.
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() is executed and tries to put inactive
>> VCPUs back on the active list. However, __csched_vcpu_acct_start()
>> only puts the current VCPU back on the active list. If an I/O-bound
>> VCPU is not the current VCPU when csched_tick() fires, it is not put
>> back on the active VCPU list. If so, the I/O-bound VCPU will likely
>> miss the next credit refill in csched_acct() and can easily enter
>> the OVER state. As a result, the I/O-bound VM cannot be boosted and
>> suffers very long latency. It takes at least one time slice (e.g.,
>> 30ms) before the I/O-VM is activated again and starts to receive
>> credits.
>>
>> [Possible Solution] Before the next credit refill, reactivate all
>> inactive VCPUs, instead of just the current one.
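
To make the idea concrete, here is a rough sketch of what we have in
mind (not a tested patch; for_each_vcpu_on_cpu() is a made-up iterator
used only for illustration, and the real code would need the proper
lists and locking):

/* Reactivate every parked vCPU on this pCPU before the accounting in
 * csched_acct() runs, instead of relying on the vCPU being current at
 * tick time. */
static void csched_reactivate_parked_vcpus(struct csched_private *prv,
                                           unsigned int cpu)
{
    struct csched_vcpu *svc;

    for_each_vcpu_on_cpu ( svc, cpu )
    {
        /* A vCPU removed by __csched_vcpu_acct_stop_locked() is no
         * longer on its domain's active list; put it back so that it
         * receives credits at the next refill. */
        if ( list_empty(&svc->active_vcpu_elem) )
            __csched_vcpu_acct_start(prv, svc);
    }
}

Calling something like this from csched_tick(), shortly before the
accounting runs, would let a deactivated I/O VCPU rejoin the active
list even when it is not the VCPU running when the tick fires.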
>>
>>
>>
>> [Bug 3]: The BOOST priority might be changed back to UNDER before
>> the boosted VCPU preempts the currently running VCPU. If so, VCPU
>> boosting cannot take effect.
>>
>> If a VCPU is in the UNDER state and wakes up from sleep, it is
>> boosted in csched_vcpu_wake(). However, the boosting succeeds only
>> if __runq_tickle() preempts the current VCPU. It is possible for
>> csched_acct() to run between csched_vcpu_wake() and __runq_tickle(),
>> which will sometimes change the BOOST state back to UNDER if
>> credit > 0. If so, __runq_tickle() can fail, as an UNDER VCPU cannot
>> preempt another UNDER VCPU. This also contributes to the far end of
>> the long tail latency.
>>
>> [Possible Solution]
>> 1. add a lock to prevent csched_acct() from interleaving with
>> csched_vcpu_wake();
>> 2. separate the BOOST state from UNDER and OVER states.
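
As a very rough illustration of option 2, the accounting path could
simply leave a boosted VCPU alone (sketch only; the if/else below is a
simplification of the real priority recomputation in csched_acct(),
and its exact placement is assumed):

/* Inside csched_acct(): do not demote a vCPU that was just boosted
 * in csched_vcpu_wake(), so the BOOST cannot be lost before
 * __runq_tickle() has had a chance to preempt the running vCPU. */
if ( svc->pri != CSCHED_PRI_TS_BOOST )
{
    if ( credit > 0 )
        svc->pri = CSCHED_PRI_TS_UNDER;
    else
        svc->pri = CSCHED_PRI_TS_OVER;
}
/* else: keep BOOST until the vCPU has actually run once. */

Option 1 would instead keep csched_acct() from running in the window
between csched_vcpu_wake() and __runq_tickle(), e.g. with additional
locking, at the cost of a longer critical section.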
>> ---------------------------------------
>>
>>
>> Please confirm these bugs.
>> Thanks.
>>
>> --
>> Tony. S
>> Ph. D student of University of Colorado, Colorado Springs
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

-- 

Tony S.
Ph. D student of University of Colorado, Colorado Springs

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

