[Xen-devel] xen: credit2: credit2 can't reach the throughput as expected
Hi George,

I found that Credit2 can't reach the expected throughput under my test
workload, compared to Credit and CFS. It is easy to reproduce, and I
believe the problem really exists. Due to my lack of knowledge it took
me a long time to find out why, and I cannot find a good way to solve
it. Please help to take a look at it. Thanks.

========= Problem =========

***************
[How to reproduce]
***************

I use openSUSE Tumbleweed with Xen 4.11. The test workload is as
follows: guest_1 with 4 vCPUs and guest_2 with 8 vCPUs run on 4 pCPUs,
i.e. the pCPU:vCPU ratio is 1:3. I then put a load of 20% CPU usage on
each vCPU, which results in 240% total pCPU usage. The 20% load model
is: on each vCPU I start one process which, within a period of 100ms,
runs for 20ms and then sleeps for 80ms, indefinitely. I use xentop in
dom0 to observe guest CPU usage. As I expect, guest CPU usage should
be 80% for guest_1 and 160% for guest_2. However, with Credit2 I
observe only 60% for guest_1 and 120% for guest_2: it can't reach the
throughput I expected. The same workload behaves fine with the Credit
scheduler, as well as with CFS on Linux (tested under CentOS 7.3 with
the 3.10 kernel) with KVM virtualization.
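For reference, here is a minimal sketch of the per-vCPU load process
(my actual load generator differs in details, but the 20ms-run /
80ms-sleep duty cycle is the point; one instance is started on, and
pinned to, each vCPU):

/* 20% duty-cycle load: busy-loop for 20ms, then sleep 80ms, forever. */
#include <time.h>
#include <unistd.h>

static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
    for ( ;; )
    {
        long long start = now_ns();

        /* Burn CPU for 20ms... */
        while ( now_ns() - start < 20 * 1000000LL )
            ;
        /* ...then sleep for the remaining 80ms of the 100ms period. */
        usleep(80 * 1000);
    }
}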
Credit2:

xentop - 17:53:01   Xen 4.11.0_02-1
4 domains: 1 running, 3 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 67079796k total, 67078980k used, 816k free    CPUs: 32 @ 2600MHz
      NAME  STATE  CPU(sec) CPU(%)   MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
  Domain-0 -----r       125    1.9 64050452   95.5  no limit       n/a    32    0        0        0    0      0      0      0         0         0    0
   guest_1 --b---       198   62.1  1048832    1.6   1049600       1.6     4    1     1359        7    1      0   4116    164    192082     10784    0
   guest_2 --b---       349  123.3  1048832    1.6   1049600       1.6     8    1     1350        9    1      0   4137    176    194002     10934    0
  Xenstore --b---         0    0.0    32760    0.0    670720       1.0     1    0        0        0    0      0      0      0         0         0    0

Credit:

xentop - 18:24:04   Xen 4.11.0_02-1
4 domains: 2 running, 2 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 67079796k total, 67078856k used, 940k free    CPUs: 32 @ 2600MHz
      NAME  STATE  CPU(sec) CPU(%)   MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
  Domain-0 -----r       129    4.9 64050420   95.5  no limit       n/a    32    0        0        0    0      0      0      0         0         0    0
   guest_1 --b---        42   84.4  1048832    1.6   1049600       1.6     4    1      298        2    1      0   4092    134    191571     10281    0
   guest_2 -----r       102  167.0  1048832    1.6   1049600       1.6     8    1      328        2    1      0   4170    137    192099     10360    0

CFS:

top - 17:52:45 up 6:38, 3 users, load average: 2.82, 2.28, 1.25
Tasks: 774 total, 1 running, 773 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.9 us, 0.2 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26300110+total, 24166513+free, 20429244 used, 906728 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 24160174+avail Mem

  PID USER PR NI    VIRT    RES  SHR S  %CPU %MEM    TIME+ COMMAND
13391 root 20  0 5252184 411756 8640 S 163.2  0.2 10:39.15 /usr/bin/qemu-kvm -name guest=guest_2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/run/libvirt+
13156 root 20  0 4963472 446500 8644 S  81.8  0.2  6:01.27 /usr/bin/qemu-kvm -name guest=guest_1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/run/libvirt+

**************
[Why it happens]
**************

Viewed over the long term, the test workload behaves like polling. In
the figures below, "- - - -" means the vCPU is running (consuming
cputime) and "――――" means it is idle.

As we can see from Fig.1, if vcpu_1 and vcpu_2 run staggered, the
throughput looks fine; however, if they run at the same time, as in
Fig.2, they compete for the pCPU, which results in poor throughput.

vcpu_1     - - - -  ――――――――――――――――――  - - - -  ――――――――――――――――――
vcpu_2     ――――――  - - - -  ――――――――――  ――――――  - - - -  ――――――――――
          | vcpu1 | vcpu2 |    idle    | vcpu1 | vcpu2 |    idle   |
cpu usage  - - - -  - - - -  ――――――――――  - - - -  - - - -  ――――――――――

                                Fig.1

vcpu_1     - -  - -  - -  - -  ――――――――――  - -  - -  - -  - -  ――――――――――
vcpu_2     - -  - -  - -  - -  ――――――――――  - -  - -  - -  - -  ――――――――――
          |  compete running  | both sleep |  compete running  | both sleep |
cpu usage  - - - - - - - - - -  ――――――――――  - - - - - - - - - -  ――――――――――

                                Fig.2

Because we do reset_credit() as soon as snext->credit goes negative,
the credit values of all the vcpus stay very close to each other. As a
result, observed over the long term, the effective time-slice of each
vcpu becomes smaller, and the vcpus end up competing for the pCPU at
the same time, just as shown in Fig.2 above. So I think the reason
Credit2 can't reach the expected throughput is that reset_credit() for
all vcpus makes the time-slice smaller, which is different from Credit
and CFS.
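To make the mechanism I am referring to concrete, here is a much
simplified sketch of the reset path in xen/common/sched_credit2.c as I
read it in 4.11 (not the verbatim source; the multiplier used for
deeply negative credit, the credit cap and the accounting for
currently-running vcpus are all omitted):

/*
 * Simplified sketch: when the vcpu picked to run next has gone
 * negative, every vcpu on the runqueue is moved up together, so after
 * a reset all credit values end up close to one another again.
 */
static void reset_credit_sketch(struct csched2_runqueue_data *rqd)
{
    struct csched2_vcpu *svc;

    list_for_each_entry( svc, &rqd->svc, rqd_elem )
        svc->credit += CSCHED2_CREDIT_INIT; /* everyone moves up together */
}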
**************
[How I resolve it]
**************

For now, I can't figure out how to solve the problem under this
workload perfectly. I have tried resetting only the vcpus sitting in
the runqueue, excluding the sleeping and the running ones; that
behaves well under this workload, because the spread of credit values
between the vcpus becomes larger. However, this is not a good
solution, because it obviously hurts fairness, even though I have not
yet observed an actual negative impact with common test suites like
geekbench/hackbench. A sketch of the change I tried is below.
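In terms of the simplified sketch above, the experiment amounts to
something like this (again not the actual patch; curr_on_cpu() and
vcpu_runnable() stand for the kind of checks I used):

/*
 * Sketch of the experiment: during a reset, skip vcpus that are
 * currently running or sleeping and only move up the ones waiting in
 * the runqueue, so the credit gap between waiting and running/sleeping
 * vcpus grows and their run intervals stay staggered.
 */
static void reset_credit_runq_only_sketch(struct csched2_runqueue_data *rqd)
{
    struct csched2_vcpu *svc;

    list_for_each_entry( svc, &rqd->svc, rqd_elem )
    {
        /* Skip the vcpu currently running on its pcpu. */
        if ( svc->vcpu == curr_on_cpu(svc->vcpu->processor) )
            continue;
        /* Skip sleeping (non-runnable) vcpus. */
        if ( !vcpu_runnable(svc->vcpu) )
            continue;

        svc->credit += CSCHED2_CREDIT_INIT;
    }
}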
Looking forward to hearing your opinion on this issue.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel