[Xen-devel] [BUG] Linux process vruntime accounting in Xen
In virtualized environments, we sometimes need to limit the CPU resources available to a virtual machine (VM). For example, in Xen we use

    $ xl sched-credit -d 1 -c 50

to cap the CPU resource of dom 1 at half of one physical CPU core. If the CPU resource of a VM is capped, processes inside the VM have a vruntime accounting problem. Here I report my findings about the Linux process scheduler under this scenario.

------------Description------------

Linux CFS relies on delta_exec to charge the vruntime of processes. The variable delta_exec is the time between the moment a process starts running on a CPU and the moment it stops. This works well on a physical machine. However, in a virtual machine with capped resources, some processes may be charged an inaccurate vruntime.

For example, suppose we have a VM with one vCPU that is capped at 50% of a physical CPU. Process A inside the VM starts running, the CPU allocation of the VM runs out, and the VM is paused. In the next round, the VM is allocated new CPU resource and starts running again; process A then stops running and is put back on the runqueue. The delta_exec of process A is accounted as its "real execution time" plus the time its VM was paused. That makes the vruntime of process A much larger than it should be, and process A will not be scheduled again for a long time, until the vruntimes of the other processes catch up with it.

---------------------------------------

------------Analysis----------------

When a process stops running and is about to be put back on the runqueue, update_curr() is executed.

[src/kernel/sched/fair.c]

    static void update_curr(struct cfs_rq *cfs_rq)
    {
        ... ...
        delta_exec = now - curr->exec_start;
        ... ...
        curr->exec_start = now;
        ... ...
        curr->sum_exec_runtime += delta_exec;
        schedstat_add(cfs_rq, exec_clock, delta_exec);
        curr->vruntime += calc_delta_fair(delta_exec, curr);
        update_min_vruntime(cfs_rq);
        ... ...
    }

"now"        --> the current time
"exec_start" --> the time when the current process was put on the CPU
"delta_exec" --> the time between the process starting and stopping running on the CPU

When a process starts running before its VM is paused and stops running after its VM is unpaused, delta_exec includes the time the VM was paused, which is large compared to the real execution time of the process.

This issue does serious harm to the performance of the victim process. If the process is an I/O-bound workload, its throughput and latency suffer. If the process is a CPU-bound workload, its vruntime becomes "unfair" compared to the other processes under CFS.

Because the CPU resources of some types of VMs in the cloud are limited as described above (for example, the Amazon EC2 t2.small instance), I suspect this also harms the performance of public cloud instances.

---------------------------------------

My test environment is as follows: Hypervisor (Xen 4.5.0), Dom 0 (Linux 3.18.21), Dom U (Linux 3.18.21). I also tested the longterm kernel Linux 3.18.30 and the latest longterm kernel, Linux 4.4.7. All of these kernels have this issue.

Please confirm this bug. Thanks.

--
Tony. S
Ph. D student of University of Colorado, Colorado Springs

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
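
To make the accounting in the Analysis section concrete, here is a minimal user-space sketch of the same arithmetic. It is not kernel code: struct task_model, charge_naive(), charge_excluding_pause() and the paused_ns parameter are hypothetical names invented for this illustration. The second function is only a point of comparison that shows the "real execution time"; it assumes the paused time is somehow known and is not a proposed fix.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the fields update_curr() touches.
 * All times are in nanoseconds. */
struct task_model {
    uint64_t exec_start;       /* when the task was put on the CPU */
    uint64_t sum_exec_runtime; /* total runtime charged so far */
};

/* Charge the task the way the quoted update_curr() excerpt does:
 * the whole wall-clock interval since exec_start, including any
 * time during which the VM itself was paused. */
static uint64_t charge_naive(struct task_model *t, uint64_t now)
{
    uint64_t delta_exec = now - t->exec_start;

    t->exec_start = now;
    t->sum_exec_runtime += delta_exec;
    return delta_exec;
}

/* Comparison only: charge the same interval minus paused_ns, the time
 * the VM was paused (assumed to be known for this illustration). */
static uint64_t charge_excluding_pause(struct task_model *t, uint64_t now,
                                       uint64_t paused_ns)
{
    uint64_t wall = now - t->exec_start;
    uint64_t delta_exec = wall > paused_ns ? wall - paused_ns : 0;

    t->exec_start = now;
    t->sum_exec_runtime += delta_exec;
    return delta_exec;
}
```

As an example with made-up numbers: if process A starts at time 0, runs for 5 ms, its VM is then paused for 30 ms, and update_curr() runs right after the VM resumes (now = 35,000,000 ns), charge_naive() charges 35,000,000 ns, while charge_excluding_pause() with paused_ns = 30,000,000 charges 5,000,000 ns. The difference is the paused time that ends up in the vruntime of process A.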