[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [BUG] Linux process vruntime accounting in Xen

[Adding George again, and a few Linux/Xen folks]

On Sat, 2016-05-14 at 18:25 -0600, Tony S wrote:
> In virtualized environments, sometimes we need to limit the CPU
> resources to a virtual machine(VM). For example in Xen, we use
> $ xl sched-credit -d 1 -c 50
> to limit the CPU resource of dom 1 as half of
> one physical CPU core. If the VM CPU resource is capped, the process
> inside the VM will have a vruntime accounting problem. Here, I report
> my findings about Linux process scheduler under the above scenario.
Thanks for this other report as well. :-)

All you say makes sense to me, and I will think about it. I'm not sure
about one thing, though...

> ------------Description------------
> Linux CFS relies on delta_exec to charge the vruntime of processes.
> The variable delta_exec is the difference of a process starts and
> stops running on a CPU. This works well in physical machine. However,
> in virtual machine under capped resources, some processes might be
> accounted with inaccurate vruntime.
> For example, suppose we have a VM which has one vCPU and is capped to
> have as much as 50% of a physical CPU. When process A inside the VM
> starts running and the CPU resource of that VM runs out, the VM will
> be paused. Next round when the VM is allocated new CPU resource and
> starts running again, process A stops running and is put back to the
> runqueue. The delta_exec of process A is accounted as its "real
> execution time" plus the paused time of its VM. That will make the
> vruntime of process A much larger than it should be and process A
> would not be scheduled again for a long time until the vruntimes of
> other
> processes catch it.
> ---------------------------------------
> ------------Analysis----------------
> When a process stops running and is going to put back to the
> runqueue,
> update_curr() will be executed.
> [src/kernel/sched/fair.c]
> static void update_curr(struct cfs_rq *cfs_rq)
> {
>     ... ...
>     delta_exec = now - curr->exec_start;
>     ... ...
>     curr->exec_start = now;
>     ... ...
>     curr->sum_exec_runtime += delta_exec;
>     schedstat_add(cfs_rq, exec_clock, delta_exec);
>     curr->vruntime += calc_delta_fair(delta_exec, curr);
>     update_min_vruntime(cfs_rq);
>     ... ...
> }
> "now" --> the right now time
> "exec_start" --> the time when the current process is put on the CPU
> "delta_exec" --> the time difference of a process between it starts
> and stops running on the CPU
> When a process starts running before its VM is paused and the process
> stops running after its VM is unpaused, the delta_exec will include
> the VM suspend time which is pretty large compared to the real
> execution time of a process.
... but would that also apply to a VM that is not scheduled --just
because of pCPU contention, not because it was paused-- for a few time?

Isn't there anything in place in Xen or Linux (the latter being better
suitable for something like this, IMHO) to compensate for that?

I have to admit I haven't really ever checked myself, maybe either
George or our Linux people do know more?

> This issue will make a great performance harm to the victim process.
> If the process is an I/O-bound workload, its throughput and latency
> will be influenced. If the process is a CPU-bound workload, this
> issue
> will make its vruntime "unfair" compared to other processes under
> CFS.
> Because the CPU resource of some type VMs in the cloud are limited as
> the above describes(like Amazon EC2 t2.small instance), I doubt that
> will also harm the performance of public cloud instances.
> ---------------------------------------
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21). I also test longterm version Linux
> 3.18.30 and the latest longterm version, Linux 4.4.7. Those kernels
> all have this issue.
> Please confirm this bug. Thanks.
<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.