Xen project Mailing List

Re: [Xen-devel] [PATCH] remove blocked time accounting from xen "clockchip"

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Wed, 21 Dec 2011 08:32:16 +0000

Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Joe Jin <joe.jin@xxxxxxxxxx>, Zhenzhong Duan <zhenzhong.duan@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Delivery-date: Wed, 21 Dec 2011 08:31:49 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

>>> On 18.10.11 at 22:42, Laszlo Ersek <lersek@xxxxxxxxxx> wrote: > ... because the "clock_event_device framework" already accounts for idle > time through the "event_handler" function pointer in > xen_timer_interrupt(). > > The patch is intended as the completion of [1]. It should fix the double > idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to > stolen time accounting (the removed code seems to be isolated). Finally found time to play with this a little myself. As expected, the change brings a significant improvement, but isn't sufficient to be on par with the forward port kernels not using the clockevent infrastructure (we switched to this only in 2.6.34 and above) when it comes to correct steal time accounting. Specifically, without further adjustments, on a 4:3 over-committed system doing a kernel build, I'm seeing an over-accounting of approximately 4%. I was able to reduce this to slightly above 1% with below (experimental and not 32-bit compatible) change: --- a/kernel/sched.c +++ b/kernel/sched.c @@ -3864,6 +3864,29 @@ return ns; } +#ifndef CONFIG_XEN +#define _cputime_to_cputime64 cputime_to_cputime64 +#else +#define NS_PER_TICK (1000000000 / HZ) +static DEFINE_PER_CPU(u64, stolen_snapshot); +static DEFINE_PER_CPU(unsigned int, stolen_residual); + +static cputime64_t _cputime_to_cputime64(cputime_t t) +{ + //todo not entirely suitable for 32-bit + u64 s = this_cpu_read(runstate.time[RUNSTATE_runnable]); + unsigned long adj = div_u64_rem(s - this_cpu_read(stolen_snapshot) + + this_cpu_read(stolen_residual), + NS_PER_TICK, + &__get_cpu_var(stolen_residual)); + + this_cpu_write(stolen_snapshot, s); + t = cputime_sub(t, jiffies_to_cputime(adj)); + + return cputime_le(t, cputime_max) ? cputime_to_cputime64(t) : 0; +} +#endif + /* * Account user cpu time to a process. * @p: the process that the cpu time gets accounted to @@ -3882,7 +3905,7 @@ account_group_user_time(p, cputime); /* Add user time to cpustat. */ - tmp = cputime_to_cputime64(cputime); + tmp = _cputime_to_cputime64(cputime); if (TASK_NICE(p) > 0) cpustat->nice = cputime64_add(cpustat->nice, tmp); else @@ -3934,7 +3957,7 @@ void __account_system_time(struct task_struct *p, cputime_t cputime, cputime_t cputime_scaled, cputime64_t *target_cputime64) { - cputime64_t tmp = cputime_to_cputime64(cputime); + cputime64_t tmp = _cputime_to_cputime64(cputime); /* Add system time to process. */ p->stime = cputime_add(p->stime, cputime); @@ -3996,7 +4019,7 @@ void account_idle_time(cputime_t cputime) { struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat; - cputime64_t cputime64 = cputime_to_cputime64(cputime); + cputime64_t cputime64 = _cputime_to_cputime64(cputime); struct rq *rq = this_rq(); if (atomic_read(&rq->nr_iowait) > 0) @@ -4033,7 +4056,7 @@ struct rq *rq) { cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy); - cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy); + cputime64_t tmp = _cputime_to_cputime64(cputime_one_jiffy); struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat; if (irqtime_account_hi_update()) { I currently have no idea what the remain 1% could be attributed to, so thoughts from others would certainly be welcome. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.