Re: [Xen-users] [Xen-devel] high CPU stolen time after live migrate
Hi Dario and Olivier,

I have encountered this issue in the past. While the fix mentioned in the link is effective, I assume it was derived from upstream Linux, and it introduces a new error, as described below.

While there is a bug in the guest kernel, I think the root cause is on the hypervisor side.

From my own tests, the issue is reproducible even when migrating a VM locally within the same dom0. Once the guest VM is migrated, the RUNSTATE_offline time looks normal, while the RUNSTATE_runnable time moves backward (decreases). Therefore, the value returned by paravirt_steal_clock() (actually xen_steal_clock()), which is the sum of RUNSTATE_offline and RUNSTATE_runnable, decreases as well.

However, kernels such as 4.8 cannot handle this situation correctly, because the code in cputime.c is not written specifically for the Xen hypervisor.

For a kernel like v4.8-rc8, would something like the following be better?

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a846cf8..3546e21 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -274,11 +274,17 @@ static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 	if (static_key_false(&paravirt_steal_enabled)) {
 		cputime_t steal_cputime;
 		u64 steal;
+		s64 steal_diff;
 
 		steal = paravirt_steal_clock(smp_processor_id());
-		steal -= this_rq()->prev_steal_time;
+		steal_diff = steal - this_rq()->prev_steal_time;
 
-		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
+		if (steal_diff < 0) {
+			this_rq()->prev_steal_time = steal;
+			return 0;
+		}
+
+		steal_cputime = min(nsecs_to_cputime(steal_diff), maxtime);
 		account_steal_time(steal_cputime);
 		this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
 

This issue does not seem to be completely fixed in the most recent upstream Linux either (I have tested with 4.12.0-rc7). The issue in 4.12.0-rc7 is different.
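To illustrate why a backward-moving steal clock is a problem for the unpatched v4.8 logic, here is a minimal user-space sketch. It is not kernel code: prev_steal_time is a hypothetical stand-in for this_rq()->prev_steal_time, both function names are made up for illustration, and the clamping against maxtime and the cputime conversions are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for this_rq()->prev_steal_time. */
static uint64_t prev_steal_time;

/*
 * Unguarded delta, modelled on "steal -= prev_steal_time" with a u64:
 * if the steal clock moved backward, the subtraction wraps around and
 * yields a very large unsigned value.
 */
static uint64_t steal_delta_unguarded(uint64_t steal_now)
{
	return steal_now - prev_steal_time;
}

/*
 * Guarded delta, modelled on the proposed patch: compute a signed
 * difference, and on a backward step resynchronise prev_steal_time
 * and account nothing.
 */
static uint64_t steal_delta_guarded(uint64_t steal_now)
{
	int64_t diff = (int64_t)(steal_now - prev_steal_time);

	if (diff < 0) {
		prev_steal_time = steal_now;
		return 0;
	}

	/* Simplification: the kernel advances by the clamped, accounted amount. */
	prev_steal_time += (uint64_t)diff;
	return (uint64_t)diff;
}
```

With prev_steal_time at 329 and a new reading of 311 (numbers used purely for illustration, not as nanosecond values), the unguarded version returns 2^64 - 18, while the guarded version returns 0 and resets the baseline to 311, so later forward progress is accounted normally.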
After live migration, although the steal clock counter does not overflow (become a very large unsigned number), the steal counter in /proc/stat moves backward (e.g., from 329 to 311).

test@vm:~$ cat /proc/stat
cpu 248 0 240 31197 893 0 1 329 0 0
cpu0 248 0 240 31197 893 0 1 329 0 0
intr 39051 16307 0 0 0 0 0 990 127 592 1004 1360 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 59400
btime 1506731352
processes 1877
procs_running 1
procs_blocked 0
softirq 38903 0 15524 1227 6904 0 0 6 0 0 15242

After live migration, the steal counter in the Ubuntu guest running 4.12.0-rc7 decreased to 311.

test@vm:~$ cat /proc/stat
cpu 251 0 242 31245 893 0 1 311 0 0
cpu0 251 0 242 31245 893 0 1 311 0 0
intr 39734 16404 0 0 0 0 0 1440 128 0 8 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 60880
btime 1506731352
processes 1882
procs_running 3
procs_blocked 0
softirq 39195 0 15618 1286 6958 0 0 7 0 0 15326

I assume this is not expected behavior. A different patch to upstream Linux (similar to the one I mentioned above) would fix this issue.

---------------------------------------------------------

Whatever fix is applied on the guest kernel side, I think the root cause is that the Xen hypervisor returns a RUNSTATE_runnable time lower than the value it reported before live migration. As I am not familiar enough with Xen scheduling, I do not understand why the RUNSTATE_runnable cputime decreases after live migration.

Dongli Zhang

----- Original Message -----
From: dario.faggioli@xxxxxxxxxx
To: xen.list@xxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
Cc: xen-devel@xxxxxxxxxxxxx
Sent: Tuesday, October 3, 2017 5:24:49 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
Subject: Re: [Xen-devel] high CPU stolen time after live migrate

On Mon, 2017-10-02 at 18:37 +0200, Olivier Bonvalet wrote:
> root! laussor:/proc# cat /proc/uptime
> 652005.23 2631328.82
>
>
> Values for "stolen time" in /proc/stat seems impossible with only 7
> days of uptime.
>
I think it can be this:

https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/

What's the version of your guest kernel?

Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users