[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 1/1] xen/time: do not decrease steal time after live migration on xen
On 10/25/2017 02:45 AM, Dongli Zhang wrote: > After guest live migration on xen, steal time in /proc/stat > (cpustat[CPUTIME_STEAL]) might decrease because steal returned by > xen_steal_lock() might be less than this_rq()->prev_steal_time which is > derived from previous return value of xen_steal_clock(). > > For instance, steal time of each vcpu is 335 before live migration. > > cpu 198 0 368 200064 1962 0 0 1340 0 0 > cpu0 38 0 81 50063 492 0 0 335 0 0 > cpu1 65 0 97 49763 634 0 0 335 0 0 > cpu2 38 0 81 50098 462 0 0 335 0 0 > cpu3 56 0 107 50138 374 0 0 335 0 0 > > After live migration, steal time is reduced to 312. > > cpu 200 0 370 200330 1971 0 0 1248 0 0 > cpu0 38 0 82 50123 500 0 0 312 0 0 > cpu1 65 0 97 49832 634 0 0 312 0 0 > cpu2 39 0 82 50167 462 0 0 312 0 0 > cpu3 56 0 107 50207 374 0 0 312 0 0 > > Since runstate times are cumulative and cleared during xen live migration > by xen hypervisor, the idea of this patch is to accumulate runstate times > to global percpu variables before live migration suspend. Once guest VM is > resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new > runstate times and previously accumulated times stored in global percpu > variables. > > Similar and more severe issue would impact prior linux 4.8-4.10 as > discussed by Michael Las at > https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest, > which would overflow steal time and lead to 100% st usage in top command > for linux 4.8-4.10. A backport of this patch would fix that issue. > > References: > https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest > Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx> > > --- > Changed since v1: > * relocate modification to xen_get_runstate_snapshot_cpu > > Changed since v2: > * accumulate runstate times before live migration > > --- > drivers/xen/manage.c | 1 + > drivers/xen/time.c | 19 +++++++++++++++++++ > include/xen/xen-ops.h | 1 + > 3 files changed, 21 insertions(+) > > diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c > index c425d03..9aa2955 100644 > --- a/drivers/xen/manage.c > +++ b/drivers/xen/manage.c > @@ -72,6 +72,7 @@ static int xen_suspend(void *data) > } > > gnttab_suspend(); > + xen_accumulate_runstate_time(); > xen_arch_pre_suspend(); > > /* > diff --git a/drivers/xen/time.c b/drivers/xen/time.c > index ac5f23f..6df3f82 100644 > --- a/drivers/xen/time.c > +++ b/drivers/xen/time.c > @@ -19,6 +19,8 @@ > /* runstate info updated by Xen */ > static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate); > > +static DEFINE_PER_CPU(u64[4], old_runstate_time); > + > /* return an consistent snapshot of 64-bit time/counter value */ > static u64 get64(const u64 *p) > { > @@ -52,6 +54,7 @@ static void xen_get_runstate_snapshot_cpu(struct > vcpu_runstate_info *res, > { > u64 state_time; > struct vcpu_runstate_info *state; > + int i; > > BUG_ON(preemptible()); > > @@ -64,6 +67,22 @@ static void xen_get_runstate_snapshot_cpu(struct > vcpu_runstate_info *res, > rmb(); /* Hypervisor might update data. */ > } while (get64(&state->state_entry_time) != state_time || > (state_time & XEN_RUNSTATE_UPDATE)); > + > + for (i = 0; i < 4; i++) > + res->time[i] += per_cpu(old_runstate_time, cpu)[i]; > +} > + > +void xen_accumulate_runstate_time(void) > +{ > + struct vcpu_runstate_info state; > + int cpu; > + > + for_each_possible_cpu(cpu) { > + xen_get_runstate_snapshot_cpu(&state, cpu); > + memcpy(per_cpu(old_runstate_time, cpu), > + state.time, > + 4 * sizeof(u64)); sizeof(old_runstate_time). (I think this should work for per_cpu variables) > + } Hmm.. This may not perform as intended if we are merely checkpointing (or pausing) the guest (i.e. if HYPERVISOR_suspend() returns 1). We will double-account for the last interval that the guest has run. I'd rather not have yet another per-cpu variable but I can't think of anything else. Perhaps you or others can come up with something better. -boris > } > > /* > diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h > index 218e6aa..5680059 100644 > --- a/include/xen/xen-ops.h > +++ b/include/xen/xen-ops.h > @@ -32,6 +32,7 @@ void xen_resume_notifier_unregister(struct notifier_block > *nb); > bool xen_vcpu_stolen(int vcpu); > void xen_setup_runstate_info(int cpu); > void xen_time_setup_guest(void); > +void xen_accumulate_runstate_time(void); > void xen_get_runstate_snapshot(struct vcpu_runstate_info *res); > u64 xen_steal_clock(int cpu); > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |