[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get


On 17.08.21 17:15, Juergen Gross wrote:
On 17.08.21 16:50, Marek Marczykowski-Górecki wrote:
On Tue, Aug 17, 2021 at 04:04:24PM +0200, Jan Beulich wrote:
On 17.08.2021 15:48, Marek Marczykowski-Górecki wrote:
On Tue, Aug 17, 2021 at 02:29:20PM +0100, Andrew Cooper wrote:
On 17/08/2021 14:21, Jan Beulich wrote:
On 17.08.2021 15:06, Andrew Cooper wrote:
Perhaps we want the cpu_down() logic to explicitly invalidate their
lazily cached values?
I'd rather do this on the cpu_up() path (no point clobbering what may
get further clobbered, and then perhaps not to a value of our liking), yet then we can really avoid doing this from a notifier and instead do it early enough in xstate_init() (taking care of XSS at the same time).

Funny you mention notifiers. Apparently cpufreq driver does use it to
initialize things. And fails to do so:

(XEN) Finishing wakeup from ACPI S3 state.
(XEN) CPU0: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
(XEN) Enabling non-boot CPUs  ...
(XEN) CPU1: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
(XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d04024ad2b>] vcpu_runstate_get+0x153/0x244
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830049667c50   rcx: 0000000000000001 (XEN) rdx: 000000321d74d000   rsi: ffff830049667c50   rdi: ffff83025dcc0000 (XEN) rbp: ffff830049667c40   rsp: ffff830049667c10   r8: ffff83020511a820 (XEN) r9:  ffff82d04057ef78   r10: 0180000000000000   r11: 8000000000000000 (XEN) r12: ffff83025dcc0000   r13: ffff830205118c60   r14: 0000000000000001 (XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000003526e0
(XEN) cr3: 0000000049656000   cr2: 0000000000000028
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d04024ad2b> (vcpu_runstate_get+0x153/0x244): (XEN)  48 8b 14 ca 48 8b 04 02 <4c> 8b 70 28 e9 01 ff ff ff 4c 8d 3d dd 64 32 00
(XEN) Xen stack trace from rsp=ffff830049667c10:
(XEN)    0000000000000180 ffff83025dcbd410 ffff83020511bf30 ffff830205118c60 (XEN)    0000000000000001 0000000000000010 ffff830049667c80 ffff82d04024ae73 (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN)    0000000000000000 0000000000000000 ffff830049667cb8 ffff82d0402560a9 (XEN)    ffff830205118320 0000000000000001 ffff83020511bf30 ffff83025dc7a6f0 (XEN)    0000000000000000 ffff830049667d58 ffff82d040254cb1 00000001402e9f74 (XEN)    0000000000000000 ffff830049667d10 ffff82d040224eda 000000000025dc81 (XEN)    000000321d74d000 ffff82d040571278 0000000000000001 ffff830049667d28 (XEN)    ffff82d040228b44 ffff82d0403102cf 0000000000000000 ffff82d0402283a4 (XEN)    ffff82d040459688 ffff82d040459680 ffff82d040459240 0000000000000004 (XEN)    0000000000000000 ffff830049667d68 ffff82d04025510e ffff830049667db0 (XEN)    ffff82d040221ba4 0000000000000000 0000000000000001 0000000000000001 (XEN)    0000000000000000 ffff830049667e00 0000000000000001 ffff82d04058a5c0 (XEN)    ffff830049667dc8 ffff82d040203867 0000000000000001 ffff830049667df0 (XEN)    ffff82d040203c51 ffff82d040459400 0000000000000001 0000000000000010 (XEN)    ffff830049667e20 ffff82d040203e26 ffff830049667ef8 0000000000000000 (XEN)    0000000000000003 0000000000000200 ffff830049667e50 ffff82d040270bac (XEN)    ffff83020116a640 ffff830258ff6000 0000000000000000 0000000000000000 (XEN)    ffff830049667e70 ffff82d0402056aa ffff830258ff61b8 ffff82d0405701b0 (XEN)    ffff830049667e88 ffff82d04022963c ffff82d0405701a0 ffff830049667eb8
(XEN) Xen call trace:
(XEN)    [<ffff82d04024ad2b>] R vcpu_runstate_get+0x153/0x244

This is xen/common/sched/core.c:322. get_sched_res(v->processor) is
NULL at this point for CPU1.

The only place that can calls set_sched_res() and doesn't expect the
previous value to be valid, is cpu_schedule_up(). For non-BSP its called
_only_ from notifier at CPU_UP_PREPARE (cpu_schedule_callback()), but
that notifier explicitly exclude suspend/resume case:

     static int cpu_schedule_callback(
         struct notifier_block *nfb, unsigned long action, void *hcpu)
         unsigned int cpu = (unsigned long)hcpu;
         int rc = 0;

          * All scheduler related suspend/resume handling needed is done in
          * cpupool.c.
         if ( system_state > SYS_STATE_active )
             return NOTIFY_DONE;

But, nothing in cpupool.c is calling into set_sched_res().

On the other hand, sched_rm_cpu() (which I believe is called as part of
parking the CPU) calls cpu_schedule_down(), which then calls
set_sched_res(cpu, NULL).

In short: scheduler for parked CPUs is not re-initialized during resume.
But cpufreq expects it to be...

I'll be looking into that.

Could you please try the attached patch?


Attachment: 0001-xen-sched-fix-get_cpu_idle_time-for-smt-0-suspend-re.patch
Description: Text Data

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.