
Re: [Xen-devel] Xen4.2 S3 regression?




On Thu, Sep 20, 2012 at 8:56 AM, Ben Guthro <ben@xxxxxxxxxx> wrote:
It appears __cpu_disable() is not getting reached at all, for CPU1


I was incorrect about this; I found that out after adjusting the serial configuration so that all of the output was captured.

I have traced this through and verified that the sequence in question does, in fact, seem to be executed:


disable_nonboot_cpus()
cpu_down() 
__cpu_disable() 
play_dead()
cpu_exit_clear()
cpu_uninit()
__cpu_die()
do_suspend_lowlevel()

I also enabled the printk calls in smpboot.c:


[   32.145824] ACPI: Preparing to enter system sleep state S3
[   32.600118] PM: Saving platform NVS memory
[   32.671666] Disabling non-boot CPUs ...
(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Bringing CPU1 down
(XEN) Disabling CPU1
(XEN) Disabled CPU1
(XEN) play_dead: CPU1
(XEN) cpu_exit_clear: CPU1
(XEN) cpu_uninit: CPU1
(XEN) CPU1 dead
(XEN) Entering ACPI S3 state.
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Bringing CPU1 up
(XEN) Setting warm reset code and vector.
(XEN) Asserting INIT.
(XEN) Waiting for send to finish...
(XEN) +Deasserting INIT.
(XEN) Waiting for send to finish...
(XEN) +#startup loops: 2.
(XEN) Sending STARTUP #1.
(XEN) After apic_write.
(XEN) CPU#1 already initialized!
(XEN) Startup point 1.
(XEN) Waiting for send to finish...
(XEN) +Sending STARTUP #2.
(XEN) After apic_write.
(XEN) Startup point 1.
(XEN) Waiting for send to finish...
(XEN) +After Startup.
(XEN) After Callout 1.
(XEN) Stuck ??
(XEN) cpu_exit_clear: CPU1
(XEN) cpu_uninit: CPU1
(XEN) __cpu_up - do_boot_cpu error
(XEN) cpu_up CPU1 CPU not up
(XEN) cpu_up CPU1 fail
(XEN) Error taking CPU1 up: -5
[   32.780055] ACPI: Low-level resume complete
[   32.780055] PM: Restoring platform NVS memory
[   32.780055] Enabling non-boot CPUs ...

Then it crashes.

When attempting to bring CPU1 back up, do_boot_cpu() always seems to fall through into its "else" clause, with the CPU apparently stuck in CPU_STATE_CALLOUT.


Any ideas as to what might be causing it to get stuck in that state?
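
To make the question concrete, here is a toy, single-threaded C model of the callout/callin handshake as it can be inferred from the log messages above. It is only a sketch and not the Xen source: every toy_* name, the TOY_EIO constant and the way the state variable is handled are assumptions made for illustration. It shows one way a CPU that still looks "already initialized" would never call in, leaving the boot CPU to report "Stuck ??" and return -5. It does not explain why the bit would still be set after cpu_uninit() has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: names echo the log messages, not the real Xen code. */
enum toy_cpu_state {
    TOY_STATE_DEAD,
    TOY_STATE_CALLOUT,  /* boot CPU has kicked the secondary CPU */
    TOY_STATE_CALLIN,   /* secondary CPU has reported in */
    TOY_STATE_ONLINE
};

#define TOY_EIO 5       /* assumed to correspond to the "-5" in the log */

static unsigned long toy_cpu_initialized;  /* one bit per CPU */
static enum toy_cpu_state toy_state = TOY_STATE_DEAD;

/* Clear the "initialized" bit, as cpu_uninit() is expected to do. */
static void toy_cpu_uninit(unsigned int cpu)
{
    assert(cpu < 8 * sizeof(toy_cpu_initialized));
    toy_cpu_initialized &= ~(1UL << cpu);
}

/*
 * Secondary CPU side. A real CPU would spin forever if its bit were
 * already set; the model just reports that and returns false.
 */
static bool toy_ap_start(unsigned int cpu)
{
    assert(cpu < 8 * sizeof(toy_cpu_initialized));
    if (toy_cpu_initialized & (1UL << cpu)) {
        printf("CPU#%u already initialized!\n", cpu);
        return false;
    }
    toy_cpu_initialized |= 1UL << cpu;
    toy_state = TOY_STATE_CALLIN;
    return true;
}

/* Boot CPU side: returns 0 on success, -TOY_EIO if the CPU never calls in. */
static int toy_do_boot_cpu(unsigned int cpu)
{
    toy_state = TOY_STATE_CALLOUT;
    toy_ap_start(cpu);  /* stands in for the INIT/STARTUP sequence */

    /* The real code polls with a timeout; one check is enough here. */
    if (toy_state == TOY_STATE_CALLIN) {
        toy_state = TOY_STATE_ONLINE;
        return 0;
    }

    /* The "else" path: still in CALLOUT, so give up and clean up. */
    printf("Stuck ??\n");
    toy_cpu_uninit(cpu);
    toy_state = TOY_STATE_DEAD;
    return -TOY_EIO;
}
```

In this model, calling toy_do_boot_cpu(1) while bit 1 of toy_cpu_initialized is still set takes the failure path, whereas the same call with the bit clear succeeds.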




 
I put a BUG() call, conditional on the CPU id, in there to verify. It is reached when using
xen-hptool cpu-offline 1
but it never seems to be reached from the S3 path.


What is the expected call chain to get into this code during S3?


On Thu, Sep 20, 2012 at 4:03 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> On 20.09.12 at 08:13, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> CPU#1 got stuck in loop in cpu_init() as it appears to be 'already
> initialised' in cpu_initialized bitmap. CPU#0 detects it is stuck and
> carries on, but the resume code assumes all CPUs are brought back online and
> crashes later.

So this would suggest play_dead() (-> cpu_exit_clear() ->
cpu_uninit()) not getting reached during the suspend cycle.
That should be fairly easy to verify, as the serial console
ought to still work when the secondary CPUs get offlined.

That might imply cpumask_clear_cpu(cpu, &cpu_online_map)
not getting reached in __cpu_disable(), which would be in line
with the observation that none of the logs provided so far
showed anything being done by fixup_irqs() (called right
after clearing the online bit).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

