
Re: [Xen-devel] Xen4.2 S3 regression?



On 20/09/2012 21:30, "Ben Guthro" <ben@xxxxxxxxxx> wrote:

> (XEN) Bringing CPU1 down
> (XEN) Disabling CPU1
> (XEN) Disabled CPU1
> (XEN) play_dead: CPU1
> (XEN) cpu_exit_clear: CPU1
> (XEN) cpu_uninit: CPU1
> (XEN) CPU1 dead

So CPU1 is taken down properly, apparently...

> (XEN) Entering ACPI S3 state.

... During S3 suspend.

> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) Bringing CPU1 up
> (XEN) Setting warm reset code and vector.
> (XEN) Asserting INIT.
> (XEN) Waiting for send to finish...
> (XEN) +Deasserting INIT.
> (XEN) Waiting for send to finish...
> (XEN) +#startup loops: 2.
> (XEN) Sending STARTUP #1.
> (XEN) After apic_write.
> (XEN) CPU#1 already initialized!

But here CPU1 thinks it is already initialised! *This* is the bug you need
to go look at. CPU1 will spin at this point...
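The check Keir describes can be pictured with a small model. The following is a minimal sketch, not Xen's code. It assumes a single machine word as the per-CPU bitmap and uses hypothetical model_* names in place of cpu_init(), cpu_uninit() and the cpu_initialized bitmap mentioned in the thread. In the model, a CPU whose bit was never cleared after its last bring-up trips the guard on the next one. Xen itself spins at that point; the model returns false instead so the caller can observe the failure.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the cpu_initialized bitmap: one bit per CPU,
 * held in a single word (so cpu must be below the bit width of long). */
static unsigned long model_cpu_initialized;

/* Test-and-set of one CPU's bit. This model is single-threaded, so no
 * atomics are used. Returns the previous value of the bit. */
static bool model_test_and_set_cpu(unsigned int cpu, unsigned long *mask)
{
    unsigned long bit = 1UL << cpu;
    bool was_set = (*mask & bit) != 0;

    *mask |= bit;
    return was_set;
}

/* Model of the guard in cpu_init(): a CPU whose bit is still set reports
 * "already initialized". Xen then spins forever on that CPU; this model
 * returns false so the failure is visible to the caller. */
static bool model_cpu_init(unsigned int cpu)
{
    if (model_test_and_set_cpu(cpu, &model_cpu_initialized)) {
        printf("CPU#%u already initialized!\n", cpu);
        return false;
    }
    return true;
}

/* Model of cpu_uninit(): clearing the bit is what lets the next
 * model_cpu_init() on the same CPU succeed. */
static void model_cpu_uninit(unsigned int cpu)
{
    model_cpu_initialized &= ~(1UL << cpu);
}
```

For example, model_cpu_init(1), then model_cpu_uninit(1), then model_cpu_init(1) succeeds both times, while a further model_cpu_init(1) without an intervening model_cpu_uninit(1) prints the warning and returns false.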

> (XEN) Startup point 1.
> (XEN) Waiting for send to finish...
> (XEN) +Sending STARTUP #2.
> (XEN) After apic_write.
> (XEN) Startup point 1.
> (XEN) Waiting for send to finish...
> (XEN) +After Startup.
> (XEN) After Callout 1.
> (XEN) Stuck ??

...Causing CPU0 to think CPU1 is stuck (which is fair, because it is).
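CPU0's side of the exchange can be sketched the same way. This is a hedged, single-threaded model rather than the real do_boot_cpu(): the enum values, the model_* helpers, and treating the secondary CPU as a callback invoked once per poll are all assumptions made for illustration. It shows why a CPU spinning in cpu_init() leaves the boot CPU looking at CPU_STATE_CALLOUT until its bounded wait expires, at which point the CPU is reported stuck and the bring-up fails. The model returns -EIO because the log reports the failure as -5, the value of EIO on Linux; that mapping is an assumption of the sketch.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical states loosely mirroring the CPU_STATE_* names in the thread. */
enum model_cpu_state { MODEL_STATE_CALLOUT, MODEL_STATE_CALLIN };

/* Stand-in for the secondary CPU: one "step" of its start-up code. */
typedef void (*model_ap_step_fn)(enum model_cpu_state *state);

/* A healthy AP answers the callout by moving to CALLIN. */
static void model_ap_healthy(enum model_cpu_state *state)
{
    if (*state == MODEL_STATE_CALLOUT)
        *state = MODEL_STATE_CALLIN;
}

/* An AP spinning in the "already initialized" loop never makes progress. */
static void model_ap_spinning(enum model_cpu_state *state)
{
    (void)state;
}

/* Model of the boot CPU: publish CALLOUT, poll a bounded number of times
 * for the AP to answer, and give up with -EIO if it never does. */
static int model_wait_for_callin(model_ap_step_fn ap_step, int max_polls)
{
    enum model_cpu_state state = MODEL_STATE_CALLOUT;

    for (int i = 0; i < max_polls && state == MODEL_STATE_CALLOUT; i++)
        ap_step(&state); /* the real code delays while the AP runs */

    if (state == MODEL_STATE_CALLIN)
        return 0;

    printf("Stuck ??\n");
    return -EIO;
}
```

With model_ap_healthy the wait returns 0 on the first poll; with model_ap_spinning every poll sees CALLOUT, so the function prints "Stuck ??" and returns -EIO.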

> (XEN) cpu_exit_clear: CPU1
> (XEN) cpu_uninit: CPU1
> (XEN) __cpu_up - do_boot_cpu error
> (XEN) cpu_up CPU1 CPU not up
> (XEN) cpu_up CPU1 fail
> (XEN) Error taking CPU1 up: -5
> [   32.780055] ACPI: Low-level resume complete
> [   32.780055] PM: Restoring platform NVS memory
> [   32.780055] Enabling non-boot CPUs ...
> 
> then it crashes.
> 
> It seems that it is always falling through into the "else" clause of
> the do_boot_cpu() function when attempting to bring it back up, seemingly
> stuck in CPU_STATE_CALLOUT.
> 
> Any ideas as to what might be causing it to get stuck in that state?

Yes, see explanation above, which is actually the same explanation I gave
you before. You need to go investigate why CPU1 is getting confused in
cpu_init().

 -- Keir

> 
> 
> 
>
>> I put a cpu id conditional BUG() call in there, to verify - and while it is
>> reached when using
>> xen-hptool cpu-offline 1
>> it never seems to be reached from the S3 path.
>> 
>> 
>> What is the expected call chain to get into this code during S3?
>> 
>> 
>> On Thu, Sep 20, 2012 at 4:03 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>> On 20.09.12 at 08:13, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
>>>> CPU#1 got stuck in a loop in cpu_init() as it appears to be already
>>>> initialised in the cpu_initialized bitmap. CPU#0 detects it is stuck and
>>>> carries on, but the resume code assumes all CPUs are brought back online
>>>> and crashes later.
>>> 
>>> So this would suggest play_dead() (-> cpu_exit_clear() ->
>>> cpu_uninit()) not getting reached during the suspend cycle.
>>> That should be fairly easy to verify, as the serial console
>>> ought to still work when the secondary CPUs get offlined.
>>> 
>>> That might imply cpumask_clear_cpu(cpu, &cpu_online_map)
>>> not getting reached in __cpu_disable(), which would be in line
>>> with the observation that none of the logs provided so far
>>> showed anything being done by fixup_irqs() (called right
>>> after clearing the online bit).
>>> 
>>> Jan
>> 
> 
> 
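Jan's reasoning about the offline path can also be modelled. The following sketch is hypothetical: one-word masks stand in for cpu_online_map and cpu_initialized, and the model_* functions compress __cpu_disable(), the idle loop, and play_dead() -> cpu_exit_clear() -> cpu_uninit() into a few lines. It assumes, as Jan's argument does, that the dying CPU only reaches play_dead() once its online bit has been cleared. If __cpu_disable() never reaches that clear, fixup_irqs() is never called and the CPU's initialised bit survives into resume, which is the state that later produces "already initialized". The log earlier in this message does show play_dead, cpu_exit_clear and cpu_uninit running for CPU1, so this models the hypothesis being tested, not a confirmed cause.

```c
#include <stdbool.h>

/* Hypothetical one-word masks for cpu_online_map and cpu_initialized. */
static unsigned long model_online_map;
static unsigned long model_initialized_map;
static int model_fixup_irqs_calls;

/* Start state: CPUs 0 and 1 online and initialised, no IRQ fixups yet. */
static void model_reset(void)
{
    model_online_map = 0x3UL;
    model_initialized_map = 0x3UL;
    model_fixup_irqs_calls = 0;
}

static void model_fixup_irqs(void)
{
    model_fixup_irqs_calls++;
}

/* Model of __cpu_disable(): clear the online bit, then fix up IRQs. */
static void model_cpu_disable(unsigned int cpu)
{
    model_online_map &= ~(1UL << cpu);
    model_fixup_irqs();
}

/* Model of play_dead() -> cpu_exit_clear() -> cpu_uninit(). */
static void model_play_dead(unsigned int cpu)
{
    model_initialized_map &= ~(1UL << cpu);
}

/* Model of one idle-loop pass: only an offline CPU calls play_dead(). */
static void model_idle_loop_once(unsigned int cpu)
{
    if (!((model_online_map >> cpu) & 1UL))
        model_play_dead(cpu);
}

/* Offline a CPU; disable_reaches_clear models whether __cpu_disable()
 * gets as far as clearing the online bit. */
static void model_offline_cpu(unsigned int cpu, bool disable_reaches_clear)
{
    if (disable_reaches_clear)
        model_cpu_disable(cpu);
    model_idle_loop_once(cpu);
}

/* True if a later cpu_init() on this CPU would hit the guard. */
static bool model_would_report_already_initialized(unsigned int cpu)
{
    return ((model_initialized_map >> cpu) & 1UL) != 0;
}
```

In the model, a complete offline of CPU1 records one fixup_irqs() call and clears its initialised bit; an offline where the clear is never reached records no fixup_irqs() call and leaves the bit set, matching the combination of symptoms Jan points to.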



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

