
Re: [Xen-devel] [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs

On 05/08/2015 12:34 PM, Jan Beulich wrote:
On 08.05.15 at 12:20, <JGross@xxxxxxxx> wrote:
On 05/06/2015 05:10 PM, Dario Faggioli wrote:
in fact, before this change, shutting down or suspending the
system with some CPUs not assigned to any cpupool, would
crash as follows:

    (XEN) Xen call trace:
    (XEN)    [<ffff82d080101757>] disable_nonboot_cpus+0xb5/0x138
    (XEN)    [<ffff82d0801a8824>] enter_state_helper+0xbd/0x369
    (XEN)    [<ffff82d08010614a>] continue_hypercall_tasklet_handler+0x4a/0xb1
    (XEN)    [<ffff82d0801320bd>] do_tasklet_work+0x78/0xab
    (XEN)    [<ffff82d0801323f3>] do_tasklet+0x5e/0x8a
    (XEN)    [<ffff82d080163cb6>] idle_loop+0x56/0x6b
    (XEN) ****************************************
    (XEN) Panic on CPU 0:
    (XEN) Xen BUG at cpu.c:191
    (XEN) ****************************************

This is because, for free CPUs, -EBUSY was being returned
when trying to tear them down, making cpu_down() unhappy.

It is certainly impractical to forbid shutting down or
suspending if there are unassigned CPUs, so this change
fixes the above by just avoiding returning -EBUSY for those
CPUs. If shutting off, that does not matter much anyway. If
suspending, we make sure that the CPUs remain unassigned
when resuming.
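
To make the before/after behaviour described above concrete, here is a toy model in C. It is only a sketch and not the hypervisor code: cpu_remove_model(), NO_POOL, MODEL_NR_CPUS and the was_free[] bookkeeping are all hypothetical names, and treating every assigned CPU as removable is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NO_POOL        (-1)  /* hypothetical marker: CPU is in no cpupool */
#define MODEL_NR_CPUS  8     /* arbitrary size for this toy model */

/* Hypothetical bookkeeping: which CPUs were free when they were torn down. */
static bool was_free[MODEL_NR_CPUS];

/*
 * Toy model of the "CPU going down" hook.
 * pool    - pool id the CPU belongs to, or NO_POOL if it is free
 * patched - false models the old behaviour, true the behaviour after the fix
 */
static int cpu_remove_model(unsigned int cpu, int pool, bool patched)
{
    assert(cpu < MODEL_NR_CPUS);

    if ( pool == NO_POOL )
    {
        if ( !patched )
            return -EBUSY;       /* old: free CPUs could not be torn down */

        was_free[cpu] = true;    /* new: remember it, so resume leaves it free */
        return 0;
    }

    return 0;                    /* simplification: assigned CPUs always succeed */
}
```

With patched set to false, a free CPU yields -EBUSY, which is the value the shutdown path cannot tolerate. With patched set to true, the same call succeeds and records that the CPU should stay unassigned on resume.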

While there, take the chance to:
   - fix the doc comment of cpupool_cpu_remove() (it was
   - improve comments in general around and in cpupool_cpu_remove()
     and cpupool_cpu_add();
   - add a couple of ASSERT()-s for checking consistency.

I did a test with the patches applied.

# xl cpupool-cpu-remove Pool-0 2
# echo mem >/sys/power/state

When resuming this resulted in:

(XEN) mce_intel.c:735: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Xen BUG at cpu.c:149
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080101531>] cpu_up+0xaf/0xfe
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) rax: 0000000000008016   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d080101531>] cpu_up+0xaf/0xfe
(XEN)    [<ffff82d080101733>] enable_nonboot_cpus+0x4f/0xfc
(XEN)    [<ffff82d0801a6a8d>] enter_state_helper+0x2cb/0x370
(XEN)    [<ffff82d08010615f>] continue_hypercall_tasklet_handler+0x4a/0xb1
(XEN)    [<ffff82d08013101d>] do_tasklet_work+0x78/0xab
(XEN)    [<ffff82d08013134c>] do_tasklet+0x5e/0x8a
(XEN)    [<ffff82d080161bcb>] idle_loop+0x56/0x70
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at cpu.c:149
(XEN) ****************************************

Which would seem more likely to be a result of patch 2. Having
taken a closer look: is setting ret to -EINVAL at the top of
cpupool_cpu_add() really correct? I.e. is it guaranteed that
at least one of the two places altering ret will always be
reached? If it is, then I'd still suspect one of the two
cpupool_assign_cpu_locked() invocations of failing.

Indeed. Setting ret to 0 initially does the trick. With this
modification, suspend/resume and power off work with CPUs
not allocated to any cpupool.
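
For readers following along, the control-flow problem can be sketched as a small stand-alone C model. This is not the code from patch 2: cpu_add_model(), assign_cpu_stub() and the was_free_at_suspend flag are hypothetical, and the assumption is that the path on which neither assignment runs is the one taken for a CPU that was free before suspend.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real assignment helper; always succeeds. */
static int assign_cpu_stub(unsigned int cpu)
{
    (void)cpu;
    return 0;
}

/*
 * Toy model of a "CPU coming up" hook with two places that write ret.
 * initial_ret is the value ret starts with (-EINVAL or 0 in the discussion).
 */
static int cpu_add_model(unsigned int cpu, bool resuming,
                         bool was_free_at_suspend, int initial_ret)
{
    int ret = initial_ret;

    assert(initial_ret <= 0);             /* only 0 or a negative errno */

    if ( resuming )
    {
        if ( !was_free_at_suspend )
            ret = assign_cpu_stub(cpu);   /* place 1: back into its old pool */
        /* else: CPU stays free and ret is never written on this path */
    }
    else
        ret = assign_cpu_stub(cpu);       /* place 2: normal bring-up */

    return ret;
}
```

In this model, cpu_add_model(2, true, true, -EINVAL) returns -EINVAL even though nothing went wrong, while starting from 0 returns success on the same path. That matches the observation that initialising ret to 0 lets resume complete with unassigned CPUs.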

Dario, I suggest you write another patch to correct patch 2.

For patch 3 with patch 2 corrected:

Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
Tested-by: Juergen Gross <jgross@xxxxxxxx>

