Re: [Xen-devel] [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs
>>> On 08.05.15 at 12:20, <JGross@xxxxxxxx> wrote:
> On 05/06/2015 05:10 PM, Dario Faggioli wrote:
>> in fact, before this change, shutting down or suspending the
>> system with some CPUs not assigned to any cpupool would
>> crash as follows:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d080101757>] disable_nonboot_cpus+0xb5/0x138
>> (XEN)    [<ffff82d0801a8824>] enter_state_helper+0xbd/0x369
>> (XEN)    [<ffff82d08010614a>] continue_hypercall_tasklet_handler+0x4a/0xb1
>> (XEN)    [<ffff82d0801320bd>] do_tasklet_work+0x78/0xab
>> (XEN)    [<ffff82d0801323f3>] do_tasklet+0x5e/0x8a
>> (XEN)    [<ffff82d080163cb6>] idle_loop+0x56/0x6b
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Xen BUG at cpu.c:191
>> (XEN) ****************************************
>>
>> This is because, for free CPUs, -EBUSY was being returned
>> when trying to tear them down, making cpu_down() unhappy.
>>
>> It is certainly impractical to forbid shutting down or
>> suspending if there are unassigned CPUs, so this change
>> fixes the above by just avoiding returning -EBUSY for those
>> CPUs. If shutting off, that does not matter much anyway. If
>> suspending, we make sure that the CPUs remain unassigned
>> when resuming.
>>
>> While there, take the chance to:
>>  - fix the doc comment of cpupool_cpu_remove() (it was
>>    wrong);
>>  - improve comments in general around and in cpupool_cpu_remove()
>>    and cpupool_cpu_add();
>>  - add a couple of ASSERT()-s for checking consistency.
>
> I did a test with the patches applied.
>
> # xl cpupool-cpu-remove Pool-0 2
> # echo mem >/sys/power/state
>
> When resuming this resulted in:
>
> (XEN) mce_intel.c:735: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0
> extended MCE MSR 0
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) Xen BUG at cpu.c:149
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080101531>] cpu_up+0xaf/0xfe
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
> (XEN) rax: 0000000000008016   rbx: 0000000000000000   rcx: 0000000000000000
[...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080101531>] cpu_up+0xaf/0xfe
> (XEN)    [<ffff82d080101733>] enable_nonboot_cpus+0x4f/0xfc
> (XEN)    [<ffff82d0801a6a8d>] enter_state_helper+0x2cb/0x370
> (XEN)    [<ffff82d08010615f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [<ffff82d08013101d>] do_tasklet_work+0x78/0xab
> (XEN)    [<ffff82d08013134c>] do_tasklet+0x5e/0x8a
> (XEN)    [<ffff82d080161bcb>] idle_loop+0x56/0x70
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at cpu.c:149
> (XEN) ****************************************

That would seem more likely to be a result of patch 2. Having taken a
closer look: is setting ret to -EINVAL at the top of cpupool_cpu_add()
really correct? I.e., is it guaranteed that at least one of the two
places altering ret will always be reached? If it is, then I'd still
suspect one of the two cpupool_assign_cpu_locked() invocations of
failing.

In any event, unless confirmed otherwise, we may need to revert patch 2
for the time being.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel