[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs
On 05/08/2015 03:12 PM, Dario Faggioli wrote: On Fri, 2015-05-08 at 12:47 +0200, Juergen Gross wrote:On 05/08/2015 12:34 PM, Jan Beulich wrote:(XEN) Xen call trace: (XEN) [<ffff82d080101531>] cpu_up+0xaf/0xfe (XEN) [<ffff82d080101733>] enable_nonboot_cpus+0x4f/0xfc (XEN) [<ffff82d0801a6a8d>] enter_state_helper+0x2cb/0x370 (XEN) [<ffff82d08010615f>] continue_hypercall_tasklet_handler+0x4a/0xb1 (XEN) [<ffff82d08013101d>] do_tasklet_work+0x78/0xab (XEN) [<ffff82d08013134c>] do_tasklet+0x5e/0x8a (XEN) [<ffff82d080161bcb>] idle_loop+0x56/0x70 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) Xen BUG at cpu.c:149 (XEN) ****************************************Which would seem to more likely be a result of patch 2. Having taken a closer look - is setting ret to -EINVAL at the top of cpupool_cpu_add() really correct? I.e. it is guaranteed that at least one of the two places altering ret will always be run into? If it is, then I'd still suspect one of the two cpupool_assign_cpu_locked() invocations to be failing.Indeed.Not really. Well, the problem is, of course, related, as your test shows, and I now see why this happens, but it's all patch 3 fault (see below). So what's in tree right now is ok and there is no need to revert. I believe the best thing to do is for me to send a new, fixed, version of patch 3. The fix would probably still be just changing "int ret = -EINVAL" to "int ret = 0" in cpupool_cpu_add(), but that should be done within patch 3, not as a fix to patch 2, which was indeed right. What do you both think?Setting ret to 0 initially does the trick.Yes. However, as far as patch 2 is concerned, that initialization to -EINVAL is ok, as we are sure and it is guaranted that at least one of the two places altering ret is executed, as Jan was wandering. (Well, because of that, the initialization is not that important, I just added it to be extra-cautious.) The problem is, in patch 3, when that code becomes: int ret = -EINVAL; if ( system_state == SYS_STATE_resume ) { <look for the cpu> ret = cpupool_assign_cpu_locked(*c, cpu); } else { ret = cpupool_assign_cpu_locked(cpupool0, cpu); } In fact, now, if the cpu was free when suspending, we won't find it anywhere when looking for it in the system_state==SYS_STATE_resume case, and hence we won't call cpupool_assign_cpu_locked(). Then, because of the 'if() else', we don't call it below either (as we did before), and hence no one alters 'ret'. That is my point, actually: in patch 2, we are sure ret will be altered. In patch 3, it's no longer guaranteed that we alter ret, and the case in which we don't is perfectly fine, so ret should be inited to 0.With this modification suspend/resume and power off are working with cpus not allocated to any cpupool.Great to know, thanks for testing... and sorry for not having been able to do so myself. My test box allows me to "echo mem >/sys/power/state", and it seems to suspend ok (e.g., power led is blinking)... but then it just does not resume. :-/Dario, I suggest you write another patch to correct patch 2. For patch 3 with patch 2 corrected: Reviewed-by: Juergen Gross <jgross@xxxxxxxx> Tested-by: Juergen Gross <jgross@xxxxxxxx>If you agree on my plan of sending v2 of patch3, and if that will really be just the same of v1, but with "int ret=0", I'll stick these tags there, unless you tell me not to. I don't mind how you are doing it. The machine crashed even without patch 2 when suspending with at least one free cpu, so this patch isn't making anything worse. You can still apply my 2 *.by: tags, of course. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |