
Re: [Xen-devel] [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs

On 05/08/2015 03:12 PM, Dario Faggioli wrote:
On Fri, 2015-05-08 at 12:47 +0200, Juergen Gross wrote:
On 05/08/2015 12:34 PM, Jan Beulich wrote:

(XEN) Xen call trace:
(XEN)    [<ffff82d080101531>] cpu_up+0xaf/0xfe
(XEN)    [<ffff82d080101733>] enable_nonboot_cpus+0x4f/0xfc
(XEN)    [<ffff82d0801a6a8d>] enter_state_helper+0x2cb/0x370
(XEN)    [<ffff82d08010615f>] continue_hypercall_tasklet_handler+0x4a/0xb1
(XEN)    [<ffff82d08013101d>] do_tasklet_work+0x78/0xab
(XEN)    [<ffff82d08013134c>] do_tasklet+0x5e/0x8a
(XEN)    [<ffff82d080161bcb>] idle_loop+0x56/0x70
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at cpu.c:149
(XEN) ****************************************

Which would seem to more likely be a result of patch 2. Having
taken a closer look - is setting ret to -EINVAL at the top of
cpupool_cpu_add() really correct? I.e. is it guaranteed that
at least one of the two places altering ret will always be
reached? If it is, then I'd still suspect one of the two
cpupool_assign_cpu_locked() invocations to be failing.


Not really.

Well, the problem is, of course, related, as your test shows, and I now
see why this happens, but it's all patch 3's fault (see below).

So what's in tree right now is ok and there is no need to revert. I
believe the best thing to do is for me to send a new, fixed, version of
patch 3. The fix would probably still be just changing "int ret =
-EINVAL" to "int ret = 0" in cpupool_cpu_add(), but that should be done
within patch 3, not as a fix to patch 2, which was indeed right.

What do you both think?

Setting ret to 0 initially does the trick.

Yes. However, as far as patch 2 is concerned, that initialization to
-EINVAL is ok, as it is guaranteed that at least one of the two places
altering ret is executed, as Jan was wondering. (Well, because of that,
the initialization is not that important, I just added it to be
extra-cautious.)

The problem is, in patch 3, when that code becomes:

     int ret = -EINVAL;

     if ( system_state == SYS_STATE_resume )
         <look for the cpu>
           ret = cpupool_assign_cpu_locked(*c, cpu);
     else
         ret = cpupool_assign_cpu_locked(cpupool0, cpu);

In fact, now, if the cpu was free when suspending, we won't find it
anywhere when looking for it in the system_state==SYS_STATE_resume case,
and hence we won't call cpupool_assign_cpu_locked(). Then, because of
the 'if() else', we don't call it below either (as we did before), and
hence no one alters 'ret'.

That is my point, actually: in patch 2, we are sure ret will be altered.
In patch 3, it's no longer guaranteed that we alter ret, and the case in
which we don't is perfectly fine, so ret should be initialized to 0.
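
To make the failure mode concrete, here is a minimal standalone model of the control flow described above. It is only a sketch and not the Xen code: pool_model, assign_cpu(), cpu_add_model() and the initial_ret parameter are hypothetical names made up for illustration, and the real function deals with locking and cpumasks that are left out here. What it is meant to show is that, on resume, a cpu that was in no pool at suspend time leaves ret at its initial value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS  4
#define NR_POOLS 2

/* Hypothetical stand-in for a cpupool: which cpus it held at suspend
 * time, and which cpus it currently owns. */
struct pool_model {
    bool cpu_suspended[NR_CPUS];
    bool cpu_valid[NR_CPUS];
};

/* Stand-in for cpupool_assign_cpu_locked(); always succeeds here. */
static int assign_cpu(struct pool_model *p, int cpu)
{
    p->cpu_valid[cpu] = true;
    return 0;
}

/* Models the if/else shape of cpupool_cpu_add() after patch 3.
 * initial_ret is the value ret starts with (-EINVAL or 0). */
static int cpu_add_model(struct pool_model *pools, int npools, int cpu,
                         bool resuming, int initial_ret)
{
    int ret = initial_ret;

    assert(cpu >= 0 && cpu < NR_CPUS);

    if (resuming) {
        /* Look for the pool the cpu belonged to when suspending. */
        for (int i = 0; i < npools; i++) {
            if (pools[i].cpu_suspended[cpu]) {
                ret = assign_cpu(&pools[i], cpu);
                pools[i].cpu_suspended[cpu] = false;
                break;
            }
        }
        /* A cpu that was free at suspend is found nowhere, so ret
         * still holds initial_ret at this point. */
    } else {
        /* Boot/hotplug path: ret is always assigned. */
        ret = assign_cpu(&pools[0], cpu);
    }

    return ret;
}
```

In this model, resuming a cpu that was free at suspend returns -EINVAL when ret starts as -EINVAL, and 0 when it starts as 0, while both the non-resume path and the "found in a pool" path overwrite ret regardless of its initial value.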

With this modification, suspend/resume and power off are working with
cpus not allocated to any cpupool.

Great to know, thanks for testing... and sorry for not having been able
to do so myself. My test box allows me to "echo mem >/sys/power/state",
and it seems to suspend ok (e.g., power led is blinking)... but then it
just does not resume. :-/

Dario, I suggest you write another patch to correct patch 2.

For patch 3 with patch 2 corrected:

Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
Tested-by: Juergen Gross <jgross@xxxxxxxx>

If you agree with my plan of sending v2 of patch 3, and if that will
really be just the same as v1, but with "int ret = 0", I'll stick these
tags there, unless you tell me not to.

I don't mind how you do it. The machine crashed even without
patch 2 when suspending with at least one free cpu, so this patch isn't
making anything worse.

You can still apply my 2 *.by: tags, of course.

