[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug



Hi,

On 4/8/19 3:33 PM, Andrew Cooper wrote:
On 08/04/2019 14:53, Julien Grall wrote:
Hi Andrew,

On 4/8/19 1:09 PM, Andrew Cooper wrote:
On 08/04/2019 12:38, Julien Grall wrote:
Hi,

On 4/8/19 11:47 AM, Andrew Cooper wrote:
On 08/04/2019 11:39, Julien Grall wrote:
Hi,

On 4/8/19 10:39 AM, Andrew Cooper wrote:
+    case CPU_RESUME_FAILED:
+        if ( !park_offline_cpus && system_state !=
SYS_STATE_suspend )

This patch breaks compilation on arm32/arm64 because
park_offline_cpus
is not defined:

timer.c: In function 'cpu_callback':
timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in
this function)
            if ( !park_offline_cpus && system_state !=
SYS_STATE_suspend )
                  ^~~~~~~~~~~~~~~~~

What is the purpose of park_offline_cpus?

Sorry.  I should have waited for a full build test first.

park_offline_cpus is a workaround for Intel's MCE behaviour, where the
system will shut down rather than deliver an #MC if machine checking
isn't configured on all CPUs.

As a result, we have to start all CPUs, even beyond maxcpus= and
set up
machine check handling, and never ever free their stacks, even if we'd
prefer the CPUs to be offline.

I am a bit confused, why this is necessary now for the timer and not
in other places of the common code?


Are you happy with a

#define park_offline_cpus false >
in ARM?

The name is fairly confusing if you don't know the background.

But I have to admit that even with your explanation above, I still
don't understand why you need to check park_offline_cpus in the timers.

It is all to do with how/when we free per-cpu data.

Technically speaking (with the memory leak fixed) the old arrangement
ought to function correctly, but the new arrangement is more efficient.
Where would the free happen in the "less efficient" way?

I don't quite understand the question.

You mention that this new arrangement is "more efficient". So I was asking what was the previous solution.


https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg02252.html
is the v1 patch, but that has already been rejected for not using the
up-to-date notifier layout.

At the expense of introducing an undocumented x86-ism in common code. I would have nacked such patch if I had time too (only hour between post and commit). For common code, could I request a bit more time to allow other of the maintainers give an opinion if they have?

I don't have a better name for park_offline_cpus, but we at least need more documentation for someone to understand the purpose of it without looking at x86. Then, I would be able to understand what is the correct fix for Arm.

For time being, I would just revert this patch.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.