Re: [Xen-devel] [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> And this time, do it right. In fact, a similar change was
> attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
> on the cpupool_unassign_cpu() path"). But that was buggy, and got
> reverted with 8395b67ab0b8a86.
>
> However, even though reverting was the right thing to do, it
> remains true that:
> - calling the function is better done in the cpupool cpu removal
> code, even if just for symmetry with the cpupool cpu adding path;
> - it is not necessary to call it from the cpu teardown (suspend
> or shutdown) code: either we are going down and will never come
> up again (shutdown), or, when coming back up, we want everything
> to be as it was before the teardown started, so any update made
> during the process would only have to be undone;
> - calling it from the teardown path is not only unnecessary, it
> can actually trigger an ASSERT(), if, during the process, we end
> up removing the last online pcpu of a domain's node affinity:
>
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ... ... ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801055b9>] domain_update_node_affinity+0x113/0x240
> (XEN) [<ffff82d08012e676>] cpu_disable_scheduler+0x334/0x3f2
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d080130ad9>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d08013274f>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132a85>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ****************************************
>
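> To make the failure mode concrete, here is a minimal standalone
> model of the failing check. This is NOT the Xen source: the
> cpumask is modelled as a plain bitmask, and the function below
> only mimics the intersection that domain_update_node_affinity()
> computes before the ASSERT() quoted above:
>
>   #include <assert.h>
>
>   typedef unsigned long cpumask_t;    /* one bit per pcpu (model only) */
>
>   static cpumask_t pool_online;       /* pcpus still online in the pool */
>   static cpumask_t dom_affinity;      /* union of the vcpus' affinities */
>
>   static void update_node_affinity_model(void)
>   {
>       cpumask_t dom_cpumask = dom_affinity & pool_online;
>       assert(dom_cpumask != 0);  /* ASSERT(!cpumask_empty(dom_cpumask)) */
>   }
>
>   int main(void)
>   {
>       pool_online  = 0xfUL;           /* pcpus 0-3 online */
>       dom_affinity = 0x8UL;           /* domain affine to pcpu 3 only */
>
>       pool_online &= ~0x8UL;          /* teardown takes pcpu 3 down... */
>       update_node_affinity_model();   /* ...empty intersection: abort */
>       return 0;
>   }
>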
> For all these reasons, move the call from cpu_disable_scheduler()
> to cpupool_unassign_cpu_helper().
>
> While there, add some sanity checking (in the latter function), and
> make sure that scanning the domain list is done with domlist_read_lock
> held, at least when the system is 'live'.
>
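> To make the intended flow explicit, here is a standalone sketch of
> cpupool_unassign_cpu_helper() after this patch. Again, this is not
> Xen code: every callee is a stub that just traces the call, and the
> locking that cannot be stubbed is only indicated in comments:
>
>   #include <stdio.h>
>
>   struct cpupool { int id; };
>
>   static struct cpupool pool = { 0 };
>   static struct cpupool *cpupool_cpu_moving = &pool;  /* stubbed state */
>
>   /* Stubs standing in for the Xen primitives (model only). */
>   static void rcu_read_lock(void)   { puts("rcu_read_lock(&domlist_read_lock)"); }
>   static void rcu_read_unlock(void) { puts("rcu_read_unlock(&domlist_read_lock)"); }
>   static long cpu_disable_scheduler(int cpu)
>   {
>       printf("cpu_disable_scheduler(%d)\n", cpu);
>       return 0;
>   }
>   static void update_all_domains(void)
>   {
>       puts("for_each_domain_in_cpupool: domain_update_node_affinity(d)");
>   }
>
>   static long unassign_cpu_helper_sketch(struct cpupool *c, int cpu)
>   {
>       long ret;
>
>       /* spin_lock(&cpupool_lock) is taken here in the real code. */
>       if ( c != cpupool_cpu_moving )
>           return -1;          /* sanity check; the real code uses -EBUSY */
>
>       /*
>        * The RCU read lock spans BOTH domain-list scans: the one
>        * inside cpu_disable_scheduler() and the affinity updates
>        * at the bottom.
>        */
>       rcu_read_lock();
>       ret = cpu_disable_scheduler(cpu);
>       update_all_domains();
>       rcu_read_unlock();
>
>       /* spin_unlock(&cpupool_lock) is dropped here in the real code. */
>       return ret;
>   }
>
>   int main(void)
>   {
>       return (int)unassign_cpu_helper_sketch(&pool, 12);
>   }
>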
> I re-tested the scenario described here:
> http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
>
> which is what led to the revert of 93be8285a79c6, and it works
> fine after this commit.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> ---
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> ---
> xen/common/cpupool.c | 18 ++++++++++++++++++
> xen/common/schedule.c | 7 ++++++-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..79bcb21 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
>  static long cpupool_unassign_cpu_helper(void *info)
>  {
>      int cpu = cpupool_moving_cpu;
> +    struct cpupool *c = info;
> +    struct domain *d;
>      long ret;
>
>      cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
>                      cpupool_cpu_moving->cpupool_id, cpu);
>
>      spin_lock(&cpupool_lock);
> +    if ( c != cpupool_cpu_moving )
> +    {
> +        ret = -EBUSY;
> +        goto out;
> +    }
> +
> +    /*
> +     * We need this for scanning the domain list, both in
> +     * cpu_disable_scheduler(), and at the bottom of this function.
> +     */
> +    rcu_read_lock(&domlist_read_lock);
>      ret = cpu_disable_scheduler(cpu);
>      cpumask_set_cpu(cpu, &cpupool_free_cpus);
>      if ( !ret )
> @@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info)
>          cpupool_cpu_moving = NULL;
>      }
>
> +    for_each_domain_in_cpupool(d, c)
> +    {
> +        domain_update_node_affinity(d);
> +    }
> +    rcu_read_unlock(&domlist_read_lock);
>   out:
>      spin_unlock(&cpupool_lock);
>      cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index eac8804..a1840c9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -652,6 +652,12 @@ int cpu_disable_scheduler(unsigned int cpu)
>      if ( c == NULL )
>          return ret;
>
> +    /*
> +     * We'd need the domain RCU lock, but:
> +     * - when we are called from cpupool code, it's acquired there already;
> +     * - when we are called for CPU teardown, we're in stop-machine context,
> +     *   so that's not a problem.
> +     */
>      for_each_domain_in_cpupool ( d, c )
>      {
>          for_each_vcpu ( d, v )
> @@ -741,7 +747,6 @@ int cpu_disable_scheduler(unsigned int cpu)
>                  ret = -EAGAIN;
>              }
>          }
> -        domain_update_node_affinity(d);
>      }
>
>      return ret;
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel