Re: [Xen-devel] [PATCH 2/2] xen/sched: don't disable scheduler on cpus during suspend
On Thu, 2019-03-14 at 10:59 +0100, Juergen Gross wrote:
> Today there is special handling in cpu_disable_scheduler() for suspend
> by forcing all vcpus to the boot cpu. In fact there is no need for that
> as during resume the vcpus are put on the correct cpus again.
>
So, basically, you're saying that moving the vcpus to BSP / CPU0 before
suspending is not necessary, right?

I thought it was, because going into suspend involves things like
cpu_schedule_down(), which does, among other things, SCHED_OP(free_pdata)
to free the scheduler's data structs, and hence we did not want vcpus
"around".

> So we can just omit the call of cpu_disable_scheduler() when offlining
> a cpu due to suspend and on resuming we can omit taking the schedule
> lock for selecting the new processor.
>
Well, in theory, we should hold the lock when using
'cpumask_scratch_cpu(cpu)'. Not sure this is an actual problem in this
case, but still...

> In restore_vcpu_affinity() we should be careful when applying affinity
> as the cpu might not have come back to life.
>
But restore_vcpu_affinity() is done, in a loop, for all vcpus of all
domains, in thaw_domains(), which in turn comes after
enable_nonboot_cpus(), which I think is supposed to bring every CPU back
up... What am I missing?

> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index e89bb67e71..7b5ce18426 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -313,7 +313,7 @@ static long cpupool_unassign_cpu_helper(void *info)
>       * cpu_disable_scheduler(), and at the bottom of this function.
>       */
>      rcu_read_lock(&domlist_read_lock);
> -    ret = cpu_disable_scheduler(cpu);
> +    ret = (system_state == SYS_STATE_suspend) ? 0 : cpu_disable_scheduler(cpu);
>
Mmm... how can this function be called with system_state ==
SYS_STATE_suspend?
> @@ -735,31 +708,36 @@ void restore_vcpu_affinity(struct domain *d)
>
>          ASSERT(!vcpu_runnable(v));
>
> -        lock = vcpu_schedule_lock_irq(v);
> -
> -        if ( v->affinity_broken )
> -        {
> -            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
> -            v->affinity_broken = 0;
> -        }
> -
>          /*
> -         * During suspend (in cpu_disable_scheduler()), we moved every vCPU
> -         * to BSP (which, as of now, is pCPU 0), as a temporary measure to
> -         * allow the nonboot processors to have their data structure freed
> -         * and go to sleep. But nothing guardantees that the BSP is a valid
> -         * pCPU for a particular domain.
> +         * Re-assign the initial processor as after resume we have no
> +         * guarantee the old processor has come back to life again.
>           *
>           * Therefore, here, before actually unpausing the domains, we should
>           * set v->processor of each of their vCPUs to something that will
>           * make sense for the scheduler of the cpupool in which they are in.
>           */
>          cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> -                    cpupool_domain_cpumask(v->domain));
> -        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
> +                    cpupool_domain_cpumask(d));
> +        if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
> +        {
> +            if ( v->affinity_broken )
> +            {
> +                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
> +                v->affinity_broken = 0;
> +                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> +                            cpupool_domain_cpumask(d));
> +            }
>
I'm not sure I follow this; I'll think deeper, but, in the meanwhile...

> -        spin_unlock_irq(lock);
> +            if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
> +            {
> +                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
> +                sched_set_affinity(v, &cpumask_all, NULL);
> +                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> +                            cpupool_domain_cpumask(d));
>
... what does it mean we're breaking the affinity while trying to restore
it? When/where is this vcpu getting its real affinity back in place, then?
It might well be me, but I feel like there is something I'm missing about
how this patch is supposed to work.

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel