Re: [Xen-devel] PV-shim 4.13 assertion failures during vcpu_wake()



On Tue, Oct 22, 2019 at 11:27:41AM +0200, Jürgen Groß wrote:
> On 21.10.19 11:51, Sergey Dyasli wrote:
> > Hello,
> > 
> > While testing pv-shim from a snapshot of staging 4.13 branch (with core-
> > scheduling patches applied), some sort of scheduling issues were uncovered
> > which usually lead to a guest lockup (sometimes with soft lockup messages
> > from Linux kernel).
> > 
> > This happens more frequently on SandyBridge CPUs. After enabling
> > CONFIG_DEBUG in pv-shim, the following assertions failed:
> > 
> > Null scheduler:
> > 
> >      Assertion 'lock == get_sched_res(i->res->master_cpu)->schedule_lock' failed at ...are/xen-dir/xen-root/xen/include/xen/sched-if.h:278
> >      (full crash log: https://paste.debian.net/1108861/ )
> > 
> > Credit1 scheduler:
> > 
> >      Assertion 'cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu' failed at sched_credit.c:383
> >      (full crash log: https://paste.debian.net/1108862/ )
> > 
> > I'm currently investigating those, but would appreciate any help or
> > suggestions.
> 
> And now a more sane patch to try.
> 
> 
> Juergen
> 

> From 205b7622b84bc678f8a0d6ac121dff14439fe331 Mon Sep 17 00:00:00 2001
> From: Juergen Gross <jgross@xxxxxxxx>
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: Wei Liu <wl@xxxxxxx>
> Cc: "Roger Pau Monné" <roger.pau@xxxxxxxxxx>
> Date: Tue, 22 Oct 2019 11:14:08 +0200
> Subject: [PATCH] xen/pvhsim: fix cpu onlining
> 
> Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
> the initial processor for all pv-shim vcpus will be 0, as no other cpus
> are online when the vcpus are created. Before that commit the vcpus
> were assigned processors that were not yet online, which worked only by
> chance.
> 
> When the pv-shim vcpu becomes active it will have a hard affinity
> not matching its initial processor assignment, leading to failing
> ASSERT()s or other problems depending on the selected scheduler.

I'm slightly lost here, who has set this hard affinity on the pvshim
vCPUs?

> Fix that by redoing the affinity setting after onlining the cpu but
> before taking the vcpu up.

The change seems fine to me, but I don't understand why the lack of
this can cause asserts to trigger, as reported by Sergey. I also
wonder why a change to pin vCPU#0 to pCPU#0 is not required, because
pv_shim_cpu_up is only used for APs.

I would expect that pvshim guest vCPUs have no hard affinity ATM, and
that when a pCPU (from the shim PoV) is brought online it will be
added to the pool of available pCPU for the shim to schedule vCPUs
on.
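
For reference, here is a toy model of the Credit1 check quoted above,
'cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu'. It is only a sketch:
it assumes, going by the cpumask_of(v->vcpu_id) call in the patch, that
each shim vCPU is meant to be pinned to the pCPU with the same number. All
names below are hypothetical stand-ins on a plain 32-bit mask, not the Xen
cpumask API.

```c
#include <stdint.h>

#define TOY_NR_CPUS 8u  /* hypothetical CPU count for this model */

/*
 * Toy stand-in for a "cycle" lookup over a CPU mask: return the first CPU
 * set in mask strictly after n, wrapping around, or TOY_NR_CPUS if the
 * mask is empty.
 */
static unsigned int toy_cpumask_cycle(unsigned int n, uint32_t mask)
{
    for (unsigned int i = 1; i <= TOY_NR_CPUS; i++) {
        unsigned int cpu = (n + i) % TOY_NR_CPUS;

        if (mask & (UINT32_C(1) << cpu))
            return cpu;
    }
    return TOY_NR_CPUS;
}

/* Same shape as the quoted check: cycle(cpu, hard_affinity) == cpu. */
static int toy_pinned_to(unsigned int cpu, uint32_t hard_affinity)
{
    return toy_cpumask_cycle(cpu, hard_affinity) == cpu;
}
```

In this model toy_pinned_to(0, 1u << 1) is false (vCPU#1 still assigned to
pCPU#0 while its affinity is {1}), and toy_pinned_to(1, 1u << 1) is true
once the assigned processor matches the single-CPU affinity. The check only
holds when the mask contains exactly the CPU the unit is assigned to.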

> Fixes: 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
> Reported-by: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> ---
>  xen/arch/x86/pv/shim.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
> index 5edbcd9ac5..4329eaaefe 100644
> --- a/xen/arch/x86/pv/shim.c
> +++ b/xen/arch/x86/pv/shim.c
> @@ -837,6 +837,8 @@ long pv_shim_cpu_up(void *data)
>                      v->vcpu_id, rc);
>              return rc;
>          }
> +
> +        vcpu_set_hard_affinity(v, cpumask_of(v->vcpu_id));
>      }
>  
>      wake = test_and_clear_bit(_VPF_down, &v->pause_flags);
> -- 
> 2.16.4
> 

