Re: [Xen-devel] PV-shim 4.13 assertion failures during vcpu_wake()



On Tue, Oct 22, 2019 at 01:50:44PM +0200, Jürgen Groß wrote:
> On 22.10.19 13:25, Roger Pau Monné wrote:
> > On Tue, Oct 22, 2019 at 01:01:09PM +0200, Jürgen Groß wrote:
> > > On 22.10.19 12:52, Roger Pau Monné wrote:
> > > > On Tue, Oct 22, 2019 at 11:27:41AM +0200, Jürgen Groß wrote:
> > > > > Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
> > > > > the initial processor for all pv-shim vcpus will be 0, as no other cpus
> > > > > are online when the vcpus are created. Before that commit the vcpus
> > > > > would have processors set not being online yet, which worked just by
> > > > > chance.
> > 
> > So all vCPUs for the shim have their hard affinity set to pCPU#0 if I
> 
> No, the hard affinity is set to pcpu#(vcpu-id), but the initial cpu to
> run on is pcpu#0 as no other cpu is online when the vcpus are being
> created, and v->processor should always be a valid online cpu.

Oh, I didn't know v->processor must always be valid, even for offline
vCPUs. I'm quite sure the shim previously set v->processor to pCPUs
that were not yet online.

> > understand it correctly. From my reading of sched_setup_dom0_vcpus it
> > seems like in the shim case all sched units are pinned to their id,
> > which would imply sched units != 0 are not pinned to CPU#0?
> 
> Right.
> 
> > 
> > Or maybe there's only one sched unit that contains all the shim vCPUs?
> 
> No.
> 
> > 
> > > > > When the pv-shim vcpu becomes active it will have a hard affinity
> > > > > not matching its initial processor assignment leading to failing
> > > > > ASSERT()s or other problems depending on the selected scheduler.
> > > > 
> > > > I'm slightly lost here, who has set this hard affinity on the pvshim
> > > > vCPUs?
> > > 
> > > That is done in sched_setup_dom0_vcpus().
> > > 
> > > > 
> > > > > Fix that by redoing the affinity setting after onlining the cpu but
> > > > > before taking the vcpu up.
> > > > 
> > > > The change seems fine to me, but I don't understand why the lack of
> > > > this can cause asserts to trigger, as reported by Sergey. I also
> > > > wonder why a change to pin vCPU#0 to pCPU#0 is not required, because
> > > > pv_shim_cpu_up is only used for APs.
> > > 
> > > When vcpu 0 is being created pcpu 0 is online already. So the affinity
> > > set in sched_setup_dom0_vcpus() is fine in that case.
> > 
> > IIRC all shim vCPUs were pinned to their identity pCPU at creation, and
> > there was no need to do this pinning when the vCPU is brought online. I
> > guess this is no longer possible.
> 
> The problem is not the pinning, but the initial cpu stored in
> v->processor. This results in v->processor not being set in the hard
> affinity mask of the vcpu (or better: unit) which then triggers the
> problems.

I guess just setting v->processor in pv_shim_cpu_up directly would be
too intrusive?

In any case, it seems dangerous to allow vCPUs (even when offline) to
be in a state that when woken up will cause assertions inside the
scheduling logic. I.e., it would be best IMO to not set the hard
affinity in sched_setup_dom0_vcpus and instead set it when the pCPU is
brought online, or maybe have vcpu_wake select a suitable v->processor
value?
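
To make the last option a bit more concrete, here is a toy sketch of what
"select a suitable v->processor at wake time" could look like. This is not
Xen code: every name below (toy_vcpu, toy_cpumask, toy_wake, ...) is made up
for illustration, cpumasks are modelled as plain 64-bit integers, and locking
and sched units are ignored entirely. It only shows the invariant being
discussed (the assigned processor must be online and part of the hard
affinity) and one way a wake path could repair a stale assignment before
asserting on it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: one bit per pCPU. None of these names exist in Xen. */
typedef uint64_t toy_cpumask;

struct toy_vcpu {
    unsigned int processor;     /* pCPU the vCPU is currently assigned to */
    toy_cpumask hard_affinity;  /* pCPUs the vCPU is allowed to run on */
};

/* Invariant under discussion: processor is online and within hard affinity. */
static int toy_placement_ok(const struct toy_vcpu *v, toy_cpumask online)
{
    return (int)(((v->hard_affinity & online) >> v->processor) & 1u);
}

/* Lowest online pCPU permitted by the hard affinity, or -1 if there is none. */
static int toy_pick_processor(const struct toy_vcpu *v, toy_cpumask online)
{
    toy_cpumask candidates = v->hard_affinity & online;

    for (unsigned int cpu = 0; cpu < 64; cpu++)
        if ((candidates >> cpu) & 1u)
            return (int)cpu;

    return -1;
}

/* Hypothetical wake path: fix up a stale processor before asserting on it. */
static int toy_wake(struct toy_vcpu *v, toy_cpumask online)
{
    if (!toy_placement_ok(v, online)) {
        int cpu = toy_pick_processor(v, online);

        if (cpu < 0)
            return -1;  /* no usable pCPU yet: leave the vCPU asleep */

        v->processor = (unsigned int)cpu;
    }

    assert(toy_placement_ok(v, online));
    return 0;
}
```

With a vCPU pinned to pCPU#1 (hard_affinity = 1 << 1) that was created while
only pCPU#0 was online (processor = 0), toy_placement_ok() fails once pCPU#1
is up, and toy_wake() moves the vCPU to pCPU#1 instead of tripping the
assertion.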

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel