
Re: [Xen-devel] PV-shim 4.13 assertion failures during vcpu_wake()



On 22.10.19 12:52, Roger Pau Monné wrote:
On Tue, Oct 22, 2019 at 11:27:41AM +0200, Jürgen Groß wrote:
On 21.10.19 11:51, Sergey Dyasli wrote:
Hello,

While testing pv-shim from a snapshot of the staging 4.13 branch (with core-
scheduling patches applied), some scheduling issues were uncovered
which usually lead to a guest lockup (sometimes with soft lockup messages
from the Linux kernel).

This happens more frequently on SandyBridge CPUs. After enabling
CONFIG_DEBUG in pv-shim, the following assertions failed:

Null scheduler:

      Assertion 'lock == get_sched_res(i->res->master_cpu)->schedule_lock' failed at ...are/xen-dir/xen-root/xen/include/xen/sched-if.h:278
      (full crash log: https://paste.debian.net/1108861/ )

Credit1 scheduler:

      Assertion 'cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu' failed at sched_credit.c:383
      (full crash log: https://paste.debian.net/1108862/ )

I'm currently investigating those, but would appreciate any help or
suggestions.

And now a more sane patch to try.


Juergen


From 205b7622b84bc678f8a0d6ac121dff14439fe331 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@xxxxxxxx>
To: xen-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: Wei Liu <wl@xxxxxxx>
Cc: "Roger Pau Monné" <roger.pau@xxxxxxxxxx>
Date: Tue, 22 Oct 2019 11:14:08 +0200
Subject: [PATCH] xen/pvhsim: fix cpu onlining

Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
the initial processor for all pv-shim vcpus will be 0, as no other cpus
are online when the vcpus are created. Before that commit the vcpus
would have been assigned processors that were not online yet, which
worked just by chance.

When the pv-shim vcpu becomes active it will have a hard affinity
not matching its initial processor assignment, leading to failing
ASSERT()s or other problems depending on the selected scheduler.
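
To make the mismatch concrete, here is a small standalone model. It is only
a sketch and not Xen code: every name in it (toy_vcpu, toy_select_processor,
and so on) is made up for illustration, and it folds vcpu creation and the
1:1 pinning into a single step for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not Xen code: every name here is hypothetical. */
typedef uint64_t toy_cpumask_t;        /* bit N set == pcpu N */

struct toy_vcpu {
    unsigned int id;
    unsigned int processor;            /* pcpu currently assigned */
    toy_cpumask_t hard_affinity;       /* pcpus the vcpu may run on */
};

/* Pick the first online pcpu; stands in for "creation selects a processor". */
static unsigned int toy_select_processor(toy_cpumask_t online)
{
    for (unsigned int cpu = 0; cpu < 64; cpu++)
        if (online & (UINT64_C(1) << cpu))
            return cpu;
    return 0;
}

/* Create vcpu `id` while only the pcpus in `online` are up, pinned 1:1. */
static struct toy_vcpu toy_vcpu_create(unsigned int id, toy_cpumask_t online)
{
    assert(id < 64);
    struct toy_vcpu v = {
        .id = id,
        .processor = toy_select_processor(online),
        .hard_affinity = UINT64_C(1) << id,
    };
    return v;
}

/* Invariant assumed by this model: the processor lies in the hard affinity. */
static bool toy_vcpu_consistent(const struct toy_vcpu *v)
{
    return (v->hard_affinity >> v->processor) & 1;
}
```

With only pcpu 0 online, toy_vcpu_create(0, 1) yields a consistent vcpu,
while toy_vcpu_create(1, 1) ends up on processor 0 with an affinity that
contains only pcpu 1, so toy_vcpu_consistent() returns false for it.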

I'm slightly lost here, who has set this hard affinity on the pvshim
vCPUs?

That is done in sched_setup_dom0_vcpus().


Fix that by redoing the affinity setting after onlining the cpu but
before taking the vcpu up.
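
The ordering described above can be sketched roughly as follows. This is a
toy model under stated assumptions, not the actual patch: toy_state,
toy_set_hard_affinity and toy_cpu_up are hypothetical names, and the real
scheduler interfaces are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not the actual patch: all names are hypothetical. */
struct toy_state {
    uint64_t online;           /* bit N set == pcpu N online */
    unsigned int processor;    /* pcpu the vcpu is assigned to */
    uint64_t hard_affinity;    /* pcpus the vcpu may run on */
    bool vcpu_up;
};

/* Redo the pinning: set the mask and move the vcpu onto a pcpu inside it. */
static void toy_set_hard_affinity(struct toy_state *s, unsigned int cpu)
{
    assert(s->online & (UINT64_C(1) << cpu));   /* pcpu must be online now */
    s->hard_affinity = UINT64_C(1) << cpu;
    s->processor = cpu;
}

/* Bring up secondary vcpu `id`: online pcpu, redo affinity, then vcpu up. */
static void toy_cpu_up(struct toy_state *s, unsigned int id)
{
    assert(id < 64);
    s->online |= UINT64_C(1) << id;    /* 1. online the pcpu */
    toy_set_hard_affinity(s, id);      /* 2. redo the affinity setting */
    s->vcpu_up = true;                 /* 3. only now take the vcpu up */
}
```

The point of the ordering is that step 2 can only place the vcpu on its own
pcpu once step 1 has made that pcpu available, and step 3 must not happen
while processor and affinity still disagree.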

The change seems fine to me, but I don't understand why the lack of
this can cause asserts to trigger, as reported by Sergey. I also
wonder why a change to pin vCPU#0 to pCPU#0 is not required, because
pv_shim_cpu_up is only used for APs.

When vcpu 0 is being created pcpu 0 is online already. So the affinity
set in sched_setup_dom0_vcpus() is fine in that case.

I would expect that pvshim guest vCPUs have no hard affinity ATM, and
that when a pCPU (from the shim PoV) is brought online it will be
added to the pool of available pCPU for the shim to schedule vCPUs
on.

That expectation is wrong. All vcpus are pinned to their respective
pcpus.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel