Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer
On Tue, Sep 20, 2022 at 04:30:41PM +0200, Jan Beulich wrote:
> On 20.09.2022 12:22, Marek Marczykowski-Górecki wrote:
> > On Mon, Aug 22, 2022 at 12:00:27PM +0200, Marek Marczykowski-Górecki wrote:
> >> On Mon, Aug 22, 2022 at 11:53:50AM +0200, Jan Beulich wrote:
> >>> On 21.08.2022 18:14, Marek Marczykowski-Górecki wrote:
> >>>> On Sat, Oct 09, 2021 at 06:28:17PM +0200, Marek Marczykowski-Górecki wrote:
> >>>>> On Sun, Jan 31, 2021 at 03:15:30AM +0100, Marek Marczykowski-Górecki wrote:
> >>>>>> I'm resurrecting this thread as it was recently mentioned elsewhere. I
> >>>>>> can still reproduce the issue on the recent staging branch
> >>>>>> (9dc687f155).
> >>>>>>
> >>>>>> It fails after the first resume (not always, but frequently enough to
> >>>>>> debug it). At least one guest needs to be running - with just (PV) dom0
> >>>>>> the crash doesn't happen (at least for the ~8 times in a row I tried).
> >>>>>> If the first resume works, the second will (almost?) always fail, but
> >>>>>> with different symptoms - dom0 kernel lockups (of at least some of its
> >>>>>> vcpus). I haven't debugged this one yet at all.
> >>>>>>
> >>>>>> Any help will be appreciated; I can apply some debug patches, change
> >>>>>> the configuration, etc.
> >>>>>
> >>>>> This still happens on 4.14.3. Maybe it is related to freeing percpu
> >>>>> areas, as it caused other issues with suspend too? Just a thought...
> >>>>
> >>>> I have reproduced this on current staging(*), and I can reproduce it
> >>>> reliably. I also got (I believe) a closely related crash with the
> >>>> credit1 scheduler.
> >>>>
> >>>> (*) It isn't plain staging, it's one with my xhci console patches on
> >>>> top, including an attempt to make it survive S3. I believe the only
> >>>> relevant part there is sticking set_timer() into the console resume path
> >>>> (or just having a timer with a rather short delay registered). The actual
> >>>> tree is at https://github.com/marmarek/xen/tree/master-xue2-debug,
> >>>> including quite a lot of debug prints and debug hacks.
> >>>>
> >>>> Specific crash with credit2:
> >>
> >> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at
> >> common/sched/credit2.c:2274
> >> (XEN) ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
> >> (XEN) CPU:    10
> >> (XEN) RIP:    e008:[<ffff82d040247a4d>] credit2.c#csched2_unit_wake+0x152/0x154
> >> (XEN) RFLAGS: 0000000000010083   CONTEXT: hypervisor (d0v0)
> >> (XEN) rax: ffff830251778230   rbx: ffff830251768cb0   rcx: 00000032111d6000
> >> (XEN) rdx: ffff8302515c1eb0   rsi: 0000000000000006   rdi: ffff830251769000
> >> (XEN) rbp: ffff8302515cfd90   rsp: ffff8302515cfd70   r8:  ffff830251769000
> >> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> >> (XEN) r12: ffff830251768dd0   r13: ffff8302515c1d00   r14: 0000000000000006
> >> (XEN) r15: ffff82d0405ddb40   cr0: 0000000080050033   cr4: 0000000000372660
> >> (XEN) cr3: 000000022f2a1000   cr2: ffff8881012738e0
> >> (XEN) fsb: 0000744bf6a0db80   gsb: ffff888255600000   gss: 0000000000000000
> >> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> (XEN) Xen code around <ffff82d040247a4d> (credit2.c#csched2_unit_wake+0x152/0x154):
> >> (XEN)  df e8 6f bf ff ff eb ad <0f> 0b f3 0f 1e fa 55 48 89 e5 41 57 41 56 41 55
> >> (XEN) Xen stack trace from rsp=ffff8302515cfd70:
> >> (XEN)    ffff83025174b000 ffff830251768cb0 ffff830251778270 ffff82d0405c4298
> >> (XEN)    ffff8302515cfdd8 ffff82d04024fcb8 0000000000000202 ffff830251778270
> >> (XEN)    ffff83025174b000 0000000000000001 ffff830251769018 0000000000000000
> >> (XEN)    0000000000000000 ffff8302515cfe48 ffff82d04020a8c9 ffff8882556aedc0
> >> (XEN)    0000000000000003 00001910537e623e 0000000b988f78a6 0000000059d4a716
> >> (XEN)    00001901f30fa41e 0000000217f96af6 0000000000000000 ffff83025174b000
> >> (XEN)    ffff830251756000 0000000000000002 0000000000000001 ffff8302515cfe70
> >> (XEN)    ffff82d0402f7968 ffff830251756000 ffff8302515cfef8 0000000000000018
> >> (XEN)    ffff8302515cfee8 ffff82d0402ec6de 0000000000000000 ffffffff82f157e0
> >> (XEN)    0000000000000000 0000000000000000 ffff8302515cfef8 0000000000000000
> >> (XEN)    0000000000000000 ffff8302515cffff ffff830251756000 0000000000000000
> >> (XEN)    0000000000000000 0000000000000000 0000000000000000 00007cfdaea300e7
> >> (XEN)    ffff82d0402012bd 0000000000000000 ffffffff82c51120 ffff88810036cf00
> >> (XEN)    0000000000000002 000000000001e120 0000000000000002 0000000000000246
> >> (XEN)    ffffffff82f157e0 0000000000000001 0000000000000000 0000000000000018
> >> (XEN)    ffffffff81e4a30a 0000000000000000 0000000000000002 0000000000000001
> >> (XEN)    0000010000000000 ffffffff81e4a30a 000000000000e033 0000000000000246
> >> (XEN)    ffffc9004aef7c18 000000000000e02b fb5ee398d214b10c eb5ef398c214a10c
> >> (XEN)    eb56f390c21ca104 ebd6f310c29ca184 0000e0100000000a ffff830251756000
> >> (XEN)    0000003211016000 0000000000372660 0000000000000000 80000002963e1002
> >> (XEN) Xen call trace:
> >> (XEN)    [<ffff82d040247a4d>] R credit2.c#csched2_unit_wake+0x152/0x154
> >> (XEN)    [<ffff82d04024fcb8>] F vcpu_wake+0xfd/0x267
> >> (XEN)    [<ffff82d04020a8c9>] F common_vcpu_op+0x178/0x5d1
> >> (XEN)    [<ffff82d0402f7968>] F do_vcpu_op+0x69/0x226
> >> (XEN)    [<ffff82d0402ec6de>] F pv_hypercall+0x575/0x657
> >> (XEN)    [<ffff82d0402012bd>] F lstar_enter+0x13d/0x150
> >> (XEN)
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 10:
> >> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at
> >> common/sched/credit2.c:2274
> >> (XEN) ****************************************
> >
> > Ok, I think I figured it out!
> >
> > I added a function that verifies the run queues of all the sched units, and
> > called it basically every other line on the resume path. The debug
> > function (if anybody is interested):
> >
> > void verify_sched_units(void)
> > {
> >     struct domain *d;
> >     const struct sched_unit *unit;
> >
> >     for_each_domain ( d )
> >     {
> >         for_each_sched_unit ( d, unit )
> >         {
> >             if ( c2rqd(sched_unit_master(unit)) != csched2_unit(unit)->rqd )
> >             {
> >                 printk(XENLOG_WARNING "d%d sched unit %d: rq=%d, unit master %d, rq=%d\n",
> >                        d->domain_id, unit->unit_id,
> >                        csched2_unit(unit)->rqd ? csched2_unit(unit)->rqd->id : -1,
> >                        sched_unit_master(unit),
> >                        c2rqd(sched_unit_master(unit))->id);
> >                 WARN_ON(1);
> >             }
> >         }
> >     }
> > }
> >
> > It appears that restore_vcpu_affinity() is responsible, specifically
> > this part:
> >
> > 1216         /*
> > 1217          * Re-assign the initial processor as after resume we have no
> > 1218          * guarantee the old processor has come back to life again.
> > 1219          *
> > 1220          * Therefore, here, before actually unpausing the domains, we should
> > 1221          * set v->processor of each of their vCPUs to something that will
> > 1222          * make sense for the scheduler of the cpupool in which they are in.
> > 1223          */
> > ...
> > 1249         res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> > 1250         sched_set_res(unit, res);
> > 1251
> > 1252         spin_unlock_irq(lock);
> > 1253
> > 1254         /* v->processor might have changed, so reacquire the lock. */
> > 1255         lock = unit_schedule_lock_irq(unit);
> > 1256         res = sched_pick_resource(unit_scheduler(unit), unit);
> > 1257         sched_set_res(unit, res);
> > 1258         spin_unlock_irq(lock);
> > 1259
> > 1260         if ( old_cpu != sched_unit_master(unit) )
> > 1261             sched_move_irqs(unit);
> >
> > It calls sched_set_res() directly, which assigns sched resources, but
> > does _not_ adjust runqueues (if the new pcpu happens to be assigned to
> > a different runqueue than the one of the previous pcpu).
> >
> > I have two (non-exclusive) ideas here:
> > 1. If old_cpu is actually still available, do not move the unit at all.
> > 2. Use sched_migrate() instead of sched_set_res().
> >
> > Here is the patch that fixes it for me:
> > ---8<---
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index 83455fbde1c8..dcf202d8b307 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -1246,19 +1246,29 @@ void restore_vcpu_affinity(struct domain *d)
> >              }
> >          }
> >
> > -        res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> > +        /* Prefer old cpu if available. */
> > +        if ( cpumask_test_cpu(old_cpu, cpumask_scratch_cpu(cpu)) )
> > +            res = get_sched_res(old_cpu);
> > +        else
> > +            res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> >          sched_set_res(unit, res);
> >
> >          spin_unlock_irq(lock);
> >
> > -        /* v->processor might have changed, so reacquire the lock. */
> > -        lock = unit_schedule_lock_irq(unit);
> > -        res = sched_pick_resource(unit_scheduler(unit), unit);
> > -        sched_set_res(unit, res);
> > -        spin_unlock_irq(lock);
> > -
> > +        /*
> > +         * If a different cpu was chosen, it was a random one, so let the
> > +         * scheduler make a proper decision.
> > +         */
> >          if ( old_cpu != sched_unit_master(unit) )
> > +        {
> > +            /* v->processor might have changed, so reacquire the lock. */
> > +            lock = unit_schedule_lock_irq(unit);
> > +            res = sched_pick_resource(unit_scheduler(unit), unit);
> > +            sched_migrate(unit_scheduler(unit), unit, res->master_cpu);
> > +            spin_unlock_irq(lock);
> > +
> >              sched_move_irqs(unit);
> > +        }
> >      }
> >
> >      rcu_read_unlock(&sched_res_rculock);
> > ---8<---
> >
> > I have several doubts here:
> >
> > 1. If old_cpu is available, is sched_set_res() needed at all?
> > 2. Should both calls be changed to sched_migrate()? Currently I changed
> >    only the second one, in case the scheduler could be confused about
> >    old_cpu not being available anymore.
> > 3. Are there any extra locking requirements for sched_migrate() at this
> >    stage? The long comment above sched_unit_migrate_start() suggests
> >    there might be, but I'm not sure if that's really the case during
> >    resume.
> > 4. Related to the above - should thaw_domains() be modified to call
> >    restore_vcpu_affinity() for all domains first, and unpause them only
> >    later? That could reduce the locking requirements, I guess.
>
> All questions primarily to the scheduler maintainers - forwarding
> accordingly.

Ping?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
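
For readers not deep in the credit2 internals, the failure mode described above can be illustrated with a small standalone model. This is not Xen code: the toy_* names and the cpu-to-runqueue mapping are invented for illustration; they only mirror the roles of sched_set_res()/sched_migrate() and the csched2_unit(unit)->rqd bookkeeping referred to in the mail.

/*
 * Toy, self-contained model of the invariant checked by the credit2
 * assertion -- NOT Xen code.  Build: cc -Wall -o toy toy.c && ./toy
 */
#include <stdio.h>

#define NR_CPUS 4

/* Each pcpu belongs to one runqueue (hypothetical mapping). */
static const int cpu_to_rq[NR_CPUS] = { 0, 0, 1, 1 };

struct toy_unit {
    int master_cpu;   /* what sched_unit_master(unit) would report        */
    int rqd;          /* what the scheduler privately recorded (svc->rqd) */
};

/* Analogue of sched_set_res(): only the pcpu assignment is updated. */
static void toy_set_res(struct toy_unit *u, int cpu)
{
    u->master_cpu = cpu;
}

/* Analogue of a migrate path: the scheduler's own bookkeeping follows. */
static void toy_migrate(struct toy_unit *u, int cpu)
{
    u->master_cpu = cpu;
    u->rqd = cpu_to_rq[cpu];
}

/* The condition the assertion in csched2_unit_wake() corresponds to. */
static void toy_check(const char *what, const struct toy_unit *u)
{
    int ok = cpu_to_rq[u->master_cpu] == u->rqd;

    printf("%-20s cpu=%d rq(cpu)=%d recorded rq=%d -> %s\n",
           what, u->master_cpu, cpu_to_rq[u->master_cpu], u->rqd,
           ok ? "OK" : "assertion would fire");
}

int main(void)
{
    /* Before suspend: the unit ran on cpu 1, which is in runqueue 0. */
    struct toy_unit u = { .master_cpu = 1, .rqd = cpu_to_rq[1] };
    toy_check("before suspend", &u);

    /* Resume picks cpu 2 (runqueue 1) but only the assignment changes. */
    toy_set_res(&u, 2);
    toy_check("after set_res only", &u);

    /* Going through the migrate path keeps the invariant. */
    toy_migrate(&u, 2);
    toy_check("after migrate", &u);

    return 0;
}

The model only shows the bookkeeping split: reassigning the unit's pcpu without updating the runqueue record breaks the asserted invariant once the new pcpu sits on a different runqueue, which is why the patch above routes the post-resume reassignment through sched_migrate() when a different cpu is picked. Whether that is the right call at that point in restore_vcpu_affinity() is exactly what questions 1-3 leave to the scheduler maintainers.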