
Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 20 Sep 2022 16:30:41 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jürgen Groß <jgross@xxxxxxxx>
  • Delivery-date: Tue, 20 Sep 2022 14:30:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.09.2022 12:22, Marek Marczykowski-Górecki wrote:
> On Mon, Aug 22, 2022 at 12:00:27PM +0200, Marek Marczykowski-Górecki wrote:
>> On Mon, Aug 22, 2022 at 11:53:50AM +0200, Jan Beulich wrote:
>>> On 21.08.2022 18:14, Marek Marczykowski-Górecki wrote:
>>>> On Sat, Oct 09, 2021 at 06:28:17PM +0200, Marek Marczykowski-Górecki wrote:
>>>>> On Sun, Jan 31, 2021 at 03:15:30AM +0100, Marek Marczykowski-Górecki wrote:
>>>>>> I'm resurrecting this thread as it was recently mentioned elsewhere. I
>>>>>> can still reproduce the issue on the recent staging branch (9dc687f155).
>>>>>>
>>>>>> It fails after the first resume (not always, but frequently enough to
>>>>>> debug it). At least one guest needs to be running - with just (PV) dom0
>>>>>> the crash doesn't happen (at least not in the ~8 attempts in a row I tried).
>>>>>> If the first resume works, the second will (almost?) always fail, but
>>>>>> with different symptoms - dom0 kernel lockups (of at least some of its
>>>>>> vcpus). I haven't debugged this one at all yet.
>>>>>>
>>>>>> Any help will be appreciated; I can apply debug patches, change the
>>>>>> configuration, etc.
>>>>>
>>>>> This still happens on 4.14.3. Maybe it is related to freeing the percpu
>>>>> areas, as that has caused other issues with suspend too? Just a thought...
>>>>
>>>> I have reproduced this on current staging(*), and I can reproduce it
>>>> reliably. I also got what I believe is a closely related crash with the
>>>> credit1 scheduler.
>>>>
>>>> (*) It isn't plain staging, it's one with my xhci console patches on
>>>> top, including an attempt to make it survive S3. I believe the only
>>>> relevant part there is sticking a set_timer() call into the console
>>>> resume path (or just having a timer with a rather short delay
>>>> registered). The actual tree is at
>>>> https://github.com/marmarek/xen/tree/master-xue2-debug, including
>>>> quite a lot of debug prints and debug hacks.
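>>>>
>>>> Roughly, the timer part looks like this (a simplified sketch, not the
>>>> exact code from that tree; the xhci_* names are just placeholders):
>>>>
>>>>     #include <xen/time.h>
>>>>     #include <xen/timer.h>
>>>>
>>>>     static struct timer xhci_poll_timer;
>>>>
>>>>     /* init_timer(&xhci_poll_timer, xhci_poll_timer_fn, NULL, 0) is done
>>>>      * once during console setup. */
>>>>     static void xhci_poll_timer_fn(void *unused)
>>>>     {
>>>>         /* ... poll the console hardware ... */
>>>>         set_timer(&xhci_poll_timer, NOW() + MILLISECS(1));
>>>>     }
>>>>
>>>>     static void xhci_console_resume(void)
>>>>     {
>>>>         /* Re-arming a short-delay timer here is, I believe, the only part
>>>>          * relevant for triggering the crash. */
>>>>         set_timer(&xhci_poll_timer, NOW() + MILLISECS(1));
>>>>     }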
>>>>
>>>> Specific crash with credit2:
>>
>> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at common/sched/credit2.c:2274
>> (XEN) ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
>> (XEN) CPU:    10
>> (XEN) RIP:    e008:[<ffff82d040247a4d>] credit2.c#csched2_unit_wake+0x152/0x154
>> (XEN) RFLAGS: 0000000000010083   CONTEXT: hypervisor (d0v0)
>> (XEN) rax: ffff830251778230   rbx: ffff830251768cb0   rcx: 00000032111d6000
>> (XEN) rdx: ffff8302515c1eb0   rsi: 0000000000000006   rdi: ffff830251769000
>> (XEN) rbp: ffff8302515cfd90   rsp: ffff8302515cfd70   r8:  ffff830251769000
>> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
>> (XEN) r12: ffff830251768dd0   r13: ffff8302515c1d00   r14: 0000000000000006
>> (XEN) r15: ffff82d0405ddb40   cr0: 0000000080050033   cr4: 0000000000372660
>> (XEN) cr3: 000000022f2a1000   cr2: ffff8881012738e0
>> (XEN) fsb: 0000744bf6a0db80   gsb: ffff888255600000   gss: 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen code around <ffff82d040247a4d> (credit2.c#csched2_unit_wake+0x152/0x154):
>> (XEN)  df e8 6f bf ff ff eb ad <0f> 0b f3 0f 1e fa 55 48 89 e5 41 57 41 56 41 55
>> (XEN) Xen stack trace from rsp=ffff8302515cfd70:
>> (XEN)    ffff83025174b000 ffff830251768cb0 ffff830251778270 ffff82d0405c4298
>> (XEN)    ffff8302515cfdd8 ffff82d04024fcb8 0000000000000202 ffff830251778270
>> (XEN)    ffff83025174b000 0000000000000001 ffff830251769018 0000000000000000
>> (XEN)    0000000000000000 ffff8302515cfe48 ffff82d04020a8c9 ffff8882556aedc0
>> (XEN)    0000000000000003 00001910537e623e 0000000b988f78a6 0000000059d4a716
>> (XEN)    00001901f30fa41e 0000000217f96af6 0000000000000000 ffff83025174b000
>> (XEN)    ffff830251756000 0000000000000002 0000000000000001 ffff8302515cfe70
>> (XEN)    ffff82d0402f7968 ffff830251756000 ffff8302515cfef8 0000000000000018
>> (XEN)    ffff8302515cfee8 ffff82d0402ec6de 0000000000000000 ffffffff82f157e0
>> (XEN)    0000000000000000 0000000000000000 ffff8302515cfef8 0000000000000000
>> (XEN)    0000000000000000 ffff8302515cffff ffff830251756000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 00007cfdaea300e7
>> (XEN)    ffff82d0402012bd 0000000000000000 ffffffff82c51120 ffff88810036cf00
>> (XEN)    0000000000000002 000000000001e120 0000000000000002 0000000000000246
>> (XEN)    ffffffff82f157e0 0000000000000001 0000000000000000 0000000000000018
>> (XEN)    ffffffff81e4a30a 0000000000000000 0000000000000002 0000000000000001
>> (XEN)    0000010000000000 ffffffff81e4a30a 000000000000e033 0000000000000246
>> (XEN)    ffffc9004aef7c18 000000000000e02b fb5ee398d214b10c eb5ef398c214a10c
>> (XEN)    eb56f390c21ca104 ebd6f310c29ca184 0000e0100000000a ffff830251756000
>> (XEN)    0000003211016000 0000000000372660 0000000000000000 80000002963e1002
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d040247a4d>] R credit2.c#csched2_unit_wake+0x152/0x154
>> (XEN)    [<ffff82d04024fcb8>] F vcpu_wake+0xfd/0x267
>> (XEN)    [<ffff82d04020a8c9>] F common_vcpu_op+0x178/0x5d1
>> (XEN)    [<ffff82d0402f7968>] F do_vcpu_op+0x69/0x226
>> (XEN)    [<ffff82d0402ec6de>] F pv_hypercall+0x575/0x657
>> (XEN)    [<ffff82d0402012bd>] F lstar_enter+0x13d/0x150
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 10:
>> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at common/sched/credit2.c:2274
>> (XEN) ****************************************
> 
> Ok, I think I figured it out!
> 
> I added a function that verifies the runqueues of all the sched units, and
> called it from basically every other line on the resume path. The debug
> function (in case anybody is interested):
> 
>     void verify_sched_units(void)
>     {
>         struct domain *d;
>         const struct sched_unit *unit;
> 
>         for_each_domain ( d )
>         {
>             for_each_sched_unit ( d, unit )
>             {
>                 if ( c2rqd(sched_unit_master(unit)) != csched2_unit(unit)->rqd )
>                 {
>                     printk(XENLOG_WARNING "d%d sched unit %d: rq=%d, unit master %d, rq=%d\n",
>                            d->domain_id, unit->unit_id,
>                            csched2_unit(unit)->rqd ? csched2_unit(unit)->rqd->id : -1,
>                            sched_unit_master(unit),
>                            c2rqd(sched_unit_master(unit))->id);
>                     WARN_ON(1);
>                 }
>             }
>         }
>     }
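> 
> For reference, the checks were sprinkled along these lines (illustrative
> only; the function uses credit2-internal helpers, so calling it from
> common code needs a small hack to expose it):
> 
>     /* After each step of the resume path that touches sched resources: */
>     verify_sched_units();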
> 
> It appears that restore_vcpu_affinity() is responsible, specifically
> this part:
> 
> 1216         /*
> 1217          * Re-assign the initial processor as after resume we have no
> 1218          * guarantee the old processor has come back to life again.
> 1219          *
> 1220          * Therefore, here, before actually unpausing the domains, we should
> 1221          * set v->processor of each of their vCPUs to something that will
> 1222          * make sense for the scheduler of the cpupool in which they are in.
> 1223          */
> ...
> 1249         res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> 1250         sched_set_res(unit, res);
> 1251 
> 1252         spin_unlock_irq(lock);
> 1253 
> 1254         /* v->processor might have changed, so reacquire the lock. */
> 1255         lock = unit_schedule_lock_irq(unit);
> 1256         res = sched_pick_resource(unit_scheduler(unit), unit);
> 1257         sched_set_res(unit, res);
> 1258         spin_unlock_irq(lock);
> 1259 
> 1260         if ( old_cpu != sched_unit_master(unit) )
> 1261             sched_move_irqs(unit);
> 
> It calls sched_set_res() directly, which assigns the sched resource, but
> does _not_ adjust runqueues (if the new pcpu happens to be assigned to a
> different runqueue than the previous pcpu's).
> 
> I have two (non-exclusive) ideas here:
> 1. If old_cpu is actually still available, do not move the unit at all.
> 2. Use sched_migrate() instead of sched_set_res().
> 
> Here is the patch that fixes it for me:
> ---8<---
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index 83455fbde1c8..dcf202d8b307 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -1246,19 +1246,29 @@ void restore_vcpu_affinity(struct domain *d)
>              }
>          }
>  
> -        res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> +        /* Prefer old cpu if available. */
> +        if ( cpumask_test_cpu(old_cpu, cpumask_scratch_cpu(cpu)) )
> +            res = get_sched_res(old_cpu);
> +        else
> +            res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
>          sched_set_res(unit, res);
>  
>          spin_unlock_irq(lock);
>  
> -        /* v->processor might have changed, so reacquire the lock. */
> -        lock = unit_schedule_lock_irq(unit);
> -        res = sched_pick_resource(unit_scheduler(unit), unit);
> -        sched_set_res(unit, res);
> -        spin_unlock_irq(lock);
> -
> +        /*
> +         * If a different cpu was chosen, it was a random pick, so let the
> +         * scheduler make a proper decision.
> +         */
>          if ( old_cpu != sched_unit_master(unit) )
> +        {
> +            /* v->processor might have changed, so reacquire the lock. */
> +            lock = unit_schedule_lock_irq(unit);
> +            res = sched_pick_resource(unit_scheduler(unit), unit);
> +            sched_migrate(unit_scheduler(unit), unit, res->master_cpu);
> +            spin_unlock_irq(lock);
> +
>              sched_move_irqs(unit);
> +        }
>      }
>  
>      rcu_read_unlock(&sched_res_rculock);
> ---8<---
> 
> I have several doubts here:
> 
> 1. If old_cpu is available, is sched_set_res() needed at all?
> 2. Should both calls be changed to sched_migrate()? Currently I changed
>    only the second one, in case the scheduler could get confused by
>    old_cpu not being available anymore.
> 3. Are there any extra locking requirements for sched_migrate() at this
>    stage? The long comment above sched_unit_migrate_start() suggests
>    there might be, but I'm not sure if that's really the case during
>    resume.
> 4. Related to the above - should thaw_domains() be modified to call
>    restore_vcpu_affinity() for all domains first, and only unpause them
>    later (rough sketch below)? That could reduce the locking requirements,
>    I guess.
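> 
> A rough sketch of what I mean in 4. (untested; this assumes thaw_domains()
> keeps its current shape of a single loop doing both steps per domain):
> 
>     static void thaw_domains(void)
>     {
>         struct domain *d;
> 
>         rcu_read_lock(&domlist_read_lock);
> 
>         /* First fix up scheduling resources for all domains... */
>         for_each_domain ( d )
>             restore_vcpu_affinity(d);
> 
>         /* ...and only then let any of them run again. */
>         for_each_domain ( d )
>             domain_unpause(d);
> 
>         rcu_read_unlock(&domlist_read_lock);
>     }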

All questions primarily to the scheduler maintainers - forwarding
accordingly.

Jan