[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic


  • To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, x86@xxxxxxxxxx
  • From: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
  • Date: Fri, 25 Feb 2022 17:17:41 -0800
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=RhEiP+/Ax28z9eZZ/auAHLTVRUyLy3kmKT/2fDdzMDQ=; b=an2sERoeq5TUkECbBmT97dLW3DHobz3Qx/q8uRQR05UaNiIwgZaZM2EXpJiPRRBIxOWZ2GLrujkwwmYm+hXPDvGfGKw7RHB8WQCY4c+GX/0dtsC0wBDisMCfr9n/xSxpPy7db5Lg5bvXMJmD9/ttK9M7ov/bfdZKa891pQx3P4qG1g5G30+oTlkTybhps1IJyLwByzogz3gU6DjgvSdzwX6BeyMQNVsPIuNJ0ul9h2PnqXAvqi9rx+VsKHKF/YBvyXjhzJheZv4cqORaxah7rTBXJvsuE+cOtBGKs0hy12aYdVmAsCRDH4X4DIw5aAYOWteeFfUou444y/9a9AvMMQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=dcfZGPma64bbSGUfeJDod6NxIvFWiCQufrLc78Dq4u6+p0A2g+Dwr1iDPpKlXGgRdzmlc6RYaTBGXpT3nZGKsrWKnNa1/eZGlMPrlvMP2o42XK9vCThielhAPILi5x1Zvvx4KGndH9tZxB/VTLZhzVdWaCKSUPOiVZZl+L236+7bidQV4nLDSg0HvDuCSsdTiFyUMnp4FI5OeFuXn+kh+emutJ/4djEmUCUru/WsJSVZx9Skl1d0fd5FZYqEH9TfebBdSRGHMI9Nw+d5K6cOEEDfy1NkZbDdGFkxhxOQ+MrH132CMuoG+4OQF9kLPCjonN/q+RfSgsKmtJ6koETLeQ==
  • Cc: linux-kernel@xxxxxxxxxxxxxxx, jgross@xxxxxxxx, sstabellini@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, mingo@xxxxxxxxxx, bp@xxxxxxxxx, dave.hansen@xxxxxxxxxxxxxxx, hpa@xxxxxxxxx, joe.jin@xxxxxxxxxx
  • Delivery-date: Sat, 26 Feb 2022 01:18:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Boris,

On 2/25/22 2:39 PM, Boris Ostrovsky wrote:
> 
> On 2/24/22 4:50 PM, Dongli Zhang wrote:
>> This is the v3 of the patch to fix xen kexec kernel panic issue when the
>> kexec is triggered on VCPU >= 32.
>>
>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
>> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>> ... ...
>> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX:
>> 0000000000000000
>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 
>> 0000000000000002
>> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 
>> 0000000000000020
>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 
>> 0000000000000001
>> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 
>> 0000000040000004
>> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 
>> 0000000000000000
>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000)
>> knlGS:0000000000000000
>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 
>> 00000000000406a0
>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> [    0.000000] Call Trace:
>> [    0.000000]  <TASK>
>> [    0.000000]  ? xen_clocksource_read+0x24/0x40
> 
> 
> This is done to set xen_sched_clock_offset which I think will not be used for 
> a
> while, until sched_clock is called (and the other two uses are for 
> suspend/resume)
> 
> 
> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until
> after all vcpu areas are properly set? Or are there other uses of
> xen_clocksource_read() before ?
> 

I have tested that below patch will panic kdump kernel.

diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
index 6ff3c887e0b9..6a0c99941ae1 100644
--- a/arch/x86/xen/smp_hvm.c
+++ b/arch/x86/xen/smp_hvm.c
@@ -19,6 +19,8 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
         */
        xen_vcpu_setup(0);

+       xen_init_sched_clock_offset();
+
        /*
         * The alternative logic (which patches the unlock/lock) runs before
         * the smp bootup up code is activated. Hence we need to set this up
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index d9c945ee1100..8a2eafa0c215 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -520,9 +520,14 @@ static void __init xen_time_init(void)
                pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier);
 }

-static void __init xen_init_time_common(void)
+void xen_init_sched_clock_offset(void)
 {
        xen_sched_clock_offset = xen_clocksource_read();
+}
+
+static void __init xen_init_time_common(void)
+{
+       xen_sched_clock_offset = 0;
        static_call_update(pv_steal_clock, xen_steal_clock);
        paravirt_set_sched_clock(xen_sched_clock);

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index fd0fec6e92f4..9f7656214dfb 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -69,6 +69,7 @@ void xen_teardown_timer(int cpu);
 void xen_setup_cpu_clockevents(void);
 void xen_save_time_memory_area(void);
 void xen_restore_time_memory_area(void);
+void xen_init_sched_clock_offset(void);
 void xen_init_time_ops(void);
 void xen_hvm_init_time_ops(void);



Unfortunately, I am not able to obtain the panic callstack from kdump time this
time. I have only below.

PANIC: early exception 0x0e IP 10:ffffffffa6c679b6 error 0 cr2 0x20
PANIC: early exception 0x0e IP 10:ffffffffa6c679b6 error 0 cr2 0x20



The sched_clock() can be used very early since commit 857baa87b642
("sched/clock: Enable sched clock early"). Any printk should use sched_clock()
to obtain the timestamp.

vprintk_store()
-> local_clock()
   -> sched_clock()
      -> paravirt_sched_clock()
         -> xen_sched_clock()
            -> xen_clocksource_read()


AFAIR, we started to encounter the issue since commit 857baa87b642
("sched/clock: Enable sched clock early"). kdump used to work well before that
commit.

Thank you very much!

Dongli Zhang



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.