[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
  • Date: Tue, 12 Oct 2021 00:24:26 -0700
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=O7D6Lx0SGGeULyAuzI6QSTKRXt1l8pLOBAU8iEv1O+E=; b=n4KL1yIpa1AuF1gcxpXbISU96G43iAJ900KsNnKm1OLYVbWZWcivkjmvgVko13OTwNanFC7EMhlYZXmxKz3wJh6iIVxM7yvoC/FPiB58a0Ozd91W6g98MqYT9bAp/478RvmGR2pD3sfeHOMpLvDZ92bRvcRXgLyUX4KkDw5OjrrBSlzjInQH/6lWTHC2/XwtYuh30DDL5cqTBAi/nc9WmNuFl/x3HeaVhJ4bhLafBNDEYm+c+OGMlDDXWRvfbbcQE9feEBrafqkfC6Y/ahDRSVSDuzs2kjm9ARZ9rais+CcgUO80jh4JEReL2NtFMtXmS+w31c0NH30t01Qd/m55Jg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=k9D2aAH5bsy8IrFQuiwEiSON7gR86ONrFuytar1wzDMWwOAz+zpOvMXtkpPb8PJLoWdgaVLog0rhMYNZiHfSLOb0Al1iMp6SgyeSTFApdHwi/hfMr4bhYNgqxVKXM0wdqKaF6cHowcT2csIce7a8U2YPJzIln+wbckuFS+VkoQlMF77HSS0OoHU1rc4ak5ZleH0R9F0FMxyu0A6Z1AqLFqlvPnxM41cWlzyx0HxxHAc5m9MvhQvNJ0LWxrPkpvmw191wgsJA6iJRv/Zb2i+cdmUjUIWKTS0JaVHkDdGsfei5mUh/XNcDkBrb5WDZmf32ncCXbo9Ulhn9bIFq08bH1A==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=oracle.com;
  • Cc: linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, boris.ostrovsky@xxxxxxxxxx, jgross@xxxxxxxx, sstabellini@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, mingo@xxxxxxxxxx, bp@xxxxxxxxx, hpa@xxxxxxxxx, andrew.cooper3@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, iwj@xxxxxxxxxxxxxx, jbeulich@xxxxxxxx, julien@xxxxxxx, wl@xxxxxxx, joe.jin@xxxxxxxxxx
  • Delivery-date: Tue, 12 Oct 2021 07:25:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

When the kdump/kexec is enabled at HVM VM side, to panic kernel will trap
to xen side with reason=soft_reset. As a result, the xen will reboot the VM
with the kdump kernel.

Unfortunately, when the VM is panic with below command line ...

"taskset -c 33 echo c > /proc/sysrq-trigger"

... the kdump kernel is panic at early stage ...

PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
[    0.000000] Hardware name: Xen HVM domU
[    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
... ...
[    0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX: 
0000000000000000
[    0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
[    0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
[    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
[    0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
[    0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffffaa95e000(0000) 
knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.000000] Call Trace:
[    0.000000]  ? xen_init_time_common+0x11/0x55
[    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
[    0.000000]  ? xen_hvm_guest_init+0x214/0x251
[    0.000000]  ? 0xffffffffa8c00000
[    0.000000]  ? setup_arch+0x440/0xbd6
[    0.000000]  ? start_kernel+0x6a/0x689
[    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb

This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.


The 1st patch is to fix the issue at VM kernel side. However, we may
observe clock drift at VM side due to the issue at xen hypervisor side.
This is because the pv vcpu_time_info is not updated when
VCPUOP_register_vcpu_info.

The 2nd patch is to force_update_vcpu_system_time() at xen side when
VCPUOP_register_vcpu_info, to avoid the VM clock drift during kdump kernel
boot.


I did test the fix by backporting the 2nd patch to a prior old xen version.
This is because I am not able to use soft_reset successfully with mainline
xen. I have encountered below error when testing soft_reset with mainline
xen. Please let me know if there is any know issue/solution.

# xl -v create -F vm.cfg
... ...
... ...
Domain 1 has shut down, reason code 5 0x5
Action for shutdown reason code 5 is soft-reset
Done. Rebooting now
xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, 
msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:488:libxl__cpuid_legacy: Domain 1:Failed to apply 
CPUID policy: File exists
libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 1:cannot 
(re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read 
failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:53:libxl__domain_type: unable to get domain type 
for domid=1, assuming HVM


Thank you very much!

Dongli Zhang





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.