[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] xen: add support for automatic debug key actions in case of crash
On 11.12.20 11:49, Julien Grall wrote: On 11/12/2020 10:38, Jürgen Groß wrote:On 11.12.20 11:22, Julien Grall wrote:Hi, On 11/12/2020 07:24, Jan Beulich wrote:On 11.12.2020 08:02, Jürgen Groß wrote:On 10.12.20 21:51, Julien Grall wrote:Hi Jan, On 09/12/2020 14:29, Jan Beulich wrote:On 09.12.2020 13:11, Julien Grall wrote:On 26/11/2020 11:20, Jan Beulich wrote:On 26.11.2020 09:03, Juergen Gross wrote:When the host crashes it would sometimes be nice to have additionaldebug data available which could be produced via debug keys, buthalting the server for manual intervention might be impossible due tothe need to reboot/kexec rather sooner than later.Add support for automatic debug key actions in case of crashes whichcan be activated via boot- or runtime-parameter.Depending on the type of crash the desired data might be different, sosupport different settings for the possible types of crashes. The parameter is "crash-debug" with the following syntax: crash-debug-<type>=<string> with <type> being one of: panic, hwdom, watchdog, kexeccmd, debugkeyand <string> a sequence of debug key characters with '+' having thespecial semantics of a 10 millisecond pause.So "crash-debug-watchdog=0+0qr" would result in special output in caseof watchdog triggered crash (dom0 state, 10 ms pause, dom0 state, domain info, run queues). Signed-off-by: Juergen Gross <jgross@xxxxxxxx> --- V2: - switched special character '.' to '+' (Jan Beulich) - 10 ms instead of 1 s pause (Jan Beulich) - added more text to the boot parameter description (Jan Beulich) V3: - added const (Jan Beulich) - thorough test of crash reason parameter (Jan Beulich) - kexeccmd case should depend on CONFIG_KEXEC (Jan Beulich) - added dummy get_irq_regs() helper on Arm Signed-off-by: Juergen Gross <jgross@xxxxxxxx>Except for the Arm aspect, where I'm not sure using guest_cpu_user_regs() is correct in all cases,I am not entirely sure to understand what get_irq_regs() is supposed toreturned on x86. Is it the registers saved from the most recent exception?An interrupt (not an exception) sets the underlying per-CPU variable, such that interested parties will know the real context is not guest or "normal" Xen code, but an IRQ.Thanks for the explanation. I am a bit confused to why we need to give a regs to handle_keypress() because no-one seems to use it. Do you have anexplanation?dump_registers() (key 'd') is using it.To add to the confusion, it looks like that get_irqs_regs() may returnNULL. So sometimes we may pass guest_cpu_regs() (which may contain garbagge or a set too far).I guess this is a best effort approach.Indeed. If there are ways to make it "more best", we should of course follow them. (Except before Dom0 starts, I'm afraid I don't see though where garbage would come from. And even then, just like for the idle vCPU-s, it shouldn't really be garbage, or else this suggests missing initialization somewhere.)So I decided to mimick what 'd' does to see what happen if this is called early boot.diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c index 7fcff9af2a7e..9d33507a26eb 100644 --- a/xen/arch/arm/setup.c +++ b/xen/arch/arm/setup.c@@ -857,6 +857,8 @@ void __init start_xen(unsigned long boot_phys_offset,*/ system_state = SYS_STATE_boot; + dump_execstate(guest_cpu_user_regs()); + vm_init(); if ( acpi_disabled ) diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 30d6f375a3af..50fcf2e8d70e 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c@@ -1678,6 +1678,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)end_boot_allocator(); system_state = SYS_STATE_boot; + dump_execstate(guest_cpu_user_regs()); /* * No calls involving ACPI code should go between the setting of * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory() It leads to crash on both Arm and x86. For the Arm crash: (XEN) Data Abort Trap. Syndrome=0x1c08006 (XEN) Walking Hypervisor VA 0x10 on CPU0 via TTBR 0x0000000065a7f000 (XEN) 0TH[0x0] = 0x0000000065a7ef7f (XEN) 1ST[0x0] = 0x0000000065a7bf7f (XEN) 2ND[0x0] = 0x0000000000000000 (XEN) CPU0: Unexpected Trap: Data Abort (XEN) ----[ Xen-4.15-unstable arm64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) PC: 0000000000219674 dump_execstate+0x58/0x1ec (XEN) LR: 00000000002d77dc (XEN) SP: 000000000030fdc0 (XEN) CPSR: 800003c9 MODE:64-bit EL2h (Hypervisor, handler)(XEN) X0: 0000000000000000 X1: 0000000000000000 X2: 0000000000007fff (XEN) X3: 00000000002b7198 X4: 0000000000000080 X5: 00000000002e9a68 (XEN) X6: 0080808080808080 X7: fefefefefefeff09 X8: 7f7f7f7f7f7f7f7f (XEN) X9: 717164616f726051 X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101 (XEN) X12: 0000000000000008 X13: 00000000002b9a48 X14: 0000000000000000 (XEN) X15: 0000000000400000 X16: 00000000002ba000 X17: 00000000002b9000 (XEN) X18: 00000000002b9000 X19: 0000000000000000 X20: 000000000030feb0 (XEN) X21: 0000000080000000 X22: 00000000002f0d30 X23: 00000000002f1d68 (XEN) X24: 00000000002f0eb8 X25: 0000000040000000 X26: 0000000080000000 (XEN) X27: 0000000000000018 X28: 000000000003f970 FP: 000000000030fdc0(XEN) (XEN) VTCR_EL2: 00000000 (XEN) VTTBR_EL2: 0000000000000000 (XEN) (XEN) SCTLR_EL2: 30cd183d (XEN) HCR_EL2: 0000000000000038 (XEN) TTBR0_EL2: 0000000065a7f000 (XEN) (XEN) ESR_EL2: 97c08006 (XEN) HPFAR_EL2: 0000000000000000 (XEN) FAR_EL2: 0000000000000010 (XEN) (XEN) Xen stack trace from sp=000000000030fdc0:(XEN) 000000000030fdf0 00000000002d77dc 0000000000080000 000000007fffc000 (XEN) 0000000080000000 00000000002f0d30 000000007f68b250 00000000002001b8 (XEN) 0000000065932000 0000000065732000 00000000784f9000 0000000000000000 (XEN) 0000000000400000 0000000065a2ad30 0000000000000630 0000000000000001 (XEN) 0000000000000001 0000000000000001 0000000000000000 0000000000003000 (XEN) 00000000784f9000 00000000002bc8e4 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000300000000 0000000000000000 00000040ffffffff (XEN) 00000000ffffffff 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000(XEN) Xen call trace: (XEN) [<0000000000219674>] dump_execstate+0x58/0x1ec (PC) (XEN) [<00000000002d77dc>] start_xen+0x3d0/0xcf8 (LR) (XEN) [<00000000002d77dc>] start_xen+0x3d0/0xcf8 (XEN) [<00000000002001b8>] arm64/head.o#primary_switched+0x10/0x30 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) CPU0: Unexpected Trap: Data Abort (XEN) **************************************** For the x86 crash:(XEN) Early fatal page fault at e008:ffff82d0402188b4 (cr2=0000000000000010, ec=0000)(XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82d0402188b4>] dump_execstate+0x42/0x167 (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000 (XEN) rdx: ffff82d0404affff rsi: 000000000000000a rdi: ffff82d0404afef8 (XEN) rbp: ffff82d0404afd90 rsp: ffff82d0404afd80 r8: 0000000000000004 (XEN) r9: 0101010101010101 r10: 0f0f0f0f0f0f0f0f r11: 5555555555555555 (XEN) r12: ffff82d0404afef8 r13: 0000000000800163 r14: ffff83000009dfb0 (XEN) r15: 0000000000000002 cr0: 0000000080050033 cr4: 00000000000000a0(XEN) cr3: 00000000bfa9e000 cr2: 0000000000000010(XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d0402188b4> (dump_execstate+0x42/0x167):(XEN) ff 7f 00 00 48 8b 40 c9 <48> 8b 40 10 66 81 38 ff 7f 75 49 3b 1d 23 18 27(XEN) Xen stack trace from rsp=ffff82d0404afd80:(XEN) 000000000023ffff 00000000000005ed ffff82d0404afee8 ffff82d0404378cb (XEN) 0000000000000002 0000000000000002 0000000000000002 0000000000000001 (XEN) 0000000000000001 0000000000000001 0000000000000001 0000000000000000 (XEN) 00000000000001ff 0000000002a45fff 0000000000240000 0000000002a45000 (XEN) 0000000000100000 0000000000000000 00000000000001ff ffff82d040475c80 (XEN) ffff82d000800163 ffff83000009dee0 ffff83000009dfb0 0000000000200001 (XEN) 0000000100000000 0000000100000000 ffff83000009df80 642ded38bf9fe4f3 (XEN) bf9fed3500000000 bfaafe980009df73 0009df73bf9fe7ea 00000004bf9fed31 (XEN) bfaafeb00009df01 0000000800000000 000000010000006e 0000000000000003 (XEN) 00000000000002f8 ffff82d0405b0000 ffff82d0404b0000 0000000000000002 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 ffff82d04020012f 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000e01000000000 0000000000000000 0000000000000000 00000000000000a0 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000(XEN) Xen call trace: (XEN) [<ffff82d0402188b4>] R dump_execstate+0x42/0x167 (XEN) [<ffff82d0404378cb>] F __start_xen+0x1e10/0x2906 (XEN) [<ffff82d04020012f>] F __high_start+0x8f/0x91 (XEN) (XEN) Pagetable walk from 0000000000000010: (XEN) L4[0x000] = 00000000bfa54063 ffffffffffffffff (XEN) L3[0x000] = 00000000bfa50063 ffffffffffffffff (XEN) L2[0x000] = 00000000bfa4f063 ffffffffffffffff (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) FATAL TRAP: vec 14, #PF[0000] IN INTERRUPT CONTEXT (XEN) **************************************** (XEN) (XEN) Reboot in five seconds...So I think guest_cpu_user_regs() is not quite yet ready to be called from panic().guest_cpu_user_regs() isn't the problem, but dump_execstate(). This is one of the caveats from the added boot parameter doc: some debug keys might lead to problems. 'd' seems to be such a key when used for the panic() case and the panic() happens in early boot.Right, I think we should be clearer in the documentation about the keys we know works.A different approach my be to generate an exception and call the keyhandler from there. At least you know that the register would always be accurate.Or dump_execstate() is modified to accept NULL for regs and it will do nothing in case guest_cpu_user_regs() isn't valid (a test for idle vcpu might be the easiest way to determine that).So Jan pointed out that current may not be initialized properly in early boot. So we possibly want to use "system_state" instead. Fine with me. Will send an updated patch. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |