[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] xen: add support for automatic debug key actions in case of crash
On 11.12.2020 08:02, Jürgen Groß wrote: > On 10.12.20 21:51, Julien Grall wrote: >> Hi Jan, >> >> On 09/12/2020 14:29, Jan Beulich wrote: >>> On 09.12.2020 13:11, Julien Grall wrote: >>>> On 26/11/2020 11:20, Jan Beulich wrote: >>>>> On 26.11.2020 09:03, Juergen Gross wrote: >>>>>> When the host crashes it would sometimes be nice to have additional >>>>>> debug data available which could be produced via debug keys, but >>>>>> halting the server for manual intervention might be impossible due to >>>>>> the need to reboot/kexec rather sooner than later. >>>>>> >>>>>> Add support for automatic debug key actions in case of crashes which >>>>>> can be activated via boot- or runtime-parameter. >>>>>> >>>>>> Depending on the type of crash the desired data might be different, so >>>>>> support different settings for the possible types of crashes. >>>>>> >>>>>> The parameter is "crash-debug" with the following syntax: >>>>>> >>>>>> crash-debug-<type>=<string> >>>>>> >>>>>> with <type> being one of: >>>>>> >>>>>> panic, hwdom, watchdog, kexeccmd, debugkey >>>>>> >>>>>> and <string> a sequence of debug key characters with '+' having the >>>>>> special semantics of a 10 millisecond pause. >>>>>> >>>>>> So "crash-debug-watchdog=0+0qr" would result in special output in case >>>>>> of watchdog triggered crash (dom0 state, 10 ms pause, dom0 state, >>>>>> domain info, run queues). >>>>>> >>>>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx> >>>>>> --- >>>>>> V2: >>>>>> - switched special character '.' to '+' (Jan Beulich) >>>>>> - 10 ms instead of 1 s pause (Jan Beulich) >>>>>> - added more text to the boot parameter description (Jan Beulich) >>>>>> >>>>>> V3: >>>>>> - added const (Jan Beulich) >>>>>> - thorough test of crash reason parameter (Jan Beulich) >>>>>> - kexeccmd case should depend on CONFIG_KEXEC (Jan Beulich) >>>>>> - added dummy get_irq_regs() helper on Arm >>>>>> >>>>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx> >>>>> >>>>> Except for the Arm aspect, where I'm not sure using >>>>> guest_cpu_user_regs() is correct in all cases, >>>> >>>> I am not entirely sure to understand what get_irq_regs() is supposed to >>>> returned on x86. Is it the registers saved from the most recent >>>> exception? >>> >>> An interrupt (not an exception) sets the underlying per-CPU >>> variable, such that interested parties will know the real >>> context is not guest or "normal" Xen code, but an IRQ. >> >> Thanks for the explanation. I am a bit confused to why we need to give a >> regs to handle_keypress() because no-one seems to use it. Do you have an >> explanation? > > dump_registers() (key 'd') is using it. > >> >> To add to the confusion, it looks like that get_irqs_regs() may return >> NULL. So sometimes we may pass guest_cpu_regs() (which may contain >> garbagge or a set too far). > > I guess this is a best effort approach. Indeed. If there are ways to make it "more best", we should of course follow them. (Except before Dom0 starts, I'm afraid I don't see though where garbage would come from. And even then, just like for the idle vCPU-s, it shouldn't really be garbage, or else this suggests missing initialization somewhere.) >> I guess providing the wrong information to handle_keypress() is not >> going to matter that much because no-one use it (?). Although, I'd like >> to make sure this is not going to bite us in the future. > > TBH using the 'd' handler isn't making that much sense, as the > information delivered would be of interest only in case of a panic(), > which is already printing that information. I disagree. I've had numerous cases where I found this key very useful. Or do you mean what you say just for the new purpose added here? Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |