[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/nmi: Make external NMI injection reliably crash the host
>>> On 26.08.14 at 12:10, <ross.lagerwall@xxxxxxxxxx> wrote: > --- a/xen/arch/x86/nmi.c > +++ b/xen/arch/x86/nmi.c > @@ -15,6 +15,7 @@ > > #include <xen/config.h> > #include <xen/init.h> > +#include <xen/stdbool.h> ??? > @@ -432,35 +436,23 @@ int __init watchdog_setup(void) > return 0; > } > > -void nmi_watchdog_tick(const struct cpu_user_regs *regs) > +/* Returns true if this was a watchdog NMI, false otherwise */ > +bool_t nmi_watchdog_tick(const struct cpu_user_regs *regs) > { > + bool_t watchdog_tick = 0; > unsigned int sum = this_cpu(nmi_timer_ticks); > > - if ( (this_cpu(last_irq_sums) == sum) && watchdog_enabled() ) > - { > - /* > - * Ayiee, looks like this CPU is stuck ... wait for the timeout > - * before doing the oops ... > - */ > - this_cpu(alert_counter)++; > - if ( this_cpu(alert_counter) == opt_watchdog_timeout*nmi_hz ) > - { > - console_force_unlock(); > - printk("Watchdog timer detects that CPU%d is stuck!\n", > - smp_processor_id()); > - fatal_trap(TRAP_nmi, regs); > - } > - } > - else > - { > - this_cpu(last_irq_sums) = sum; > - this_cpu(alert_counter) = 0; > - } > - > if ( nmi_perfctr_msr ) > { So if we don't get into this block ... > + uint64_t msr_content; > + > + /* Work out if this is a watchdog tick by checking for overflow. */ > if ( nmi_perfctr_msr == MSR_P4_IQ_PERFCTR0 ) > { > + rdmsrl(MSR_P4_IQ_CCCR0, msr_content); > + if ( msr_content & P4_CCCR_OVF ) > + watchdog_tick = 1; > + > /* > * P4 quirks: > * - An overflown perfctr will assert its interrupt > @@ -473,14 +465,52 @@ void nmi_watchdog_tick(const struct cpu_user_regs *regs) > } > else if ( nmi_perfctr_msr == MSR_P6_PERFCTR0 ) > { > + rdmsrl(MSR_P6_PERFCTR0, msr_content); > + watchdog_tick = !(msr_content & (1ULL << P6_EVENT_WIDTH)); > + > /* > * Only P6 based Pentium M need to re-unmask the apic vector but > * it doesn't hurt other P6 variants. > */ > apic_write(APIC_LVTPC, APIC_DM_NMI); > } > - write_watchdog_counter(NULL); > + else if ( nmi_perfctr_msr == MSR_K7_PERFCTR0 ) > + { > + rdmsrl(MSR_K7_PERFCTR0, msr_content); > + watchdog_tick = !(msr_content & (1ULL << K7_EVENT_WIDTH)); > + } > + > + if ( watchdog_tick ) > + { ... we'll never get here. That's clearly a change in behavior for systems with no suitable perfctr MSR. I'm of the opinion that for such systems behavior should not change from what it is right now, i.e. if you can't exclude the NMI to be watchdog induced, assume it is (rather than assuming that to be a fatal condition). Or wait - is this only a theoretical consideration? If so, the patch description should be adjusted to make clear we can't get there. And perhaps the surrounding if(nmi_perfctr_msr) should then become ASSERT(nmi_perfctr_msr). > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -3312,8 +3312,8 @@ void do_nmi(const struct cpu_user_regs *regs) > if ( nmi_callback(regs, cpu) ) > return; > > - if ( nmi_watchdog ) > - nmi_watchdog_tick(regs); > + if ( nmi_watchdog && nmi_watchdog_tick(regs) ) > + return; > > /* Only the BSP gets external NMIs from the system. */ > if ( cpu == 0 ) > @@ -3323,7 +3323,7 @@ void do_nmi(const struct cpu_user_regs *regs) > pci_serr_error(regs); > if ( reason & 0x40 ) > io_check_error(regs); > - if ( !(reason & 0xc0) && !nmi_watchdog ) > + if ( !(reason & 0xc0) ) > unknown_nmi_error(regs, reason); As much as I like the original idea, I'm afraid this won't fly: I do know of systems where bad motherboard design leads to neither of these two bits ever getting set. I.e. at the very minimum we'd need a command line option to restore old behavior. Personally I think it should in fact remain default behavior, and new behavior should only be enabled via command line option. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |