[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/watchdog: Use real timestamps for watchdog timeout
>>> On 23.05.13 at 22:32, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > Do not assume that we will only receive interrupts at a rate of nmi_hz. On a > test system being debugged, I observed a PCI SERR being continuously > asserted > without the SERR bit being set. The result was Xen "exceeding" a 300 second > timeout within 1 second. This is a questionable rationale for the patch, no matter that conceptually I don't mind a change like this. On broken systems like this it may be more reasonable to require the watchdog (which is disabled by default anyway) to not get enabled. > @@ -432,23 +432,22 @@ void nmi_watchdog_tick(struct cpu_user_r > if ( (this_cpu(last_irq_sums) == sum) && > !atomic_read(&watchdog_disable_count) ) > { > - /* > - * Ayiee, looks like this CPU is stuck ... wait for the timeout > - * before doing the oops ... > - */ > - this_cpu(alert_counter)++; > - if ( this_cpu(alert_counter) == opt_watchdog_timeout*nmi_hz ) > + s_time_t last_change = this_cpu(last_irq_change); > + > + if ( (NOW() - last_change) > SECONDS(opt_watchdog_timeout) ) You can't use NOW() here - while the time updating code is safe against normal interrupts, it's not atomic wrt NMIs. Since you don't really need this calculation to be very precise, doing it using plain TSC values might be acceptable, with the one caveat that you'd need to check that doing this in the middle of a TSC update (by time_calibration_tsc_rendezvous()) is still safe and sufficiently precise. The only other (non-generic) alternative I see is to use the HPET main counter if 64-bit capable _and_ we ran it as 64-bit counter. But that would neither cover all machines nor be backportable (because of 32-bit's constraints). Jan > { > + /* Ayiee, looks like this CPU is stuck. */ > + > console_force_unlock(); > printk("Watchdog timer detects that CPU%d is stuck!\n", > smp_processor_id()); > fatal_trap(TRAP_nmi, regs); > } > - } > - else > + } > + else > { > this_cpu(last_irq_sums) = sum; > - this_cpu(alert_counter) = 0; > + this_cpu(last_irq_change) = NOW(); > } > > if ( nmi_perfctr_msr ) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |