|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.22] char/ns16550: bound execution time of ns16550_interrupt()
On 23.06.2026 12:31, Roger Pau Monne wrote:
> The current logic in ns16550_interrupt() will loop until the device sets
> the NOINT in IIR. At least on the Lenovo ThinkSystem SR630 V4 the flow
> control of the serial-over-lan emulated UART seems to be broken, as it
> doesn't set the NOINT bit consistently. The Transmitter Holding Register
> Empty in LSR also seems to not be properly signaled, as even with it set
> writes to the transmit register take ~6ms. This leads to the watchdog
> triggering very easily on such system.
>
> Introduce an upper bound on the execution time of ns16550_interrupt(), this
> is currently set as 4x the polling interval, which is calculated as the
> time to fill RX FIFO and/or empty TX FIFO. The current maximum is 5ms.
> Once the timeout triggers the interrupt is disabled and the uart is
> switched to polling mode.
>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> There's a possible alternative approach to solve this by moving the actual
> interrupt processing to a softirq tasklet and disabling the interrupt
> source until the processing is done, likely unifying the logic with the
> timer task. However that's a bigger change, and too risky for 4.22 at this
> point.
+1
> --- a/xen/drivers/char/ns16550.c
> +++ b/xen/drivers/char/ns16550.c
> @@ -62,6 +62,7 @@ static struct ns16550 {
> #endif
> unsigned int timeout_ms;
> bool intr_works;
> + bool force_polling;
> bool dw_usr_bsy;
> #ifdef NS16550_PCI
> /* PCI card parameters. */
> @@ -190,12 +191,41 @@ static void cf_check ns16550_interrupt(int irq, void
> *dev_id)
> {
> struct serial_port *port = dev_id;
> struct ns16550 *uart = port->uart;
> + /* Set quite arbitrarily as 4x the time to drain the TX or fill RX
> FIFOs. */
Nit: I'd drop the latter of the two "the".
> + const s_time_t timeout = NOW() + min(MILLISECS(uart->timeout_ms * 4),
> + MILLISECS(5));
MILLISECS(min(uart->timeout_ms * 4, 5U)) ?
> + if ( uart->force_polling )
> + return;
As the IRQ was disabled, is this even possible? I.e. should this be some
kind of assertion or alike?
> uart->intr_works = 1;
>
> while ( !(ns_read_reg(uart, UART_IIR) & UART_IIR_NOINT) )
> {
> u8 lsr = ns_read_reg(uart, UART_LSR);
> + s_time_t now = NOW();
> +
> + /* Break out of the loop if spending too much time. */
> + if ( now > timeout )
> + {
> + struct irq_desc *desc = irq_to_desc(irq);
> +
> + /* Disable the interrupt source - it's never shared. */
> + spin_lock_irq(&desc->lock);
This needs to be spin_lock_irqsave() - we may not rely on IRQs being on
when we make it here. However, ...
> + desc->status |= IRQ_DISABLED;
> + if ( desc->handler->disable )
> + desc->handler->disable(desc);
> + spin_unlock_irq(&desc->lock);
... all of this open-coding is quite bad anyway. We should probably add
a helper for this in IRQ handling code.
> + /* Disable interrupt generation on the device and arm the timer.
> */
> + uart->force_polling = true;
> + ns_write_reg(uart, UART_IER, 0);
> + set_timer(&uart->timer, now + MILLISECS(uart->timeout_ms));
> + printk(XENLOG_WARNING
> + "uart interrupt taking too long, switched to polling\n");
Probably it is indeed best to keep this simple, but: A single instance of
this taking e.g. just over 5ms (perhaps with a low baud rate) may not be
indicative of an actual issue. To alleviate this as least some, perhaps
besides capping at 5ms we should also make sure that the timeout used isn't
below ->timeout_ms?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |