|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.22] char/ns16550: bound execution time of ns16550_interrupt()
On Tue, Jun 23, 2026 at 03:44:06PM +0200, Jan Beulich wrote:
> On 23.06.2026 12:31, Roger Pau Monne wrote:
> > The current logic in ns16550_interrupt() will loop until the device sets
> > the NOINT in IIR. At least on the Lenovo ThinkSystem SR630 V4 the flow
> > control of the serial-over-lan emulated UART seems to be broken, as it
> > doesn't set the NOINT bit consistently. The Transmitter Holding Register
> > Empty in LSR also seems to not be properly signaled, as even with it set
> > writes to the transmit register take ~6ms. This leads to the watchdog
> > triggering very easily on such system.
> >
> > Introduce an upper bound on the execution time of ns16550_interrupt(), this
> > is currently set as 4x the polling interval, which is calculated as the
> > time to fill RX FIFO and/or empty TX FIFO. The current maximum is 5ms.
> > Once the timeout triggers the interrupt is disabled and the uart is
> > switched to polling mode.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > There's a possible alternative approach to solve this by moving the actual
> > interrupt processing to a softirq tasklet and disabling the interrupt
> > source until the processing is done, likely unifying the logic with the
> > timer task. However that's a bigger change, and too risky for 4.22 at this
> > point.
>
> +1
>
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -62,6 +62,7 @@ static struct ns16550 {
> > #endif
> > unsigned int timeout_ms;
> > bool intr_works;
> > + bool force_polling;
> > bool dw_usr_bsy;
> > #ifdef NS16550_PCI
> > /* PCI card parameters. */
> > @@ -190,12 +191,41 @@ static void cf_check ns16550_interrupt(int irq, void
> > *dev_id)
> > {
> > struct serial_port *port = dev_id;
> > struct ns16550 *uart = port->uart;
> > + /* Set quite arbitrarily as 4x the time to drain the TX or fill RX
> > FIFOs. */
>
> Nit: I'd drop the latter of the two "the".
>
> > + const s_time_t timeout = NOW() + min(MILLISECS(uart->timeout_ms * 4),
> > + MILLISECS(5));
>
> MILLISECS(min(uart->timeout_ms * 4, 5U)) ?
Bah, yes, sorry, I've added the min() later and clearly didn't apply
much brainpower about it's position.
>
> > + if ( uart->force_polling )
> > + return;
>
> As the IRQ was disabled, is this even possible? I.e. should this be some
> kind of assertion or alike?
Hm, I wasn't setting IRQ_DISABLED before, and hence needed this guard.
But now with IRQ_DISABLED being set in ->status do_IRQ() should filter
any stray interrupts. I will attempt to add an ASSERT_UNREACHABLE()
here.
> > uart->intr_works = 1;
> >
> > while ( !(ns_read_reg(uart, UART_IIR) & UART_IIR_NOINT) )
> > {
> > u8 lsr = ns_read_reg(uart, UART_LSR);
> > + s_time_t now = NOW();
> > +
> > + /* Break out of the loop if spending too much time. */
> > + if ( now > timeout )
> > + {
> > + struct irq_desc *desc = irq_to_desc(irq);
> > +
> > + /* Disable the interrupt source - it's never shared. */
> > + spin_lock_irq(&desc->lock);
>
> This needs to be spin_lock_irqsave() - we may not rely on IRQs being on
> when we make it here. However, ...
I was relying on do_IRQ() unconditionally enabling IRQs before calling
the handler, but it's safer to use the _irqsave() variant.
> > + desc->status |= IRQ_DISABLED;
> > + if ( desc->handler->disable )
> > + desc->handler->disable(desc);
> > + spin_unlock_irq(&desc->lock);
>
> ... all of this open-coding is quite bad anyway. We should probably add
> a helper for this in IRQ handling code.
Can add, no problem.
> > + /* Disable interrupt generation on the device and arm the
> > timer. */
> > + uart->force_polling = true;
> > + ns_write_reg(uart, UART_IER, 0);
> > + set_timer(&uart->timer, now + MILLISECS(uart->timeout_ms));
> > + printk(XENLOG_WARNING
> > + "uart interrupt taking too long, switched to
> > polling\n");
>
> Probably it is indeed best to keep this simple, but: A single instance of
> this taking e.g. just over 5ms (perhaps with a low baud rate) may not be
> indicative of an actual issue.
It's all fiddly, yes, I could add a counter and maybe only disable if
we have a certain amount of executions of ns16550_interrupt()
exceeding the timeout, yet I wanted to keep this simple.
> To alleviate this as least some, perhaps
> besides capping at 5ms we should also make sure that the timeout used isn't
> below ->timeout_ms?
OK, let me see what I can do.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |