[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.22] char/ns16550: bound execution time of ns16550_interrupt()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 23 Jun 2026 16:16:22 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=gRi9DCMsIiy3yIO/O4lNqm+ACu7ArUxeBjhIO/OnPhk=; b=QQNG7KznCA1/96yKJQNs7TKiD3clX93006zeVLUTZxhCPJhzdD2nd1Tywks9rRaHmKrphLG7WMSoa4Qi5jQP7XT9WhEp7olP96i1nf4Cl/a5+EJCa8jO4cq1JN8JcFQVtuXx91Y9WvucLjyFhOZhslSpPlC7RORoxIM+kBRDGqk+Y8pVGZEpa61uib5n/uwEPDHQ5Ymvj7FPF5Qb69JhMlKJedCTUHWwDODabOSbVX6y3oFj1CmjZ5U4jWMYzB0CSVXfQTlHE1qJ2d/hT4bcIJW9jaqRCkOl2+DSYAZQRDctFxxRUbNbeDvD9enSk+9P2NIJd1IPaF/GqvzOGhAylQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=MpYtDc1toK6hNApqX4mcIj8cYsPdJz+OlHNH9Ya79KB/OdQ5pWXARGgGapC4/iNccBWMa1FWWLTxfovLGJWm7iPhFlr4VsmLj311rEw600C+ii9e0TvGQ2m/UxwP1hjXWI3hgrHy9uet1oP9IKAgxy62k5nIna4mH3PElYFXZ8O4JLPhT3pLofsUPYEadfLzUiDDcUpxkgQ0+/nkKJ88vs/Z/5SPTeY/U0vphZWKYNLFstN5oYFvwdJfKPJMyUb8jXJcuszLxWtcISKn03w3i+Udtup8FxnrgGxLpkvE2l5/MOiRG7tDHN3osN6AU9p9A0GE1rMYt4QiqbZrcTWm3A==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=citrix.com header.i="@citrix.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 23 Jun 2026 14:16:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Jun 23, 2026 at 03:44:06PM +0200, Jan Beulich wrote:
> On 23.06.2026 12:31, Roger Pau Monne wrote:
> > The current logic in ns16550_interrupt() will loop until the device sets
> > the NOINT in IIR.  At least on the Lenovo ThinkSystem SR630 V4 the flow
> > control of the serial-over-lan emulated UART seems to be broken, as it
> > doesn't set the NOINT bit consistently.  The Transmitter Holding Register
> > Empty in LSR also seems to not be properly signaled, as even with it set
> > writes to the transmit register take ~6ms.  This leads to the watchdog
> > triggering very easily on such system.
> > 
> > Introduce an upper bound on the execution time of ns16550_interrupt(), this
> > is currently set as 4x the polling interval, which is calculated as the
> > time to fill RX FIFO and/or empty TX FIFO.  The current maximum is 5ms.
> > Once the timeout triggers the interrupt is disabled and the uart is
> > switched to polling mode.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > There's a possible alternative approach to solve this by moving the actual
> > interrupt processing to a softirq tasklet and disabling the interrupt
> > source until the processing is done, likely unifying the logic with the
> > timer task.  However that's a bigger change, and too risky for 4.22 at this
> > point.
> 
> +1
> 
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -62,6 +62,7 @@ static struct ns16550 {
> >  #endif
> >      unsigned int timeout_ms;
> >      bool intr_works;
> > +    bool force_polling;
> >      bool dw_usr_bsy;
> >  #ifdef NS16550_PCI
> >      /* PCI card parameters. */
> > @@ -190,12 +191,41 @@ static void cf_check ns16550_interrupt(int irq, void 
> > *dev_id)
> >  {
> >      struct serial_port *port = dev_id;
> >      struct ns16550 *uart = port->uart;
> > +    /* Set quite arbitrarily as 4x the time to drain the TX or fill RX 
> > FIFOs. */
> 
> Nit: I'd drop the latter of the two "the".
> 
> > +    const s_time_t timeout = NOW() + min(MILLISECS(uart->timeout_ms * 4),
> > +                                         MILLISECS(5));
> 
> MILLISECS(min(uart->timeout_ms * 4, 5U)) ?

Bah, yes, sorry, I've added the min() later and clearly didn't apply
much brainpower about it's position.

> 
> > +    if ( uart->force_polling )
> > +        return;
> 
> As the IRQ was disabled, is this even possible? I.e. should this be some
> kind of assertion or alike?

Hm, I wasn't setting IRQ_DISABLED before, and hence needed this guard.
But now with IRQ_DISABLED being set in ->status do_IRQ() should filter
any stray interrupts.  I will attempt to add an ASSERT_UNREACHABLE()
here.

> >      uart->intr_works = 1;
> >  
> >      while ( !(ns_read_reg(uart, UART_IIR) & UART_IIR_NOINT) )
> >      {
> >          u8 lsr = ns_read_reg(uart, UART_LSR);
> > +        s_time_t now = NOW();
> > +
> > +        /* Break out of the loop if spending too much time. */
> > +        if ( now > timeout )
> > +        {
> > +            struct irq_desc *desc = irq_to_desc(irq);
> > +
> > +            /* Disable the interrupt source - it's never shared. */
> > +            spin_lock_irq(&desc->lock);
> 
> This needs to be spin_lock_irqsave() - we may not rely on IRQs being on
> when we make it here. However, ...

I was relying on do_IRQ() unconditionally enabling IRQs before calling
the handler, but it's safer to use the _irqsave() variant.

> > +            desc->status |= IRQ_DISABLED;
> > +            if ( desc->handler->disable )
> > +                desc->handler->disable(desc);
> > +            spin_unlock_irq(&desc->lock);
> 
> ... all of this open-coding is quite bad anyway. We should probably add
> a helper for this in IRQ handling code.

Can add, no problem.

> > +            /* Disable interrupt generation on the device and arm the 
> > timer. */
> > +            uart->force_polling = true;
> > +            ns_write_reg(uart, UART_IER, 0);
> > +            set_timer(&uart->timer, now + MILLISECS(uart->timeout_ms));
> > +            printk(XENLOG_WARNING
> > +                   "uart interrupt taking too long, switched to 
> > polling\n");
> 
> Probably it is indeed best to keep this simple, but: A single instance of
> this taking e.g. just over 5ms (perhaps with a low baud rate) may not be
> indicative of an actual issue.

It's all fiddly, yes, I could add a counter and maybe only disable if
we have a certain amount of executions of ns16550_interrupt()
exceeding the timeout, yet I wanted to keep this simple.

> To alleviate this as least some, perhaps
> besides capping at 5ms we should also make sure that the timeout used isn't
> below ->timeout_ms?

OK, let me see what I can do.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.