
Re: [Xen-devel] Problems after enabling rcv/xmit interrupts of ns16550 on OMAP5



On Thu, 2013-07-18 at 00:05 +0800, Chen Baozi wrote:
> On 2013-7-17, at 23:26, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> 
> >>>> "restoring CPSR" refers to the instruction "msr CPSR_c, <reg>" which
> >>>> is from "local_irq_restore". And "cpsie i" is from the call to
> >>>> "local_irq_enable".
> >>> 
> >>> Ah right. So in both cases you will immediately take any pending
> >>> interrupt. I think I would continue instrumenting starting from
> >>> gic_interrupt() and hopefully eventually into the ns16550 interrupt
> >>> handler.
> >> 
> >> I went through gic_interrupt() and think I found the points that cause it to get stuck.
> > 
> > Please can you clarify exactly what you mean by "stuck". Previously you
> > thought it was stuck in ns16550_setup_postirq when in actual fact it was
> > taking an interrupt.
> 
> I thought it was "stuck" because every time I pressed 'd' to
> dump the registers, the PC always stayed at the same position while
> executing ns16550_setup_postirq. So it really looked as if the
> system was stuck at that point. Sorry if my description was wrong.

No problem. In fact if 'd' works perhaps you are not blowing the stack
at all with multiple interrupts.

Ah, you are probably never escaping the loop in gic_interrupt because
the read of IAR always returns the UART interrupt.
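
To illustrate that suspicion, here is a minimal sketch of the acknowledge
loop, modelled only loosely on gic_interrupt(). Everything prefixed fake_ is
hypothetical, UART_IRQ is a made-up interrupt number, and the iteration cap
exists only so the model terminates. The one hardware fact relied on is that
the GIC returns ID 1023 from IAR when nothing is pending.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPURIOUS_IRQ 1023u /* IAR value when no interrupt is pending */
#define UART_IRQ       74u /* hypothetical number, for illustration only */

/* Hypothetical model of a level-triggered UART interrupt line. */
struct fake_uart {
    bool line_asserted;      /* device is currently requesting service */
    bool handler_clears;     /* does servicing make the device deassert? */
    unsigned long irq_count; /* how often the handler ran */
};

/* Stand-in for reading GICC_IAR. */
static uint32_t fake_read_iar(const struct fake_uart *u)
{
    return u->line_asserted ? UART_IRQ : SPURIOUS_IRQ;
}

/* Stand-in for the UART interrupt handler. */
static void fake_uart_handler(struct fake_uart *u)
{
    u->irq_count++;
    if (u->handler_clears)
        u->line_asserted = false;
}

/*
 * Acknowledge and dispatch until IAR reports nothing pending.
 * Returns the number of iterations, capped at max_iter.
 */
static unsigned long fake_gic_interrupt(struct fake_uart *u,
                                        unsigned long max_iter)
{
    unsigned long n = 0;

    while (n < max_iter) {
        uint32_t irq = fake_read_iar(u);

        if (irq == SPURIOUS_IRQ)
            break; /* the only way out of the loop */

        fake_uart_handler(u);
        /* The EOIR/DIR writes would go here. With the line still
         * asserted, the next IAR read returns the same interrupt. */
        n++;
    }
    return n;
}
```

With handler_clears set the loop exits after one pass. With it clear the loop
only stops at the cap, which would match a PC that never appears to move.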

> 
> > Are you sure that you are taking multiple,
> > potentially nested interrupts and eventually blowing the hypervisor
> > stack? This seems like the most likely scenario to me.
> 
> Seems reasonable. Is there any way to prove that we are in this
> situation? I hadn't considered this possibility before. Thanks.

I was about to say that a printk in gic_interrupt ought to confirm, but
since the UART IRQ is the problem perhaps that isn't so obvious, unless
sync_console helps in some way. Worth a try.

If not, then since 'd' works perhaps you could keep a count of the
number of serial IRQs in a global var and dump it?
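
A rough sketch of that counter idea, written as standalone C rather than
hypervisor code. The names serial_irq_count, count_serial_irq and
format_serial_irq_count are all hypothetical. In Xen the increment would sit
at the top of the serial interrupt handler and the dump would presumably be a
printk from the 'd' key handler, not snprintf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug counter, bumped once per serial interrupt. */
static volatile unsigned long serial_irq_count;

/* Call this first thing in the serial interrupt handler. */
static void count_serial_irq(void)
{
    serial_irq_count++;
}

/*
 * Format the counter for the register dump.
 * Returns the number of characters snprintf reports.
 */
static int format_serial_irq_count(char *buf, size_t len)
{
    return snprintf(buf, len, "serial IRQs: %lu\n", serial_irq_count);
}
```

If the count climbs by thousands between two dumps with no serial traffic,
that points at the handler failing to clear the interrupt source.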

> 
> > 
> >> If I change the while(...) in ns16550_interrupt() into if(...) and comment
> >> out either "GICC[GICC_EOIR] = irq;" or "GICC[GICC_DIR] = irq;" in
> >> gic_host_irq_end(), it won't get stuck after enabling receive and transmit
> >> interrupts in ns16550_setup_postirq().
> > 
> > By removing the writes to either EOIR or DIR you are in effect never
> > unmasking the interrupt, so you avoid the nested interrupt problem.
> > 
> > If this is the case then the real issue is perhaps that for whatever reason
> > ns16550_interrupt is not causing the hardware to deassert its interrupt
> > line.
> > 
> > The UART on the sunxi is compatible (in DTS terms) with
> > "snps,dw-apb-uart", which seems to be an 8250 variant, but one which
> > differs enough to warrant its own compatibility string -- perhaps Xen's
> > ns16550 driver isn't dealing with some quirk of this device?
> 
> I checked my OMAP5's data sheet. Generally, they look very similar.
> But I will read the manual more carefully again tomorrow to make sure
> of this point.

Good idea.

> 
> > 
> > It seems like the driver in Linux is drivers/tty/serial/8250/8250_dw.c.
> > dw8250_handle_irq looks interesting...
> > 
> >        struct dw8250_data *d = p->private_data;
> >        unsigned int iir = p->serial_in(p, UART_IIR);
> > 
> >        if (serial8250_handle_irq(p, iir)) {
> >                return 1;
> >        } else if ((iir & UART_IIR_BUSY) == UART_IIR_BUSY) {
> >                /* Clear the USR and write the LCR again. */
> >                (void)p->serial_in(p, DW_UART_USR);
> >                p->serial_out(p, UART_LCR, d->last_lcr);
> > 
> >                return 1;
> >        }
> > 
> >        return 0;
> > 
> > In particular the fallback code there when the common 8250 handler
> > didn't deal with the issue...
> 
> I'll dig into the Linux driver tomorrow to see whether I can find the
> cause.

Actually, the comment at the top is interesting:
 * The Synopsys DesignWare 8250 has an extra feature whereby it detects if the
 * LCR is written whilst busy.  If it is, then a busy detect interrupt is
 * raised, the LCR needs to be rewritten and the uart status register read.

I'm not sure that "extra feature" doesn't mean "weird quirk" but there we go ;-)
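
For what it's worth, the recovery that comment describes could be modelled
roughly as below. This is a sketch against a fake register struct, not a
patch for Xen's ns16550 driver: struct fake_dw_uart and fake_handle_busy are
hypothetical, and the UART_IIR_BUSY value is quoted from memory of the Linux
headers, so check it against the source before relying on it.

```c
#include <stdint.h>

#define UART_IIR_NO_INT 0x01 /* standard 8250: no interrupt pending */
#define UART_IIR_BUSY   0x07 /* busy detect ID, from memory of Linux */

/* Hypothetical register model of a DesignWare-style UART. */
struct fake_dw_uart {
    uint8_t iir;        /* interrupt identification register */
    uint8_t lcr;        /* line control register as the device sees it */
    uint8_t last_lcr;   /* last value the driver tried to write to LCR */
    unsigned usr_reads; /* how many times USR has been read */
};

/*
 * Fallback for when the common 8250 handler did not claim the interrupt.
 * Returns 1 if a busy detect interrupt was handled, 0 otherwise.
 */
static int fake_handle_busy(struct fake_dw_uart *u)
{
    if ((u->iir & UART_IIR_BUSY) != UART_IIR_BUSY)
        return 0;

    u->usr_reads++;            /* models the USR read that clears busy detect */
    u->lcr = u->last_lcr;      /* replay the LCR write made while busy */
    u->iir = UART_IIR_NO_INT;  /* model: the interrupt is now deasserted */
    return 1;
}
```

The driver has to remember the last LCR value it wrote, because the write
that raised the interrupt has to be repeated once the busy condition clears.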

The changelog of the patch which added it is interesting too:
http://permalink.gmane.org/gmane.linux.serial/5855

Ian.



 

