Re: [Xen-devel] Problems after enabling rcv/xmit interrupts of ns16550 on OMAP5



On Jul 18, 2013, at 8:44 PM, Chen Baozi <baozich@xxxxxxxxx> wrote:

> 
> On Jul 18, 2013, at 7:53 PM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> 
>> On Thu, 2013-07-18 at 00:05 +0800, Chen Baozi wrote:
>>> On 2013-7-17, at 23:26, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
>>> 
>>>>>>> "restoring CPSR" refers to the instruction "msr CPSR_c, <reg>" which
>>>>>>> is from "local_irq_restore". And "cpsie i" is from the call to
>>>>>>> "local_irq_enable".
>>>>>> 
>>>>>> Ah right. So in both cases you will immediately take any pending
>>>>>> interrupt. I think I would continue instrumenting starting from
>>>>>> gic_interrupt() and hopefully eventually into the ns16550 interrupt
>>>>>> handler.
>>>>> 
>>>>> I went through gic_interrupt() and think I found what causes it to get
>>>>> stuck.
>>>> 
>>>> Please can you clarify exactly what you mean by "stuck". Previously you
>>>> thought it was stuck in ns16550_setup_postirq when in actual fact it was
>>>> taking an interrupt.
>>> 
>>> I thought it was "stuck" because every time I pressed 'd' to dump the
>>> registers, the PC was always at the same position while executing
>>> ns16550_setup_postirq. So it really looked as if the system was stuck
>>> at that point. Sorry if my description was wrong.
>> 
>> No problem. In fact if 'd' works perhaps you are not blowing the stack
>> at all with multiple interrupts.
>> 
>> Ah, you are probably never escaping the loop in gic_interrupt because
>> the read of IAR always returns the UART interrupt.
>> 
>>> 
>>>> Are you sure that you are taking multiple,
>>>> potentially nested interrupts and eventually blowing the hypervisor
>>>> stack? This seems like the most likely scenario to me.
>>> 
>>> Seems reasonable. Is there any way to prove that we are under this
>>> situation? I didn't expect this possibility before. Thanks.
>> 
>> I was about to say that a printk in gic_interrupt ought to confirm, but
>> since the UART IRQ is the problem perhaps that isn't so obvious, unless
>> sync_console helps in some way. Worth a try.
>> 
>> If not, then since 'd' works perhaps you could keep a count of the
>> number of serial IRQs in a global var and dump it?
> 
> Thanks. I'll have a try.
> 
>> 
>>> 
>>>> 
>>>>> If I change the while(...) in ns16550_interrupt() into if(...) and comment
>>>>> out either "GICC[GICC_EOIR] = irq;" or "GICC[GICC_DIR] = irq;" in
>>>>> gic_host_irq_end(), it won't get stuck after enabling receive and transmit
>>>>> interrupts in ns16550_setup_postirq().
>>>> 
>>>> By removing the writes to either EOIR or DIR you are in effect never
>>>> unmasking the interrupt, so you avoid the nested interrupt problem.
>>>> 
>>>> If this is the case then the real issue is perhaps that for whatever
>>>> reason ns16550_interrupt is not causing the hardware to deassert its
>>>> interrupt line.
>>>> 
>>>> The UART on the sunxi is compatible (in DTS terms) with
>>>> "snps,dw-apb-uart", which seems to be an 8250 variant, but one which
>>>> differs enough to warrant its own compatibility string -- perhaps Xen's
>>>> ns16550 driver isn't dealing with some quirk of this device?
>>> 
>>> I checked my OMAP5's data sheet. In general, the two look very similar.
>>> But I will read the manual more carefully again tomorrow to make sure
>>> of this point.
>> 
>> Good idea.
>> 
>>> 
>>>> 
>>>> It seems like the driver in Linux is drivers/tty/serial/8250/8250_dw.c.
>>>> dw8250_handle_irq looks interesting...
>>>> 
>>>>      struct dw8250_data *d = p->private_data;
>>>>      unsigned int iir = p->serial_in(p, UART_IIR);
>>>> 
>>>>      if (serial8250_handle_irq(p, iir)) {
>>>>              return 1;
>>>>      } else if ((iir & UART_IIR_BUSY) == UART_IIR_BUSY) {
>>>>              /* Clear the USR and write the LCR again. */
>>>>              (void)p->serial_in(p, DW_UART_USR);
>>>>              p->serial_out(p, UART_LCR, d->last_lcr);
>>>> 
>>>>              return 1;
>>>>      }
>>>> 
>>>>      return 0;
>>>> 
>>>> In particular the fallback code there when the common 8250 handler
>>>> didn't deal with the issue...
>>> 
>>> I'll dig into the Linux driver tomorrow to see whether I can find the
>>> cause.
>> 
>> Actually, the comment at the top is interesting:
>>  * The Synopsys DesignWare 8250 has an extra feature whereby it detects if the
>>  * LCR is written whilst busy.  If it is, then a busy detect interrupt is
>>  * raised, the LCR needs to be rewritten and the uart status register read.
>> 
>> I'm not sure that "extra feature" doesn't mean "weird quirk" but there we
>> go ;-)
>> 
>> The changelog of the patch which added it is interesting too:
>> http://permalink.gmane.org/gmane.linux.serial/5855
> 
> I checked the Linux driver today. Since the UART of my OMAP5432 board is 
> compatible with "ti,omap4-uart", the driver in Linux should be 
> drivers/tty/serial/omap-serial.c rather than 
> drivers/tty/serial/8250/8250_dw.c.
> 
> In serial_omap_irq of omap-serial.c, there is no fallback code like
> DesignWare's. However, it does check the modem status register. I thought
> this might be the cause, because a "Modem Status" interrupt must be
> cleared by reading the modem status register. However, reading this
> register doesn't seem to help :-(

Hurrah! I've finally got it working.

The problem is in the bit flags set when enabling interrupts in
ns16550_setup_postirq. I checked the startup functions of the 8250-compatible
drivers in Linux, which set "receiver data interrupt" (bit 0) and "receiver line
status interrupt" (bit 2). ns16550_setup_postirq, however, sets "receiver data
interrupt" (bit 0) and "transmitter holding register empty interrupt" (bit 1).
After applying the following change in ns16550_setup_postirq:

-               ns_write_reg(uart, UART_IER, UART_IER_ERDAI | UART_IER_ETHREI);
+               ns_write_reg(uart, UART_IER, UART_IER_ERDAI | UART_IER_ELSI);

the UART console works!
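
To make the bit arithmetic concrete, here is a minimal sketch of the two IER
values. The macro names follow Xen's ns16550 driver and the values are the
standard 16550 bit positions quoted above; the two helper functions are
hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Standard 16550 IER bit positions; names as used in Xen's ns16550 driver. */
#define UART_IER_ERDAI  0x01u /* bit 0: receiver data interrupt */
#define UART_IER_ETHREI 0x02u /* bit 1: transmitter holding register empty */
#define UART_IER_ELSI   0x04u /* bit 2: receiver line status interrupt */

/* Hypothetical helper: IER value written before the change. */
static uint8_t ier_before_fix(void)
{
    return UART_IER_ERDAI | UART_IER_ETHREI;
}

/* Hypothetical helper: IER value written after the change. */
static uint8_t ier_after_fix(void)
{
    return UART_IER_ERDAI | UART_IER_ELSI;
}
```

So the register value goes from 0x03 to 0x05, and the THRE enable bit is no
longer set.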

A printk in ns16550_interrupt works too. I used it to check the register flags,
which finally led me to the cause.

I'm wondering whether this would be a suitable general fix for the ns16550
driver, considering that the Linux driver doesn't set UART_IER_THRI
(UART_IER_ETHREI in Xen's ns16550 driver) when enabling interrupts for the 8250.

Cheers,

Baozi
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

