Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
>
> Uhm, I thought I had these switched off (due to problems earlier) and then
> forgot about them. However, looking at the earlier reports, these lines were
> also in those reports.
>
> The xen-syms and these last runs are all with a pristine Xen tree cloned
> today (staging branch), so the qemu-xen and seabios trees defined with that
> were also freshly cloned and had a new default seabios config (just to rule
> out anything stale in my tree).
>
> If you don't see those messages, perhaps your seabios and qemu trees (and at
> least the seabios config) are not the most recent? They don't get updated
> automatically when you just do a git pull on the main tree.
>
> In /tools/firmware/seabios-dir/.config I have:
> CONFIG_USB=y
> CONFIG_USB_UHCI=y
> CONFIG_USB_OHCI=y
> CONFIG_USB_EHCI=y
> CONFIG_USB_XHCI=y
> CONFIG_USB_MSC=y
> CONFIG_USB_UAS=y
> CONFIG_USB_HUB=y
> CONFIG_USB_KEYBOARD=y
> CONFIG_USB_MOUSE=y
>
I seem to have the same thing. Perhaps it is my XHCI controller being wonky.
> And this is all just from a:
> - git clone git://xenbits.xen.org/xen.git -b staging
> - make clean && ./configure && make -j6 && make -j6 install
Aye.
.. snip..
> > 1) test_and_[set|clear]_bit sometimes return unexpected values.
> > [But this might be invalid as the addition of the ffff8303faaf25a8
> > might be correct - as the second dpci the softirq is processing
> > could be the MSI one]
>
> Would there be an easy way to stress test this function separately in some
> debugging function to see if it indeed is returning unexpected values ?
Sadly no. But you got me looking in the right direction when you mentioned
'timeout'.
>
> > 2) INIT_LIST_HEAD operations on the same CPU are not honored.
>
> Just curious, have you also tested the patches on AMD hardware ?
Yes. To reproduce this the first thing I did was to get an AMD box.
>
>
> >> When I look at the combination of (2) and (3), it seems it could be an
> >> interaction between the two passed through devices and/or different IRQ
> >> types.
>
> > Could be - as in it is causing this issue to show up faster than
> > expected. Or it is the one that triggers more than one dpci happening
> > at the same time.
>
> Well, that didn't seem to be it (see the separate amendment I mailed previously).
Right. The current theory I have is that the interrupts are not being
ACKed within 8 milliseconds, so we reset the 'state', and at the same
time we get an interrupt and schedule it while we are still processing
the same interrupt. This would explain why 'test_and_clear_bit'
got the wrong value.
In regards to the list poison - following this thread of logic - with
the 'state = 0' set we open the floodgates for any CPU to put the same
'struct hvm_pirq_dpci' on its list.
We do reset the 'state' on _every_ GSI that is mapped to a guest - so
we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
CPUX:                                  CPUY:
pt_irq_time_out:
    state = 0;
    [out of timer code, the            raise_softirq
     pirq_dpci is on the dpci_list]    [adds the pirq_dpci as state == 0]

softirq_dpci:                          softirq_dpci:
    list_del
    [entries poisoned]
                                           list_del <= BOOM
Is what I believe is happening.
The INTX device - once I put a load on it - does not trigger
any pt_irq_time_out, so that would explain why I cannot hit this.
But I believe your card hits these "hiccups".
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel