Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
> Uhmm i thought i had these switched off (due to problems earlier and then forgot
> about them .. however looking at the earlier reports these lines were also in
> those reports).
>
> The xen-syms and these last runs are all with a prestine xen tree cloned today (staging
> branch), so the qemu-xen and seabios defined with that were also freshly cloned
> and had a new default seabios config. (just to rule out anything stale in my tree)
>
> If you don't see those messages .. perhaps your seabios and qemu trees (and at least the
> seabios config) are not the most recent (they don't get updated automatically
> when you just do a git pull on the main tree) ?
>
> In /tools/firmware/seabios-dir/.config i have:
> CONFIG_USB=y
> CONFIG_USB_UHCI=y
> CONFIG_USB_OHCI=y
> CONFIG_USB_EHCI=y
> CONFIG_USB_XHCI=y
> CONFIG_USB_MSC=y
> CONFIG_USB_UAS=y
> CONFIG_USB_HUB=y
> CONFIG_USB_KEYBOARD=y
> CONFIG_USB_MOUSE=y
>

I seem to have the same thing. Perhaps it is my XHCI controller being wonky.

> And this is all just from a:
> - git clone git://xenbits.xen.org/xen.git -b staging
> - make clean && ./configure && make -j6 && make -j6 install

Aye.

.. snip..

> > 1) test_and_[set|clear]_bit sometimes return unexpected values.
> >    [But this might be invalid as the addition of the ffff8303faaf25a8
> >    might be correct - as the second dpci the softirq is processing
> >    could be the MSI one]
>
> Would there be an easy way to stress test this function separately in some
> debugging function to see if it indeed is returning unexpected values ?

Sadly no. But you got me looking in the right direction when you mentioned 'timeout'.

> > 2) INIT_LIST_HEAD operations on the same CPU are not honored.
>
> Just curious, have you also tested the patches on AMD hardware ?

Yes. To reproduce this the first thing I did was to get an AMD box.
> >> When i look at the combination of (2) and (3), It seems it could be an
> >> interaction between the two passed through devices and/or different IRQ
> >> types.
>
> > Could be - as in it is causing this issue to show up faster than
> > expected. Or it is the one that triggers more than one dpci happening
> > at the same time.
>
> Well that didn't seem to be it (see separate amendment i mailed previously)

Right, the current theory I have is that the interrupts are not being
Acked within 8 milliseconds and we reset the 'state' - and at the same time we
get an interrupt and schedule it - while we are still processing the same
interrupt. This would explain why the 'test_and_clear_bit' got the wrong value.

In regards to the list poison - following this thread of logic - with
the 'state = 0' set we open the floodgates for any CPU to put the same
'struct hvm_pirq_dpci' on its list.

We do reset the 'state' on _every_ GSI that is mapped to a guest - so
we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:

  CPUX:                                  CPUY:
  pt_irq_time_out:
  state = 0;
  [out of timer code, the                raise_softirq
   pirq_dpci is on the dpci_list]        [adds the pirq_dpci as state == 0]

  softirq_dpci                           softirq_dpci:
    list_del
    [entries poison]
                                         list_del <= BOOM

That is what I believe is happening.

The INTX device - once I put a load on it - does not trigger
any pt_irq_time_out, so that would explain why I cannot hit this.

But I believe your card hits these "hiccups".
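
To make the interleaving in the theory above concrete, here is a minimal single-threaded sketch in C. It is not the Xen code: `struct pirq_model`, `raise_model`, `list_del_checked` and `simulate_race` are hypothetical names made up for illustration, the boolean `scheduled` flag merely stands in for the 'state' bit guarded by test_and_set_bit, and the poison values are illustrative. It only shows that clearing the flag while the entry is still queued lets a second CPU queue the same entry, so that the second list_del lands on poisoned pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Illustrative poison markers, in the style of list debugging helpers. */
#define POISON_NEXT ((struct list_head *)0x100100)
#define POISON_PREV ((struct list_head *)0x200200)

/* Hypothetical stand-in for struct hvm_pirq_dpci: a flag plus a list node. */
struct pirq_model {
    bool scheduled;
    struct list_head node;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Reports a poisoned entry instead of dereferencing it (the "BOOM" case). */
static bool list_del_checked(struct list_head *e)
{
    if (e->next == POISON_NEXT || e->prev == POISON_PREV)
        return false;
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->next = POISON_NEXT;
    e->prev = POISON_PREV;
    return true;
}

/* Models the test_and_set_bit guard: queue only if not already scheduled. */
static bool raise_model(struct pirq_model *p, struct list_head *cpu_list)
{
    if (p->scheduled)
        return false;
    p->scheduled = true;
    list_add_tail(&p->node, cpu_list);
    return true;
}

/* Returns how many list_del calls hit a poisoned entry. */
int simulate_race(bool timeout_resets_state)
{
    struct list_head cpux, cpuy;
    struct pirq_model p = { .scheduled = false };
    int poisoned = 0;

    list_init(&cpux);
    list_init(&cpuy);

    /* CPUX: the interrupt is scheduled and the entry sits on its list. */
    bool first = raise_model(&p, &cpux);
    assert(first);
    (void)first;

    /* CPUX: the timeout path clears the flag while the entry is still queued. */
    if (timeout_resets_state)
        p.scheduled = false;

    /* CPUY: a new interrupt arrives; the guard passes only if the flag was cleared. */
    bool queued_twice = raise_model(&p, &cpuy);

    /* Both softirq handlers pick up their first entry before either deletes it. */
    struct list_head *x_entry = cpux.next;
    struct list_head *y_entry = queued_twice ? cpuy.next : NULL;

    if (!list_del_checked(x_entry))
        poisoned++;
    if (y_entry != NULL && !list_del_checked(y_entry))
        poisoned++;

    return poisoned;
}
```

In this model simulate_race(true) returns 1 (the second list_del finds the poison left by the first), while simulate_race(false) returns 0 because the guard refuses to queue the entry a second time.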