
Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible

On 11/02/2020 15:00, Roger Pau Monné wrote:
> On Mon, Feb 10, 2020 at 09:49:30PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 14:21, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
>>>> On 03/02/2020 13:41, Roger Pau Monné wrote:
>>>>> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>>>>>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>>>>>> Hi Roger,
>>>>>>>> Last week I encountered an issue with the PCI-passthrough of a USB controller.
>>>>>>>> In the guest I get:
>>>>>>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>>>>>>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>>>>>>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>>>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>>>>> Bisection turned up as the culprit: 
>>>>>>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>>>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>>> Sorry to hear that, let's see if we can figure out what's wrong.
>>>>>> No problem, that is why I test stuff :)
>>>>>>>> I verified by reverting that commit and now it works fine again.
>>>>>>> Does the same controller work fine when used in dom0?
>>>>>> Will test that, but as all other pci devices in dom0 work fine,
>>>>>> I assume this controller would also work fine in dom0 (as it has also
>>>>>> worked fine for ages with PCI-passthrough to that guest and still works
>>>>>> fine when reverting the referenced commit).
>>>>> Is this the only device that fails to work when doing pci-passthrough,
>>>>> or do other devices also fail with the mentioned change applied?
>>>>> Have you tested on other boxes?
>>>>>> I don't know if your change can somehow have a side effect
>>>>>> on latency around the processing of pci-passthrough ?
>>>>> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
>>>>> see how it could slow down other interrupts. Also I would think the
>>>>> domain is not receiving interrupts from the device, rather than
>>>>> interrupts being slow.
>>>>> Can you also paste the output of lspci -v for that xHCI device from
>>>>> dom0?
>>>>> Thanks, Roger.
>>>> Will do this evening including the testing in dom0 etc.
>>>> Will also see if there is any pattern when observing /proc/interrupts in
>>>> the guest.
>>> Thanks! I also have a trivial patch that I would like you to try,
>>> just to rule out send_IPI_mask clearing the scratch_cpumask from under
>>> another function's feet.
>>> Roger.
>> Hi Roger,
>> Took a while, but I was able to run some tests now.
>> I also forgot a detail in the first report (probably still a bit tired from
>> FOSDEM), namely: the passed-through device works OK for a while before I get
>> the kernel message.
>> I tested the patch and it looks like it makes the issue go away.
>> I tested for a day; without the patch (or a revert of the commit) the device
>> runs into problems within a few hours.
> Thanks, I have another patch for you to try, which will likely make
> your system crash. Could you give it a try and paste the log output?
> Thanks, Roger.

Applied the patch, rebuilt, rebooted and braced for impact ...
However, the device bugged out again, but no Xen panic occurred, so there is
nothing special in the logs.
I only had time to try it once, so I could retry this evening.


Xen-devel mailing list


