
Re: [Xen-devel] Xen-unstable: pci-passthrough "irq 16: nobody cared" on HVM guest shutdown on irq of device not passed through.



Thursday, September 25, 2014, 8:45:51 PM, you wrote:


> Thursday, September 25, 2014, 7:02:02 PM, you wrote:


>> Thursday, September 25, 2014, 6:14:43 PM, you wrote:

>>>>>> On 25.09.14 at 17:49, <linux@xxxxxxxxxxxxxx> wrote:

>>>> Thursday, September 25, 2014, 5:11:33 PM, you wrote:
>>>> 
>>>>>>>> On 25.09.14 at 16:36, <linux@xxxxxxxxxxxxxx> wrote:
>>>>>> - When shutting down the HVM guest when A happens, the number of interrupts in
>>>>>>   /proc/interrupts is still what it was, but when B happens it seems like an irq storm,
>>>>>>   and after the "irq nobody cared" it ends with (always that 200000, so perhaps a threshold?):
>>>>>>   16:     200000          0          0          0          0          0  xen-pirq-ioapic-level  snd_hda_intel
>>>> 
>>>>> 100,000 is the traditional threshold, so I would expect the cited
>>>>> instance to be the second one. It didn't really become clear to me
>>>>> - is this observed in Dom0 or in the shutting down guest? 
>>>> 
>>>> This is from the /proc/interrupts of dom0 after the "irq nobody cared" message
>>>> appeared in dom0 (so after B happened). Just after host boot and first guest
>>>> boot it was stable around 500. On the next start (and after which B would
>>>> happen on shutting the guest down again) it doubled to about 1000 (perhaps when
>>>> the "Unsupported MSI delivery mode 3 for Dom2" occurred).

>>> Something odd must then be going on - the threshold _is_ 100,000,
>>> not 200,000.

>>>>> And did you really check that no other device (even if currently not having
>>>>> an interrupt handler bound) is sitting on IRQ 16?
>>>> 
>>>> In what way could I check that to be certain?
>>>> (If it's not bound, lspci and /proc/interrupts will probably be insufficient
>>>> for that?)

>>> If the BIOS sets these up, lspci might still be of help. Consulting
>>> boot messages of the kernel may also provide some hints. Beyond
>>> that I'm not really sure how to figure out.

>>> Jan

>> lspci gives only one device with IRQ 16, the sound controller:

>> 00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia 
>> (Intel HDA) (rev 40)
>>         Subsystem: Micro-Star International Co., Ltd. Device 7640
>>         Flags: bus master, slow devsel, latency 64, IRQ 16
>>         Memory at fdbf8000 (64-bit, non-prefetchable) [size=16K]
>>         Capabilities: [50] Power Management version 2
>>         Kernel driver in use: snd_hda_intel

>> On boot I do get a message "Already setup the GSI :16"; however, that seems to
>> happen for multiple devices and IRQs/GSIs.

>> I did have a go at copying and pasting the (hopefully) most relevant messages
>> around IRQs and MSIs for the different stages. But my untrained eye doesn't
>> spot a difference that I can relate to the problem.


>> ##Cold boot of the host system
>>     [   35.556728] xen: registering gsi 16 triggering 0 polarity 1
>>     [   35.573157] xen: --> pirq=16 -> irq=16 (gsi=16)
>>     (XEN) [2014-09-25 13:08:55.771] IOAPIC[0]: Set PCI routing entry (6-16 
>> -> 0x89 -> IRQ 16 Mode:1 Active:1)
>>     [   38.575661] pciback 0000:09:00.0: enabling device (0000 -> 0003)
>>     [   38.593584] xen: registering gsi 32 triggering 0 polarity 1
>>     [   38.610461] xen: --> pirq=32 -> irq=32 (gsi=32)
>>     (XEN) [2014-09-25 13:08:58.809] IOAPIC[1]: Set PCI routing entry (7-8 -> 
>> 0xc9 -> IRQ 32 Mode:1 Active:1)
>>     [   42.713230] xen: registering gsi 16 triggering 0 polarity 1
>>     [   42.713233] Already setup the GSI :16
>>     
>>     
>>     (XEN) [2014-09-25 13:29:04.111]    IRQ:  16 affinity:01 vec:89 
>> type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0: 16(---),
>>     (XEN) [2014-09-25 13:29:04.370]    IRQ:  32 affinity:3f vec:c9 
>> type=IO-APIC-level   status=00000002 mapped, unbound
>>     
>>     (XEN) [2014-09-25 13:29:06.583]     IRQ 16 Vec137:
>>     (XEN) [2014-09-25 13:29:06.597]       Apic 0x00, Pin 16: vec=89 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:1
>>     (XEN) [2014-09-25 13:29:06.978]     IRQ 32 Vec201:
>>     (XEN) [2014-09-25 13:29:06.991]       Apic 0x01, Pin  8: vec=00 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:63
>>     
>>     (XEN) [2014-09-25 13:29:19.108] 0000:09:00.0 - dom 0   - MSIs < >
>>     (XEN) [2014-09-25 13:29:19.442] 0000:00:14.2 - dom 0   - MSIs < >
>>     
>> ##Start of the HVM guest with pci device passed through (dom1).
>>     (XEN) [2014-09-25 13:30:32.831] io.c:280: d1: bind: m_gsi=32 g_gsi=36 
>> dev=00.00.5 intx=0
>>     
>>     (XEN) [2014-09-25 13:35:10.930]    IRQ:  16 affinity:01 vec:89 
>> type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0: 16(---),
>>     (XEN) [2014-09-25 13:35:11.189]    IRQ:  32 affinity:02 vec:c9 
>> type=IO-APIC-level   status=00000010 in-flight=0 domain-list=1: 32(-M-),
>>     (XEN) [2014-09-25 13:35:12.498]    IRQ:  84 affinity:04 vec:aa 
>> type=PCI-MSI         status=00000030 in-flight=0 domain-list=1: 87(---),
>>     
>>     (XEN) [2014-09-25 13:35:13.443]     IRQ 16 Vec137:
>>     (XEN) [2014-09-25 13:35:13.456]       Apic 0x00, Pin 16: vec=89 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:1
>>     (XEN) [2014-09-25 13:35:13.837]     IRQ 32 Vec201:
>>     (XEN) [2014-09-25 13:35:13.851]       Apic 0x01, Pin  8: vec=c9 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:2
>>     
>>     (XEN) [2014-09-25 13:35:28.164] 0000:09:00.0 - dom 1   - MSIs < 84 >
>>     (XEN) [2014-09-25 13:35:28.515] 0000:00:14.2 - dom 0   - MSIs < >
>>     
>>     (XEN) [2014-09-25 13:35:37.013]  MSI     84 vec=aa lowest  edge   assert 
>>  log lowest dest=00000004 mask=0/1/?
>>     
>> ##Shutdown of the HVM guest with pci device passed through, A happened.
>>     (XEN) [2014-09-25 13:38:27.974]    IRQ:  16 affinity:01 vec:89 
>> type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0: 16(---),
>>     (XEN) [2014-09-25 13:38:28.233]    IRQ:  32 affinity:02 vec:c9 
>> type=IO-APIC-level   status=00000002 mapped, unbound
>>     
>>     (XEN) [2014-09-25 13:38:30.446]     IRQ 16 Vec137:
>>     (XEN) [2014-09-25 13:38:30.459]       Apic 0x00, Pin 16: vec=89 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:1
>>     (XEN) [2014-09-25 13:38:30.840]     IRQ 32 Vec201:
>>     (XEN) [2014-09-25 13:38:30.854]       Apic 0x01, Pin  8: vec=c9 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:2
>>     
>>     (XEN) [2014-09-25 13:38:39.255] 0000:09:00.0 - dom 0   - MSIs < >
>>     (XEN) [2014-09-25 13:38:39.590] 0000:00:14.2 - dom 0   - MSIs < >  
>>     
>> ##Start of the HVM guest with pci device passed through (dom2).
>>     (XEN) [2014-09-25 13:39:07.963] io.c:280: d2: bind: m_gsi=32 g_gsi=36 
>> dev=00.00.5 intx=0
>>     (XEN) [2014-09-25 13:39:48.149] d32767v2: Unsupported MSI delivery mode 
>> 3 for Dom2
>>     
>>     (XEN) [2014-09-25 13:40:44.831]    IRQ:  16 affinity:01 vec:89 
>> type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0: 16(---),
>>     (XEN) [2014-09-25 13:40:45.089]    IRQ:  32 affinity:02 vec:c9 
>> type=IO-APIC-level   status=00000010 in-flight=0 domain-list=2: 32(-M-),
>>     (XEN) [2014-09-25 13:40:46.398]    IRQ:  84 affinity:02 vec:b2 
>> type=PCI-MSI         status=00000030 in-flight=0 domain-list=2: 87(---),
>>     
>>     (XEN) [2014-09-25 13:40:47.343]     IRQ 16 Vec137:
>>     (XEN) [2014-09-25 13:40:47.357]       Apic 0x00, Pin 16: vec=89 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:1
>>     (XEN) [2014-09-25 13:40:47.738]     IRQ 32 Vec201:
>>     (XEN) [2014-09-25 13:40:47.751]       Apic 0x01, Pin  8: vec=c9 
>> delivery=Fixed dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:2
>>     
>>     (XEN) [2014-09-25 13:40:57.567] 0000:09:00.0 - dom 2   - MSIs < 84 >
>>     (XEN) [2014-09-25 13:40:57.901] 0000:00:14.2 - dom 0   - MSIs < >
>>     
>>     (XEN) [2014-09-25 13:41:01.051]  MSI     84 vec=b2 lowest  edge   assert 
>>  log lowest dest=00000002 mask=0/1/?   
>>     
>> ##Shutdown of the HVM guest with pci device passed through, B happened.
>>     [ 2265.395971] irq 16: nobody cared (try booting with the "irqpoll" 
>> option)
>>     <call trace>
>>     [ 2266.234031] Disabling IRQ #16

>>     (XEN) [2014-09-25 13:46:54.844]    IRQ:  16 affinity:01 vec:89 
>> type=IO-APIC-level   status=00000030 in-flight=1 domain-list=0: 16(PMM),
>>     (XEN) [2014-09-25 13:46:55.103]    IRQ:  32 affinity:02 vec:c9 
>> type=IO-APIC-level   status=00000002 mapped, unbound
>>     
>>     (XEN) [2014-09-25 13:46:57.316]     IRQ 16 Vec137:
>>     (XEN) [2014-09-25 13:46:57.330]       Apic 0x00, Pin 16: vec=89 
>> delivery=Fixed dest=L status=0 polarity=1 irr=1 trig=L mask=0 dest_id:1
>>     (XEN) [2014-09-25 13:46:57.711]     IRQ 32 Vec201:
>>     (XEN) [2014-09-25 13:46:57.724]       Apic 0x01, Pin  8: vec=c9 
>> delivery=Fixed dest=L status=1 polarity=1 irr=0 trig=L mask=1 dest_id:2

>>     (XEN) [2014-09-25 13:47:08.688] 0000:09:00.0 - dom 0   - MSIs < >
>>     (XEN) [2014-09-25 13:47:09.022] 0000:00:14.2 - dom 0   - MSIs < >


> Hrmm, there seems to be at least one omission in the debug-keys logging code:
> a delivery mode other than "fixed" or "lowest" would never be shown:
> msi.c:1311
>                data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",

> However, if it were anything other than lowest, "fixed" would be shown.
> Since the debug-keys output shows lowest, it should be correct ..
> But how does it become delivery mode 3 during the guest start, at the moment
> that vmsi_deliver() is called, which gives the "Unsupported MSI delivery mode
> 3" message?
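
To make the delivery-mode question concrete, here is a minimal sketch (in Python, only for illustration) of how the delivery-mode field of an x86 MSI data word decodes. It assumes that MSI_DATA_DELIVERY_LOWPRI is the single bit 8 of the data word; that value and the helper names below are assumptions, not taken from msi.c. Under that assumption a single-bit test cannot tell mode 1 from mode 3, since both have bit 8 set.

```python
# Delivery mode occupies bits 10:8 of the x86 MSI data word.
DELIVERY_MODES = {
    0: "fixed",
    1: "lowest priority",
    2: "SMI",
    3: "reserved",
    4: "NMI",
    5: "INIT",
    6: "reserved",
    7: "ExtINT",
}

# Assumption: the mask used in the quoted msi.c line is bit 8 only.
ASSUMED_LOWPRI_MASK = 1 << 8


def delivery_mode(msi_data):
    """Return the 3-bit delivery mode field of an MSI data word."""
    return (msi_data >> 8) & 0x7


def describe(msi_data):
    """Full decode of the delivery mode field."""
    return DELIVERY_MODES[delivery_mode(msi_data)]


def two_way_label(msi_data):
    """Single-bit test in the style of the quoted debug-key line."""
    return "lowest" if msi_data & ASSUMED_LOWPRI_MASK else "fixed"


# Hypothetical data words: vector 0xaa combined with modes 0, 1 and 3.
for data in (0x0AA, 0x1AA, 0x3AA):
    print(hex(data), delivery_mode(data), describe(data), two_way_label(data))
```

If the assumption about the mask holds, a "lowest" in the debug-key output would not by itself rule out delivery mode 3.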


> But this seems to be a red herring for the "irq 16: nobody cared" case anyway:
> when retesting I just had it occur on the first boot of the HVM guest, just
> after the host had booted, without the "Unsupported MSI delivery mode 3"
> message being shown.

> So it might be slightly related, but probably no causation .. *sigh*.

> --
> Sander

And another update:

- Tried booting dom0 with pci=nosmi, but that didn't make a difference: still
  "irq 16: nobody cared".
- Tried booting dom0 with pci=nomsi and irqpoll. That prevented the "irq 16:
  nobody cared" message from appearing, and I could see around 600000
  interrupts for irq 16. However, the machine now freezes shortly afterwards
  without any error (on serial console with sync-console on; triple ctrl-a also
  doesn't work anymore).
- Tried switching off the onboard sound card in the BIOS. Now irq 16 is not
  bound to any device, but the machine still freezes without any error (on
  serial console with sync-console on; triple ctrl-a also doesn't work anymore).
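
For watching the irq 16 counter between guest start and shutdown, a minimal sketch along these lines could be used in dom0. The function names are made up for illustration, and the sample text only mimics the /proc/interrupts line quoted earlier in the thread; it is not a capture from the affected machine.

```python
def irq_totals(text):
    """Sum the per-CPU counters of each row of /proc/interrupts text."""
    totals = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header row has no "irq:" prefix
        count = 0
        for field in rest.split():
            if not field.isdigit():
                break  # counters end where the chip/handler names begin
            count += int(field)
        totals[head.strip()] = count
    return totals


def irq_deltas(before, after):
    """Per-IRQ increase between two /proc/interrupts snapshots."""
    old = irq_totals(before)
    new = irq_totals(after)
    return {irq: new[irq] - old.get(irq, 0) for irq in new}


# Illustrative snapshots shaped like the line quoted in this thread.
BEFORE = """\
           CPU0       CPU1
 16:        500          0  xen-pirq-ioapic-level  snd_hda_intel
"""
AFTER = """\
           CPU0       CPU1
 16:     200000          0  xen-pirq-ioapic-level  snd_hda_intel
"""

print(irq_deltas(BEFORE, AFTER)["16"])
```

On a live system the two snapshots would come from reading /proc/interrupts before and after the guest shutdown; a large delta on an IRQ whose only handler belongs to a device that was not passed through would match the storm described above.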

--
Sander





 

