
Re: [Xen-devel] Xen-unstable: pci-passthrough "irq 16: nobody cared" on HVM guest shutdown on irq of device not passed through.



Thursday, September 25, 2014, 4:42:24 PM, you wrote:

> On 25/09/14 15:36, Sander Eikelenboom wrote:
>> Hi Jan / Konrad,
>>
>> I mentioned before seeing this sometimes, but since it happened infrequently 
>> it was hard to describe the case and log everything.
>> Somehow I can now trigger it quite reliably, so here is an extensive
>> report.
>>
>> When shutting down an HVM guest with PCI passthrough (in this case a VGA
>> adapter), I *sometimes* run into this:
>>
>> [ 2265.395971] irq 16: nobody cared (try booting with the "irqpoll" option)
>> [ 2265.422948] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc6-20140925-vanilla+ #1
>> [ 2265.453314] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>> [ 2265.484046]  ffff880057a1a290 ffff88005f603d88 ffffffff81b7d90e 0000000000000001
>> [ 2265.513053]  ffff880057a1a200 ffff88005f603db8 ffffffff8110d6c8 ffff88005f603db8
>> [ 2265.542121]  ffff880057a1a200 0000000000000010 0000000000000000 ffff88005f603e08
>> [ 2265.571135] Call Trace:
>> [ 2265.585507]  <IRQ>  [<ffffffff81b7d90e>] dump_stack+0x46/0x58
>> [ 2265.609694]  [<ffffffff8110d6c8>] __report_bad_irq+0x38/0xd0
>> [ 2265.633625]  [<ffffffff8110dc1a>] note_interrupt+0x23a/0x290
>> [ 2265.657572]  [<ffffffff8155f0f5>] ? add_interrupt_randomness+0x45/0x210
>> [ 2265.684405]  [<ffffffff8110b45d>] handle_irq_event_percpu+0x9d/0x150
>> [ 2265.710379]  [<ffffffff8110b553>] handle_irq_event+0x43/0x70
>> [ 2265.734213]  [<ffffffff8110e29a>] ? handle_fasteoi_irq+0x2a/0x150
>> [ 2265.759463]  [<ffffffff8110e2f7>] handle_fasteoi_irq+0x87/0x150
>> [ 2265.784122]  [<ffffffff8110acbd>] generic_handle_irq+0x1d/0x40
>> [ 2265.808338]  [<ffffffff8152037a>] evtchn_fifo_handle_events+0x16a/0x170
>> [ 2265.834898]  [<ffffffff8151d4c8>] __xen_evtchn_do_upcall+0x48/0x90
>> [ 2265.860241]  [<ffffffff8151f0d2>] xen_evtchn_do_upcall+0x32/0x50
>> [ 2265.885031]  [<ffffffff81b8a76e>] xen_do_hypervisor_callback+0x1e/0x30
>> [ 2265.911279]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [ 2265.938509]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [ 2265.963981]  [<ffffffff81008d80>] ? xen_safe_halt+0x10/0x20
>> [ 2265.987198]  [<ffffffff81018bd8>] ? default_idle+0x18/0x20
>> [ 2266.010032]  [<ffffffff8101949a>] ? arch_cpu_idle+0xa/0x10
>> [ 2266.032827]  [<ffffffff810f84f1>] ? cpu_startup_entry+0x281/0x2f0
>> [ 2266.057481]  [<ffffffff81b741e4>] ? rest_init+0xb4/0xc0
>> [ 2266.079672]  [<ffffffff81b74130>] ? csum_partial_copy_generic+0x170/0x170
>> [ 2266.106401]  [<ffffffff82321079>] ? start_kernel+0x43f/0x44c
>> [ 2266.129479]  [<ffffffff82320a27>] ? set_init_arg+0x58/0x58
>> [ 2266.151971]  [<ffffffff82320608>] ? x86_64_start_reservations+0x2a/0x2c
>> [ 2266.177879]  [<ffffffff823240af>] ? xen_start_kernel+0x59b/0x59d
>> [ 2266.201994] handlers:
>> [ 2266.214783] [<ffffffff81945580>] azx_interrupt
>> [ 2266.234031] Disabling IRQ #16
>>
>> The system:
>>
>> - AMD
>> - Xen-unstable xen_changeset: Wed Sep 24 11:19:57 2014 +0200 git:b67a26f-dirty
>> - Both dom0 and domU (HVM guest using qemu-xen) run a 3.17-rc6 kernel
>> - The device passed through is 09:00.0
>>
>> - This IRQ is *not* coupled to the passthrough device (09:00.0), but to the
>>   onboard soundcard (00:14.2 on the southbridge). That device is in dom0 and
>>   not in active use (although the snd-hda-intel driver is loaded).
>>
>> - No "soundhw" option is specified in the guest config, so the guest also
>>   shouldn't be trying to use it that way.
>>
>>
>>
>> There are 2 things that can happen when starting and shutting down a guest:
>> A) It starts and shuts down OK (no "irq nobody cared" message).
>> B) It starts fine, but after shutdown the "irq nobody cared" message appears.
>>
>> - B *can* happen either on the first start-and-shutdown of the HVM guest or
>>   only on a subsequent one (so the first start-and-shutdown can work OK, but
>>   does not always).
>>
>> There seem to be some small differences between the two cases from the start
>> of the domain:
>>
>> - When booting the HVM guest, the interrupt count in /proc/interrupts stays
>>   the same when A happens, but when B happens the number of interrupts has
>>   doubled (so that looks like a reinit of the device that is not passed
>>   through).
>>
>> - When shutting down the HVM guest, the interrupt count in /proc/interrupts
>>   is still what it was when A happens, but when B happens there seems to be
>>   an irq storm, and after the "irq nobody cared" message the count ends at
>>   this (always 200000, so perhaps a threshold?):
>>   16:     200000          0          0          0          0          0  xen-pirq-ioapic-level  snd_hda_intel
>>
>> - On start when B happens, xl dmesg contains this message (when A happens
>>   it does not):
>>   (XEN) [2014-09-25 13:39:48.149] d32767v2: Unsupported MSI delivery mode 3 for Dom2
>>
>>   If I interpret that correctly, the d32767 in the log seems to be used for
>>   the IOMMU.
>>
>> I attached the complete serial log covering the following sequence (hope
>> it's not too large for the mailing list):
>>
>> - Cold boot of the host system
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Start of the HVM guest with pci device passed through.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Shutdown of the HVM guest with pci device passed through, A happened.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Start of the HVM guest with pci device passed through.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Shutdown of the HVM guest with pci device passed through, B happened.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>>
>> I also attached the output of lspci -vvvknn

> Could you provide `lspci -tv` as well please?

Sure:
~# lspci -tv
-[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 Northbridge only single slot PCI-e GFX Hydra part
           +-00.2  Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
           +-02.0-[0f]--+-00.0  Advanced Micro Devices [AMD] nee ATI RV620 LE [Radeon HD 3450]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI RV620 HDMI Audio [Radeon HD 3400 Series]
           +-03.0-[0e]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
           +-05.0-[0d]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-06.0-[0c]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-09.0-[0b]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           +-0a.0-[0a]----00.0  Conexant Systems, Inc. Device 8210
           +-0b.0-[09]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
           +-0c.0-[05-08]----00.0-[06-08]--+-01.0-[08]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           |                               \-02.0-[07]----00.0  Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller
           +-0d.0-[04]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
           +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
           +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
           +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
           +-14.4-[03]----06.0  C-Media Electronics Inc CMI8738/CMI8768 PCI Audio
           +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
           +-15.0-[02]--
           +-16.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-16.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-18.0  Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
           +-18.1  Advanced Micro Devices [AMD] Family 10h Processor Address Map
           +-18.2  Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
           +-18.3  Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
           \-18.4  Advanced Micro Devices [AMD] Family 10h Processor Link Control
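
For anyone who wants to repeat the /proc/interrupts comparison from the report (counts before and after a guest start or shutdown), here is a minimal sketch of how the counters could be snapshotted and diffed. The helper names parse_interrupts and irq_delta are made up for illustration and are not part of any Xen or kernel tooling; the sample line is the IRQ 16 line quoted in the report, and the parsing assumes the usual "label: per-CPU counts, then description" layout.

```python
def parse_interrupts(text):
    """Map IRQ label -> (per-CPU counts, description) from /proc/interrupts text.

    Hypothetical helper: assumes each data line looks like
    '<label>: <count per CPU> ... <chip name> <handler names>'.
    """
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header line (CPU0 CPU1 ...) has no colon
        tokens = rest.split()
        counts = []
        # Leading all-digit tokens are the per-CPU interrupt counters.
        while tokens and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (counts, " ".join(tokens))
    return table


def irq_delta(before, after, irq):
    """Total interrupts delivered on `irq` between two parsed snapshots."""
    return sum(after[irq][0]) - sum(before[irq][0])


# The IRQ 16 line as quoted in the report (case B, after the storm).
SAMPLE = (
    " 16:     200000          0          0          0          0          0"
    "  xen-pirq-ioapic-level  snd_hda_intel"
)

if __name__ == "__main__":
    counts, description = parse_interrupts(SAMPLE)["16"]
    print(sum(counts), description)
```

On a live dom0 the two snapshots would come from reading /proc/interrupts once before and once after the guest start or shutdown, then calling irq_delta(before, after, "16"). A delta of zero would correspond to case A; a large jump would correspond to case B.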


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel