
[Xen-users] IVHD error, AMD-Vi gets disabled (4.1.3-3ubuntu1.3)



Hello,

I have a system running Ubuntu 12.10 with Xen 4.1.3, which I believe was recently upgraded from Xen 4.1.2 via the normal Ubuntu 'apt-get upgrade' path. Two AMD Opteron 6134 CPUs, Supermicro H8DG6 motherboard.

Recently I noticed that AMD-Vi gets disabled at boot, which didn't happen before.

From 'xm dmesg':

(XEN) Command line: placeholder amd-iommu-debug iommu=verbose iommu=1 apic=debug iommu_inclusive_mapping=1
[...]
(XEN) IVHD Error: Conflicting IO-APIC 0x0 entries
(XEN) AMD-Vi: Error initialization
(XEN) I/O virtualisation disabled

I noticed it when getting an error while trying to pass through a PCI device. A month ago PCI passthrough worked, but I haven't tested in between, so I cannot say exactly when the problem arose, or whether it has to do with the transition from 4.1.2 to 4.1.3. IOMMU is enabled in the BIOS, of course.

Could it be a Xen bug (I know that my system is somewhat unusual), an old BIOS bug interacting with a newer Xen, or an acquired hardware error? I'll of course provide more logs if needed.
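For what it's worth, the "Conflicting IO-APIC" message suggests Xen is tripping over the special-device (IO-APIC) entries in the firmware's IVRS ACPI table (readable on Linux from /sys/firmware/acpi/tables/IVRS). Below is a minimal sketch of how such entries could be decoded and checked for the kind of conflict Xen complains about — field offsets follow the AMD IOMMU specification, but the table bytes here are synthetic, made up purely to exercise the parser:

```python
import struct

def parse_ivhd_special_entries(ivhd):
    """Walk the device entries of a type-0x10 IVHD block and return the
    IO-APIC special-device (type 0x48) entries as (handle, source BDF)
    pairs. Offsets per the AMD IOMMU spec; illustrative sketch only."""
    entries = []
    off = 24                                  # fixed type-0x10 IVHD header size
    length = struct.unpack_from("<H", ivhd, 2)[0]
    while off < length:
        etype = ivhd[off]
        size = 8 if etype >= 0x40 else 4      # types >= 0x40 are 8-byte entries
        if etype == 0x48:                     # special device entry
            handle = ivhd[off + 4]            # IO-APIC ID ("handle")
            sbdf = struct.unpack_from("<H", ivhd, off + 5)[0]
            variety = ivhd[off + 7]           # 1 = IO-APIC, 2 = HPET
            if variety == 1:
                entries.append((handle, sbdf))
        off += size
    return entries

def find_conflicts(entries):
    """Two entries with the same IO-APIC handle but different source BDFs
    is the sort of conflict a strict IVHD parser would reject."""
    seen, conflicts = {}, []
    for handle, bdf in entries:
        if handle in seen and seen[handle] != bdf:
            conflicts.append(handle)
        seen.setdefault(handle, bdf)
    return conflicts

# Synthetic IVHD block with two conflicting entries for IO-APIC 0x0
# (made-up data, just to demonstrate the check):
ivhd = bytearray(24 + 16)
ivhd[0] = 0x10                                # IVHD block type 0x10
struct.pack_into("<H", ivhd, 2, len(ivhd))    # total block length
struct.pack_into("<BxxxBHB", ivhd, 24, 0x48, 0x00, 0x00A0, 1)  # IOAPIC 0 @ BDF 0xA0
struct.pack_into("<BxxxBHB", ivhd, 32, 0x48, 0x00, 0x00B0, 1)  # IOAPIC 0 @ BDF 0xB0

print(find_conflicts(parse_ivhd_special_entries(bytes(ivhd))))  # -> [0]
```

If a decode of the real IVRS table shows two entries claiming IO-APIC 0x0 from different devices, that would point at the BIOS tables rather than Xen.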

Thanks,
Andreas




ADDENDUM:
Some additional debugging I did (maybe helpful, maybe not):

If I boot a bare-metal Linux 3.5 kernel, the IOMMU seems to enable as it should, but I get "AMD-Vi: Completion-Wait loop timed out" messages in dmesg, followed by a soft CPU lockup:

[    1.152261] AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
[    1.226282] AMD-Vi: Enabling IOMMU at 0000:40:00.2 cap 0x40
[    1.333560] AMD-Vi: Completion-Wait loop timed out
[    1.440835] AMD-Vi: Completion-Wait loop timed out

(last line repeated ~100 times)

[   28.425464] BUG: soft lockup - CPU#11 stuck for 22s! [swapper/0:1]
[   28.425467] Modules linked in:
[   28.425471] CPU 11
[   28.425472] Modules linked in:
[   28.425476]
[   28.425479] Pid: 1, comm: swapper/0 Not tainted 3.5.0-23-generic #35-Ubuntu Supermicro H8DG6/H8DGi/H8DG6/H8DGi
[   28.425486] RIP: 0010:[<ffffffff8101aff6>]  [<ffffffff8101aff6>] native_read_tsc+0x6/0x20
[   28.425497] RSP: 0000:ffff880234855d10  EFLAGS: 00000246
[   28.425501] RAX: 00000000226cded8 RBX: 00000000ffffffff RCX: 000000000118c4ae
[   28.425504] RDX: 0000000000000025 RSI: 0000000000000286 RDI: 00000000000008ea
[   28.425507] RBP: ffff880234855d10 R08: 000000000000040a R09: 0000000000001ff0
[   28.425511] R10: 0720072007200720 R11: 0720072007200720 R12: ffffffff81e49c23
[   28.425514] R13: 0000000000000246 R14: ffff880234855cf0 R15: 0000000000000006
[   28.425517] FS:  0000000000000000(0000) GS:ffff88083fcc0000(0000) knlGS:0000000000000000
[   28.425521] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.425524] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000007e0
[   28.425527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   28.425531] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   28.425534] Process swapper/0 (pid: 1, threadinfo ffff880234854000, task ffff880234858000)
[   28.425538] Stack:
[   28.425540]  ffff880234855d40 ffffffff8133599a 0000000000010ef6 ffff880234855d98
[   28.425547]  0000000000000286 0000000000002000 ffff880234855d50 ffffffff813358cc
[   28.425553]  ffff880234855d70 ffffffff8154a7c5 ffff880434e4c800 ffff880434e4c814
[   28.425559] Call Trace:
[   28.425566]  [<ffffffff8133599a>] delay_tsc+0x4a/0x80
[   28.425570]  [<ffffffff813358cc>] __const_udelay+0x2c/0x30
[   28.425577]  [<ffffffff8154a7c5>] wait_on_sem+0x35/0x70
[   28.425582]  [<ffffffff8154b493>] iommu_queue_command_sync+0x83/0x140
[   28.425587]  [<ffffffff8154b563>] iommu_queue_command+0x13/0x20
[   28.425592]  [<ffffffff8154dd37>] iommu_flush_all_caches+0xd7/0x100
[   28.425597]  [<ffffffff8154ed90>] enable_iommus+0x250/0x360
[   28.425605]  [<ffffffff81cfc17e>] ? memblock_find_dma_reserve+0x13d/0x13d
[   28.425611]  [<ffffffff81d328ce>] amd_iommu_init_hardware+0x1f7/0x231
[   28.425616]  [<ffffffff81d32913>] amd_iommu_init+0xb/0x8b
[   28.425620]  [<ffffffff81cfc191>] pci_iommu_init+0x13/0x3e
[   28.425627]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[   28.425631]  [<ffffffff81cf3d3a>] kernel_init+0x140/0x1c9
[   28.425635]  [<ffffffff81cf3588>] ? loglevel+0x31/0x31
[   28.425642]  [<ffffffff8168ce64>] kernel_thread_helper+0x4/0x10
[   28.425646]  [<ffffffff81cf3bfa>] ? start_kernel+0x3d2/0x3d2
[   28.425650]  [<ffffffff8168ce60>] ? gs_change+0x13/0x13
[   28.425652] Code: c3 0f 1f 40 00 55 89 f8 48 89 e5 e6 70 e4 71 5d c3 0f 1f 40 00 55 89 f0 48 89 e5 e6 70 89 f8 e6 71 5d c3 66 90 55 48 89 e5 0f 31 <89> c0 48 c1 e2 20 48 09 c2 48 89 d0 5d c3 66 66 66 2e 0f 1f 84
[   28.458326] AMD-Vi: Completion-Wait loop timed out
[   28.565522] AMD-Vi: Completion-Wait loop timed out
[   28.672707] AMD-Vi: Completion-Wait loop timed out
[   28.779909] AMD-Vi: Completion-Wait loop timed out
[   28.886441] AMD-Vi: Completion-Wait loop timed out
[   28.886491] pci 0000:00:00.2: irq 40 for MSI/MSI-X
[   28.886676] pci 0000:40:00.2: irq 41 for MSI/MSI-X
[   28.896427] AMD-Vi: Lazy IO/TLB flushing enabled

(boot continues as normal)

This may point towards either the hardware or the BIOS, but I do not know whether this would also have happened before I started getting problems with Xen.
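For reference, the wait_on_sem/iommu_queue_command_sync frames in the trace above correspond to a polling loop roughly like this sketch (not the kernel code — names are borrowed from the trace): after queueing a COMPLETION_WAIT command, the driver busy-waits for the IOMMU to write a completion semaphore, and gives up after a fixed timeout.

```python
import time

def wait_on_sem(sem_is_set, timeout_s=0.1, poll_s=0.0001):
    """Poll a completion semaphore until the IOMMU sets it, in the spirit
    of the wait_on_sem() frame in the trace. Illustrative sketch only;
    timeout values are made up, not the kernel's."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if sem_is_set():        # hardware writes the semaphore when the
            return True         # COMPLETION_WAIT command retires
        time.sleep(poll_s)
    return False                # -> "Completion-Wait loop timed out"
```

An IOMMU that never completes the command makes every cache flush hit this timeout, which would match both the repeated messages and the long busy-wait that trips the soft-lockup detector — consistent with the hardware simply not responding.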
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users

 

