
Re: Kernel warning on driver-domain (worse: boot failure using kernel 5.10.62)



On Sunday, September 5, 2021 2:32:20 PM CEST Juergen Gross wrote:
> On 04.09.21 08:45, J. Roeleveld wrote:
> > Hi,
> > 
> > I am getting semi-regular warnings in the dmesg about block devices shared
> > to other Domains from a driver-domain.
> > 
> > The dmesg/kernel trace is at the bottom of the email.
> > All the traces show the same "call trace" and most of the numbers look
> > similar. It's always a different domain and xvda device that is listed.
> > 
> > Versions (Dom0 and driver domain are kept at the same versions)
> > Kernel: 5.10.52
> > Xen: 4.14.2
> 
> You are missing kernel patch 3a7956e25e1d7b3c148569e78895e1f3178122a9
> which is in 5.10.62.

Thank you for this.
I just tried booting with 5.10.62, but it fails with the error below. I found 
a similar, though not identical, report (it concerns an e1000e NIC, while I'm 
using mpt3sas) at:
https://lkml.org/lkml/2021/8/26/500

However, I'm not sure whether this is actually the issue I am encountering. 
For now I'm back on 5.10.52, as that seems stable.

For this particular driver domain, I pass two HBAs (LSI SAS3008) to it using 
PCI passthrough. The relevant line in the config:
pci = ['81:00.0,permissive=1','82:00.0,permissive=1']

I added "permissive" a while ago because of warnings/errors (I can't fully 
remember which), as described in:
https://wiki.xenproject.org/wiki/Xen_PCI_Passthrough#PV_guests_and_PCI_quirks
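
For readers unfamiliar with the passthrough syntax: each entry in the `pci` list is a PCI address in bus:device.function form (optionally prefixed by a PCI domain), followed by comma-separated options such as `permissive=1`. The sketch below splits the two entries above into those parts. `parse_pci_spec` is a hypothetical helper for illustration only, not part of xl or libxl, and it deliberately ignores the optional `@VSLOT` suffix that xl also accepts.

```python
import re

# Hypothetical helper: [DDDD:]BB:DD.F followed by ",key=value" options.
BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])$"
)


def parse_pci_spec(spec: str) -> dict:
    """Split one xl-style pci entry into domain/bus/device/function and options."""
    bdf, *opts = spec.split(",")
    m = BDF_RE.match(bdf)
    if m is None:
        raise ValueError(f"not a BDF address: {bdf!r}")
    result = {
        "domain": int(m.group("domain") or "0", 16),  # domain defaults to 0000
        "bus": int(m.group("bus"), 16),
        "dev": int(m.group("dev"), 16),
        "fn": int(m.group("fn")),
        "options": {},
    }
    for opt in opts:
        key, _, value = opt.partition("=")
        result["options"][key] = value
    return result


pci = ['81:00.0,permissive=1', '82:00.0,permissive=1']
for entry in pci:
    print(parse_pci_spec(entry))
```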

Does anyone have any ideas or suggestions?

====
BUG: unable to handle page fault for address: ffffc9000c62900c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 1793aa4067 P4D 1793aa4067 PUD 1793aa3067 PMD 1000e1067 PTE 80100000fbe4e075
Oops: 0003 [#1] SMP NOPTI
CPU: 3 PID: 713 Comm: udevd Tainted: P           O      5.10.62-gentoo-generic #1
RIP: e030:__pci_enable_msix_range+0x104/0x477
Code: 01 89 d6 89 54 24 08 c1 e6 04 e8 e4 9e bd ff 48 85 c0 49 89 c7 0f 84 2c 03 00 00 8b 54 24 08 48 8d 48 0c be 01 00 00 00 31 c0 <89> 31 ff c0 48 83 c1 10 39 c2 7f f4 48 89 ef e8 4d f8 ff ff 4d 85
RSP: e02b:ffffc9000dd3bab0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000c62900c
RDX: 0000000000000060 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff888100b89000 R08: 0000000000000000 R09: 00000000fbe4e5ff
R10: 00000000000fbe4e R11: 00000000000fbe4e R12: 0000000000000004
R13: 0000000000000000 R14: ffffc9000dd3bbd0 R15: ffffc9000c629000
FS:  00007fcf52a2c740(0000) GS:ffff889886f80000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8000007df270 CR3: 00000001040c6000 CR4: 0000000000050660
Call Trace:
 pci_alloc_irq_vectors_affinity+0x6f/0xe8
 mpt3sas_base_map_resources+0x4e9/0x7c2 [mpt3sas]
 mpt3sas_base_attach+0x113/0x17d3 [mpt3sas]
 _scsih_probe+0x753/0x850 [mpt3sas]
 pci_device_probe+0xc6/0x135
 really_probe+0x144/0x326
 driver_probe_device+0x63/0x92
 device_driver_attach+0x37/0x50
 __driver_attach+0x92/0x9a
 ? device_driver_attach+0x50/0x50
 bus_for_each_dev+0x6e/0xa4
 bus_add_driver+0x103/0x1b4
 driver_register+0x99/0xd2
 ? 0xffffffffa01fb000
 _mpt3sas_init+0x1a7/0x1000 [mpt3sas]
 do_one_initcall+0x72/0x16c
 ? kmem_cache_alloc_trace+0xdb/0x102
 do_init_module+0x56/0x1f3
 __do_sys_finit_module+0x94/0xbb
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fcf52b878d9
Code: 48 8d 3d da 60 0c 00 0f 05 eb a4 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 25 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffef026c478 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055f2b189e260 RCX: 00007fcf52b878d9
RDX: 0000000000000000 RSI: 000055f2b188b720 RDI: 0000000000000009
RBP: 0000000000020000 R08: 0000000000000000 R09: 000055f2b18a0140
R10: 0000000000000009 R11: 0000000000000246 R12: 000055f2b188b720
R13: 0000000000000000 R14: 000055f2b18a1320 R15: 0000000000000000
Modules linked in: zfs(PO+) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas(+) scsi_transport_sas aesni_intel
CR2: ffffc9000c62900c
---[ end trace 96215648c76c40ac ]---
RIP: e030:__pci_enable_msix_range+0x104/0x477
Code: 01 89 d6 89 54 24 08 c1 e6 04 e8 e4 9e bd ff 48 85 c0 49 89 c7 0f 84 2c 03 00 00 8b 54 24 08 48 8d 48 0c be 01 00 00 00 31 c0 <89> 31 ff c0 48 83 c1 10 39 c2 7f f4 48 89 ef e8 4d f8 ff ff 4d 85
RSP: e02b:ffffc9000dd3bab0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000c62900c
RDX: 0000000000000060 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff888100b89000 R08: 0000000000000000 R09: 00000000fbe4e5ff
R10: 00000000000fbe4e R11: 00000000000fbe4e R12: 0000000000000004
R13: 0000000000000000 R14: ffffc9000dd3bbd0 R15: ffffc9000c629000
FS:  00007fcf52a2c740(0000) GS:ffff889886f80000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8000007df270 CR3: 00000001040c6000 CR4: 0000000000050660
udevd[679]: worker [713] terminated by signal 9 (Killed)
udevd[679]: worker [713] failed while handling '/devices/pci-0/pci0000:00/0000:00:00.0'
====
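
A minimal sketch for reading the oops above. It decodes `error_code(0x0003)` using the x86 page-fault error-code bits (bit 0 set: protection violation on a present page; bit 1 set: write; bit 2 clear: kernel mode), which matches the "supervisor write access ... permissions violation" lines. It then cross-checks the faulting address. The bytes marked `<89>` are `89 31`, i.e. `mov %esi,(%rcx)`, inside a loop that advances RCX by 0x10 per iteration, and RCX starts at `lea 0xc(%rax)` where RAX was also copied into R15. The assumption here (an interpretation of the disassembly, not something the log states) is that R15 holds the base of the mapped MSI-X table; in that case CR2 is the Vector Control dword (offset 12 of a 16-byte entry) of entry 0. The constant names and `vector_ctrl_addr` are illustrative, not kernel identifiers.

```python
# x86 #PF error-code bits: (name, meaning if set, meaning if clear).
PF_BITS = {
    0: ("P", "protection violation", "page not present"),
    1: ("W/R", "write access", "read access"),
    2: ("U/S", "user mode", "kernel (supervisor) mode"),
    3: ("RSVD", "reserved bit set in a paging entry", None),
    4: ("I/D", "instruction fetch", None),
}


def decode_pf_error(code: int) -> list:
    """Return the human-readable meaning of each relevant error-code bit."""
    out = []
    for bit, (_name, if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# MSI-X table layout per the PCI spec: 16-byte entries, Vector Control at +12.
MSIX_ENTRY_SIZE = 16
MSIX_VECTOR_CTRL = 12

R15 = 0xffffc9000c629000  # R15 from the register dump (assumed table base)
CR2 = 0xffffc9000c62900c  # faulting address from the "BUG:" line


def vector_ctrl_addr(table_base: int, entry: int) -> int:
    """Address of the Vector Control dword of MSI-X table entry `entry`."""
    return table_base + entry * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL


print(decode_pf_error(0x0003))
print(hex(vector_ctrl_addr(R15, 0)), vector_ctrl_addr(R15, 0) == CR2)
```

If the assumption holds, the very first write of the masking loop (to entry 0) is the one that faults, which fits a write-protected mapping of the MSI-X table rather than a bad index.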