Xen project Mailing List

Hello.

Here is the long story. I'm running CentOS 7 with a custom-built kernel.

My architecture is x86_64. I'm trying to pass different GPUs through to Xen guests.

I've got a problem with the AMD FirePro W9100. The Windows HVM guest starts with the GPU, and even a 3D benchmark runs OK. But after running for some time, both the domU and dom0 freeze.

I monitor the serial console for kernel panics, but none appear.

I decided to take a crash dump of the dom0 kernel to see what's going on.

It turns out that I cannot do even that.

I've tried specifying the crashkernel parameter both for xen.gz and for my dom0 kernel (bzImage).

1. The first case: crashkernel=256M for dom0 cmdline:

bzImage crashkernel=256M

[root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
kdump.service - Crash recovery kernel arming
...
Oct 17 21:19:38 kvmxen-centos7-test1-nb kdumpctl[1506]: kexec: loaded kdump kernel
...
[root@kvmxen-centos7-test1-nb ~]# cat /sys/kernel/kexec_crash_loaded

Here we see that kexec, run by kdump.service, worked: it appears to have loaded the dump-capture kernel.
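
The two checks above (the crashkernel= option and /sys/kernel/kexec_crash_loaded) can be combined into one small script. This is only a sketch: the function names are made up for illustration, and it reports what the dom0 kernel itself says, which under Xen may not be the whole picture.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of kexec-tools.

# Print the value of the first crashkernel= option found in a kernel
# command line string; return 1 if the option is absent.
crashkernel_from_cmdline() {
    for word in $1; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

# Translate the content of kexec_crash_loaded into a readable state.
crash_loaded_state() {
    case "$1" in
        1) echo "loaded" ;;
        0) echo "not loaded" ;;
        *) echo "unknown" ;;
    esac
}

# Report both values; the file paths can be overridden for testing.
kdump_report() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}
    cmdline=$(cat "$cmdline_file" 2>/dev/null)
    loaded=$(cat "$loaded_file" 2>/dev/null)
    echo "crashkernel: $(crashkernel_from_cmdline "$cmdline" || echo "not set")"
    echo "crash kernel: $(crash_loaded_state "$loaded")"
}

kdump_report "$@"
```

Run without arguments in dom0, it prints one line for the requested reservation and one for the loaded state.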

And now let's try to panic:

[root@kvmxen-centos7-test1-nb ~]# echo c > /proc/sysrq-trigger

In the console we see:

[  421.673471] SysRq : Trigger a crash
[  421.677110] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  421.685021] IP: [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.691172] PGD 2d11e58067 PUD 2c95d3c067 PMD 0
[  421.695900] Oops: 0002 [#1] SMP
[  421.699210] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables sg rpcsec_gss_krb5 nls_utf8 iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul sb_edac glue_helper ablk_helper ipmi_si lpc_ich edac_core cryptd i2c_i801 pcspkr mfd_core ipmi_msghandler mei_me ioatdma wmi mei shpchp dca nfsd binfmt_misc mgag200 drm_kms_helper ttm drm ahci mlx4_core libahci libata
[  421.745725] CPU: 9 PID: 11422 Comm: bash Not tainted 3.17.0 #3
[  421.751562] Hardware name: Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+/X9DRFF-iG+/-7G+/-iTG+/-7TG+, BIOS 3.0 07/29/2013
[  421.761910] task: ffff882e94383640 ti: ffff882c71758000 task.ti: ffff882c71758000
[  421.769398] RIP: e030:[<ffffffff81484486>]  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.777961] RSP: e02b:ffff882c7175be88  EFLAGS: 00010246
[  421.783276] RAX: 000000000000000f RBX: ffffffff81d2d780 RCX: 0000000000000000
[  421.790416] RDX: 0000000000000000 RSI: ffff882eea52e5b8 RDI: 0000000000000063
[  421.797557] RBP: ffff882c7175be88 R08: 0000000000000002 R09: ffffffff82034afc
[  421.804708] R10: 00000000000004a7 R11: 00000000000004a6 R12: 0000000000000063
[  421.811839] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
[  421.818992] FS:  00007f1c0205b740(0000) GS:ffff882eea520000(0000) knlGS:0000000000000000
[  421.827075] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  421.832821] CR2: 0000000000000000 CR3: 0000002c2a879000 CR4: 0000000000042660
[  421.839972] Stack:
[  421.841998]  ffff882c7175beb8 ffffffff81484cd7 0000000000000002 00007f1c0207f000
[  421.849494]  0000000000000002 ffff882c7175bf48 ffff882c7175bed0 ffffffff8148517f
[  421.857019]  ffff882e94765380 ffff882c7175bef0 ffffffff81251afd ffff882c7175bf48
[  421.864514] Call Trace:
[  421.866981]  [<ffffffff81484cd7>] __handle_sysrq+0x107/0x170
[  421.872645]  [<ffffffff8148517f>] write_sysrq_trigger+0x2f/0x40
[  421.878575]  [<ffffffff81251afd>] proc_reg_write+0x3d/0x80
[  421.884069]  [<ffffffff811eaef7>] vfs_write+0xb7/0x1f0
[  421.889209]  [<ffffffff811ebb15>] SyS_write+0x55/0xd0
[  421.894294]  [<ffffffff8183fc29>] system_call_fastpath+0x16/0x1b
[  421.900300] Code: 65 34 75 e5 4c 89 ef e8 d9 f7 ff ff eb db 0f 1f 80 00 00 00 00 66 66 66 66 90 55 c7 05 88 43 7f 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e
[  421.920596] RIP  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.926803]  RSP <ffff882c7175be88>
[  421.930302] CR2: 0000000000000000

And that's it. The dump-capture kernel is never booted. After this kernel panic my server simply reboots.
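
For what it's worth, the oops in the console output is the one deliberately triggered through SysRq, not a separate bug: the faulting function on the IP line is sysrq_handle_crash. A small sketch of pulling that symbol out of a captured log follows. The function name is made up for illustration, and the pattern only assumes the "IP: [<address>] symbol+0x..." layout seen in this 3.17 log.

```shell
#!/bin/sh
# Sketch only: the function name is illustrative, not part of any tool.

# Read an oops log on stdin and print the symbol named on the first
# "IP: [<address>] symbol+0xoff/0xlen" line, or nothing if none is found.
oops_fault_symbol() {
    sed -n 's/.*IP: \[<[0-9a-f]*>\] \([A-Za-z0-9_.]*\)+0x.*/\1/p' | head -n 1
}

# Sample line taken from the console output in this message.
sample='[  421.685021] IP: [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20'

symbol=$(printf '%s\n' "$sample" | oops_fault_symbol)
if [ "$symbol" = "sysrq_handle_crash" ]; then
    echo "oops came from the SysRq 'c' trigger"
else
    echo "oops came from: ${symbol:-unknown}"
fi
```

On a saved serial console log, the same function could be fed with `oops_fault_symbol < console.log`, where console.log is a hypothetical file name.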

2. The second case: crashkernel=256M in xen.gz cmdline.

xen.gz crashkernel=256M

[root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
kdump.service - Crash recovery kernel arming
...
   Active: failed (Result: exit-code) since Fri 2014-10-17 19:56:57 MSK; 1h 9min ago
...
Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: No memory reserved for crash kernel.
Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: Starting kdump: [FAILED]
....

As we can see, kdump.service cannot load the dump-capture kernel because 'No memory reserved for crash kernel'.
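
The only difference between the two cases is which command line carries crashkernel=. A minimal sketch for confirming from dom0 where the option actually ended up, assuming the xl toolstack is in use and that `xl info` prints a xen_commandline field; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.

# Extract the hypervisor command line from `xl info` output on stdin.
xen_cmdline_from_xl_info() {
    sed -n 's/^xen_commandline[[:space:]]*:[[:space:]]*//p'
}

# Succeed if the given command line contains a crashkernel= option.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v xl >/dev/null 2>&1; then
    # xl info normally needs root; errors are silenced on purpose.
    xen_cmdline=$(xl info 2>/dev/null | xen_cmdline_from_xl_info)
    if has_crashkernel "$xen_cmdline"; then
        echo "Xen cmdline requests a crash kernel reservation"
    else
        echo "no crashkernel= found on the Xen cmdline"
    fi
    if has_crashkernel "$(cat /proc/cmdline 2>/dev/null)"; then
        echo "dom0 cmdline requests a crash kernel reservation"
    else
        echo "no crashkernel= found on the dom0 cmdline"
    fi
else
    echo "xl not found; run this in dom0"
fi
```

This only shows what was requested on each command line; it does not show whether the memory was actually reserved.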

So the questions are:

1. How can I make crash dumps of the hypervisor and the dom0?

2. How am I supposed to diagnose the thing that causes such dom0 freezes?

I thought that reporting the dom0 freezes on the list without any logs or crash dumps would be a waste of time. But I cannot even produce them.

I really want to contribute by testing Xen and submitting bugs, but I'd like to give the developers more material to work with.

Thank you,

Grigory.

Best regards,
Grigory Ptashko


[Xen-devel] kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)