[Xen-devel] kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)
Hello.

Here is the long story. I'm running CentOS 7 with a custom-built kernel on x86_64. I'm trying to pass through different GPUs to Xen guests, and I've run into a problem with an AMD FirePro W9100. A Windows HVM guest starts with the GPU, and even a 3D benchmark runs fine. But after running for a while, both the domU and dom0 freeze. I monitor the serial console for kernel panics, but none appear.

So I decided to take a crash dump of the dom0 kernel to see what's going on, and it appears that I simply cannot do this. I've tried specifying the crashkernel parameter both on the xen.gz command line and on my dom0 kernel (bzImage) command line.

1. First case: crashkernel=256M on the dom0 command line:

    bzImage crashkernel=256M

    [root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
    kdump.service - Crash recovery kernel arming
    ...
    Oct 17 21:19:38 kvmxen-centos7-test1-nb kdumpctl[1506]: kexec: loaded kdump kernel
    ...
    [root@kvmxen-centos7-test1-nb ~]# cat /sys/kernel/kexec_crash_loaded
    1

Here kexec, run from kdump.service, appears to have worked: the dump-capture kernel seems to be loaded.
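For reference, the state checked with cat above can be captured in one go before triggering a crash. Below is a minimal Python sketch, not part of my original setup: it reads kexec_crash_loaded together with kexec_crash_size, a standard Linux sysfs attribute that I'm assuming is present (it holds the number of bytes reserved for the crash kernel). The helper name read_kexec_state is hypothetical, and the sysfs root is a parameter only so the function can be tried against a scratch directory. Under Xen these files reflect dom0's own view, which may not match what the hypervisor has reserved.

```python
from pathlib import Path


def read_kexec_state(sysfs_root="/sys/kernel"):
    """Return (crash_loaded, crash_size_bytes) as seen by the running kernel.

    crash_loaded is True/False from kexec_crash_loaded ("1"/"0"), and
    crash_size_bytes is the integer in kexec_crash_size. Either value is
    None if the attribute is missing or unreadable.
    """
    root = Path(sysfs_root)

    def read_int(name):
        # Missing file, permission problem or non-numeric content -> None.
        try:
            return int((root / name).read_text().strip())
        except (OSError, ValueError):
            return None

    loaded = read_int("kexec_crash_loaded")
    size = read_int("kexec_crash_size")
    return (None if loaded is None else loaded == 1), size


if __name__ == "__main__":
    loaded, size = read_kexec_state()
    print(f"crash kernel loaded: {loaded}")
    if size is None:
        print("kexec_crash_size not available")
    else:
        print(f"reserved for crash kernel: {size // (1024 * 1024)} MiB")
```

A 256M reservation would show up as 268435456 bytes in kexec_crash_size if dom0 knows about it.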
Now let's trigger a panic:

    [root@kvmxen-centos7-test1-nb ~]# echo c > /proc/sysrq-trigger

In the console we see:

    [  421.673471] SysRq : Trigger a crash
    [  421.677110] BUG: unable to handle kernel NULL pointer dereference at      (null)
    [  421.685021] IP: [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
    [  421.691172] PGD 2d11e58067 PUD 2c95d3c067 PMD 0
    [  421.695900] Oops: 0002 [#1] SMP
    [  421.699210] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables sg rpcsec_gss_krb5 nls_utf8 iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul sb_edac glue_helper ablk_helper ipmi_si lpc_ich edac_core cryptd i2c_i801 pcspkr mfd_core ipmi_msghandler mei_me ioatdma wmi mei shpchp dca nfsd binfmt_misc mgag200 drm_kms_helper ttm drm ahci mlx4_core libahci libata
    [  421.745725] CPU: 9 PID: 11422 Comm: bash Not tainted 3.17.0 #3
    [  421.751562] Hardware name: Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+/X9DRFF-iG+/-7G+/-iTG+/-7TG+, BIOS 3.0 07/29/2013
    [  421.761910] task: ffff882e94383640 ti: ffff882c71758000 task.ti: ffff882c71758000
    [  421.769398] RIP: e030:[<ffffffff81484486>]  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
    [  421.777961] RSP: e02b:ffff882c7175be88  EFLAGS: 00010246
    [  421.783276] RAX: 000000000000000f RBX: ffffffff81d2d780 RCX: 0000000000000000
    [  421.790416] RDX: 0000000000000000 RSI: ffff882eea52e5b8 RDI: 0000000000000063
    [  421.797557] RBP: ffff882c7175be88 R08: 0000000000000002 R09: ffffffff82034afc
    [  421.804708] R10: 00000000000004a7 R11: 00000000000004a6 R12: 0000000000000063
    [  421.811839] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
    [  421.818992] FS:  00007f1c0205b740(0000) GS:ffff882eea520000(0000) knlGS:0000000000000000
    [  421.827075] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  421.832821] CR2: 0000000000000000 CR3: 0000002c2a879000 CR4: 0000000000042660
    [  421.839972] Stack:
    [  421.841998]  ffff882c7175beb8 ffffffff81484cd7 0000000000000002 00007f1c0207f000
    [  421.849494]  0000000000000002 ffff882c7175bf48 ffff882c7175bed0 ffffffff8148517f
    [  421.857019]  ffff882e94765380 ffff882c7175bef0 ffffffff81251afd ffff882c7175bf48
    [  421.864514] Call Trace:
    [  421.866981]  [<ffffffff81484cd7>] __handle_sysrq+0x107/0x170
    [  421.872645]  [<ffffffff8148517f>] write_sysrq_trigger+0x2f/0x40
    [  421.878575]  [<ffffffff81251afd>] proc_reg_write+0x3d/0x80
    [  421.884069]  [<ffffffff811eaef7>] vfs_write+0xb7/0x1f0
    [  421.889209]  [<ffffffff811ebb15>] SyS_write+0x55/0xd0
    [  421.894294]  [<ffffffff8183fc29>] system_call_fastpath+0x16/0x1b
    [  421.900300] Code: 65 34 75 e5 4c 89 ef e8 d9 f7 ff ff eb db 0f 1f 80 00 00 00 00 66 66 66 66 90 55 c7 05 88 43 7f 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e
    [  421.920596] RIP  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
    [  421.926803]  RSP <ffff882c7175be88>
    [  421.930302] CR2: 0000000000000000

And that's it. The dump-capture kernel never starts; after this kernel panic the server simply reboots.

2. Second case: crashkernel=256M on the xen.gz command line:

    xen.gz crashkernel=256M

    [root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
    kdump.service - Crash recovery kernel arming
    ...
       Active: failed (Result: exit-code) since Fri 2014-10-17 19:56:57 MSK; 1h 9min ago
    ...
    Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: No memory reserved for crash kernel.
    Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: Starting kdump: [FAILED]
    ....

Here kdump.service cannot load the dump-capture kernel because of "No memory reserved for crash kernel".

So my questions are:

1. How can I take crash dumps of the hypervisor and of dom0?
2. How should I diagnose what causes these dom0 freezes?

I figured that just reporting on the list that my dom0 freezes would waste everyone's time without any logs or crash dumps, but I cannot even produce them.
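One more data point for the second case is whether dom0 sees any crash-kernel reservation at all. A native Linux kernel lists the range set aside by crashkernel= as a "Crash kernel" entry in /proc/iomem; whether a reservation made on the xen.gz command line shows up there depends on the dom0 kernel, so an empty result is a hint rather than a diagnosis. The following is a minimal sketch with hypothetical helper names (crash_kernel_regions, reserved_bytes); run it as root, since some kernels hide the addresses from unprivileged readers.

```python
import re

# Matches lines such as "  2b000000-3affffff : Crash kernel" (nesting spaces optional).
_IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def crash_kernel_regions(iomem_text):
    """Return a list of (start, end) address pairs labelled 'Crash kernel'."""
    regions = []
    for line in iomem_text.splitlines():
        m = _IOMEM_LINE.match(line)
        if m and m.group(3) == "Crash kernel":
            regions.append((int(m.group(1), 16), int(m.group(2), 16)))
    return regions


def reserved_bytes(iomem_text):
    """Total bytes in 'Crash kernel' regions; /proc/iomem ranges are inclusive."""
    return sum(end - start + 1 for start, end in crash_kernel_regions(iomem_text))


def main():
    try:
        with open("/proc/iomem") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read /proc/iomem: {exc}")
        return
    total = reserved_bytes(text)
    if total == 0:
        print("no 'Crash kernel' region visible in /proc/iomem")
    else:
        print(f"'Crash kernel' regions cover {total // (1024 * 1024)} MiB")


if __name__ == "__main__":
    main()
```

Comparing this with the kexec_crash_size value from sysfs shows whether the two views of dom0 agree with each other.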
I really want to contribute by testing Xen and submitting bug reports, but I'd like to give the developers more material to work with.

Thank you,
Grigory.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel