[Xen-devel] [BUG] kernel panic when dom_mem is set to 2048M
Hi xen-devel,

I am experiencing an odd issue with Xen crashing (kernel panic) on one of my Xen 4.6 nodes running Debian. However, this only happens when dom0_mem is set to 2048M; below that (e.g. 1024M) or above it (e.g. 4096M) the system is stable.

See below a kernel panic log. I have also attached additional logs and information about this, including some uprecords output. The server had its PSU replaced, and the hardware was tested for RAM, disk and CPU issues; it appears to be fine.

Hardware:
---------
Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz
Supermicro Super Server/X10SRi-F, BIOS 1.0b
Software RAID.
More details in the attached lspci log.

OS/System details:
------------------
Debian GNU/Linux stretch/sid (4.7.0-1-amd64 #1 SMP Debian 4.7.6-1 (2016-10-07) x86_64 GNU/Linux)
Using Ganeti: gnt-cluster (ganeti 2.15.2-6) 2.15.2
Xen 4.6, installed using Debian packages.

Xen and Linux command lines:
----------------------------
(Node 1) Unstable host when dom0_mem is set to 2048M:

(XEN) Xen version 4.6.0 (Debian 4.6.0-1+nmu2) (ijc@xxxxxxxxxx) (gcc (Debian 5.3.1-8) 5.3.1 20160205) debug=n Tue Feb 9 17:46:27 UTC 2016
(XEN) Bootloader: GRUB 2.02~beta2-36
(XEN) Command line: placeholder dom0_mem=2048M,max:2048M noreboot dom0_max_vcpus=1 com1=115200,8n1 console=com2 no-real-mode edd=off

[*] Note that "loglvl=all guest_loglvl=all" has been added for verbose output in the attached log files.
[*] no-real-mode and edd=off come from Debian (see also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750958), but they are not a problem here.

linux /proc/cmdline:
placeholder root=UUID=525998f8-b79b-42a5-824f-49862bccd14f ro nomodeset noreboot xencons=hvc console=hvc0

About the extra Xen arguments: dom0_max_vcpus=1, no-real-mode and edd=off have no effect on this issue; I have tested both with and without them.

Furthermore, I have another node with dom0_mem set to 2048M that is not experiencing this problem, on similar hardware and essentially the same OS.
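Node 1's Xen command line differs from that of the stable Node 2 (quoted next) in only a few options. A minimal Python sketch of that comparison is shown here; it treats each command line as an unordered set of whitespace-separated tokens, which is a simplification of how Xen and Linux actually parse them. The command lines are copied from this report, and `cmdline_diff` is a hypothetical helper written only for illustration.

```python
from __future__ import annotations

# Command lines as quoted in this report ("placeholder" is the redacted image path).
NODE1_XEN = ("placeholder dom0_mem=2048M,max:2048M noreboot dom0_max_vcpus=1 "
             "com1=115200,8n1 console=com2 no-real-mode edd=off")
NODE2_XEN = "placeholder dom0_mem=2048M,max:2048M no-real-mode edd=off"

NODE1_LINUX = ("placeholder root=UUID=525998f8-b79b-42a5-824f-49862bccd14f ro "
               "nomodeset noreboot xencons=hvc console=hvc0")
NODE2_LINUX = ("placeholder root=UUID=a13f424f-11b2-4195-b82e-bda616ce6a6f ro "
               "nomodeset")


def cmdline_diff(a: str, b: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (tokens only in a, tokens only in b), sorted.

    Token order and duplicates are ignored on purpose; this is a plain set
    difference, not a faithful model of Xen's or Linux's argument parsing.
    """
    set_a, set_b = set(a.split()), set(b.split())
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    pairs = {"Xen": (NODE1_XEN, NODE2_XEN),
             "dom0 Linux": (NODE1_LINUX, NODE2_LINUX)}
    for name, (node1, node2) in pairs.items():
        only1, only2 = cmdline_diff(node1, node2)
        print(f"{name}: only on Node 1: {only1}; only on Node 2: {only2}")
```

The printed differences do not include dom0_mem=2048M,max:2048M, which is identical on both nodes. The Xen options present only on Node 1 are noreboot, dom0_max_vcpus=1, com1=115200,8n1 and console=com2; of these, the report states that dom0_max_vcpus=1 was tested both with and without.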
(Node 2) Here is the command line for Xen on this other host, which seems stable so far:

(XEN) Xen version 4.6.0 (Debian 4.6.0-1+nmu2) (ijc@xxxxxxxxxx) (gcc (Debian 5.3.1-8) 5.3.1 20160205) debug=n Tue Feb 9 17:46:27 UTC 2016
(XEN) Bootloader: GRUB 2.02~beta2-36
(XEN) Command line: placeholder dom0_mem=2048M,max:2048M no-real-mode edd=off

linux /proc/cmdline:
placeholder root=UUID=a13f424f-11b2-4195-b82e-bda616ce6a6f ro nomodeset

//---- kernel panic start ----//
[ 1571.770465] BUG: unable to handle kernel paging request at 00000000000e90f0
[ 1571.770493] IP: [<ffffffff810c28dc>] __pv_queued_spin_lock_slowpath+0x18c/0x260
[ 1571.770501] PGD 6b389067 PUD 6b346067 PMD 0
[ 1571.770504] Oops: 0002 [#1] SMP
[ 1571.770508] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback tun xen_blkback bridge stp llc xen_gntdev xen_evtchn xenfs xen_privcmd nls_ascii nls_cp437 vfat fat evdev iTCO_wdt iTCO_vendor_support intel_rapl sb_edac edac_core sg x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper drm lpc_ich mei_me i2c_i801 pcspkr mei mfd_core shpchp ioatdma drbd lru_cache sunrpc ip_tables x_tables autofs4 ext4 ecb crc16 jbd2 mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear xen_blkfront dm_mod raid1 md_mod sd_mod crc32c_intel xhci_pci aesni_intel ahci libahci aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd xhci_hcd ehci_pci ehci_hcd libata usbcore usb_common scsi_mod igb i2c_algo_bit dca ptp pps_core
[ 1571.770552] CPU: 0 PID: 863 Comm: ganeti-mond Tainted: G W 4.7.0-1-amd64 #1 Debian 4.7.6-1
[ 1571.770554] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
[ 1571.770556] task: ffff88006a8c4000 ti: ffff88006abb4000 task.ti: ffff88006abb4000
[ 1571.770558] RIP: e030:[<ffffffff810c28dc>] [<ffffffff810c28dc>] __pv_queued_spin_lock_slowpath+0x18c/0x260
[ 1571.770561] RSP: e02b:ffff88006abb7d00 EFLAGS: 00010206
[ 1571.770563] RAX: 0000000000003ffe RBX: ffffc900403a910c RCX: 0000000000000002
[ 1571.770565] RDX: 0000000000000000 RSI: 00000000ffff8800 RDI: ffffc900403a910c
[ 1571.770566] RBP: ffff8801068178c0 R08: ffff88007616f040 R09: ffffffff81547200
[ 1571.770568] R10: 000000001a99c9f3 R11: ffffc900403a2dfc R12: 00000000000e90f0
[ 1571.770570] R13: ffff880106817904 R14: 0000000000040000 R15: 0000000000000001
[ 1571.770613] FS: 00007f3f8d01cf00(0000) GS:ffff880106800000(0000) knlGS:0000000000000000
[ 1571.770615] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1571.770617] CR2: 00000000000e90f0 CR3: 000000006aaff000 CR4: 0000000000042660
[ 1571.770619] Stack:
[ 1571.770620] ffffffff815461bc ffffc900403a9100 ffffc900403a910c 00000000e899ac00
[ 1571.770623] ffffc900403a2df0 ffff88007616f040 ffffffff81b11550 ffffffff815db00d
[ 1571.770625] ffffffff815477d9 00007ffe0000000a ffffc900403a2dfc ffffffff81547200
[ 1571.770628] Call Trace:
[ 1571.770633] [<ffffffff815461bc>] ? udp_lib_lport_inuse+0x2c/0xf0
[ 1571.770639] [<ffffffff815db00d>] ? _raw_spin_lock+0x1d/0x20
[ 1571.770641] [<ffffffff815477d9>] ? udp_lib_get_port+0x3d9/0x5a0
[ 1571.770644] [<ffffffff81547200>] ? udp4_seq_show+0x160/0x160
[ 1571.770647] [<ffffffff81552fa6>] ? inet_autobind+0x26/0x60
[ 1571.770649] [<ffffffff81554daa>] ? inet_sendmsg+0x7a/0xb0
[ 1571.770653] [<ffffffff814be8a0>] ? sock_sendmsg+0x30/0x40
[ 1571.770655] [<ffffffff814bee03>] ? SYSC_sendto+0xd3/0x150
[ 1571.770658] [<ffffffff8120cfa9>] ? SyS_select+0xc9/0x110
[ 1571.770661] [<ffffffff815db136>] ? system_call_fast_compare_end+0xc/0x96
[ 1571.770662] Code: 89 c4 c1 e8 12 4c 8d 6d 44 49 c1 ec 0c 83 e8 01 41 bf 01 00 00 00 41 83 e4 30 48 98 49 81 c4 c0 78 01 00 4c 03 24 c5 e0 77 b0 81 <49> 89 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 44
[ 1571.770682] RIP [<ffffffff810c28dc>] __pv_queued_spin_lock_slowpath+0x18c/0x260
[ 1571.770685] RSP <ffff88006abb7d00>
[ 1571.770686] CR2: 00000000000e90f0
[ 1571.770691] ---[ end trace 7ea4af2d99a92b5f ]---
[ 1571.770692] Kernel panic - not syncing: Fatal exception in interrupt
[ 1571.770695] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: 'noreboot' set - not rebooting.
//---- kernel panic end ----//

Any advice?

Attachment: dmesg_dom0_log.txt
Attachment: xl_info_log.txt
Attachment: dmesg_xen_log.txt
Attachment: lspci_log.txt
Attachment: info.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel