
[Xen-devel] [BUG] kernel panic when dom0_mem is set to 2048M



Hi xen-devel

I am experiencing an odd issue where dom0 crashes with a kernel panic on
one of my Xen 4.6 nodes running Debian. This only happens when dom0_mem
is set to 2048M; with a lower value (e.g. 1024M) or a higher one (e.g.
4096M) the system is stable.

Below is the kernel panic log. I have also attached additional logs and
information about this, including some uprecords output. The server has
had its PSU replaced, and the RAM, disks and CPU have been tested. The
hardware appears to be fine.

Hardware:
---------
Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz
Supermicro Super Server/X10SRi-F, BIOS 1.0b
Software RAID.
More details on the lspci log attached.

OS/System details:
------------------
Debian GNU/Linux stretch/sid (4.7.0-1-amd64 #1 SMP Debian 4.7.6-1
(2016-10-07) x86_64 GNU/Linux)
Using Ganeti: gnt-cluster (ganeti 2.15.2-6) 2.15.2
Xen 4.6, installed using Debian packages.

Xen and linux command line:
-----------------------------

(Node 1) unstable host when dom0_mem is set to 2048M

(XEN) Xen version 4.6.0 (Debian 4.6.0-1+nmu2) (ijc@xxxxxxxxxx) (gcc
(Debian 5.3.1-8) 5.3.1 20160205) debug=n Tue Feb  9 17:46:27 UTC 2016
(XEN) Bootloader: GRUB 2.02~beta2-36
(XEN) Command line: placeholder dom0_mem=2048M,max:2048M noreboot
dom0_max_vcpus=1 com1=115200,8n1 console=com2 no-real-mode edd=off

[*] Note that "loglvl=all guest_loglvl=all" was added to get verbose
output in the attached log files.
[*] no-real-mode and edd=off come from Debian (see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750958); they are not
the problem here.

linux /proc/cmdline
placeholder root=UUID=525998f8-b79b-42a5-824f-49862bccd14f ro
nomodeset noreboot xencons=hvc console=hvc0
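For reference, dom0_mem=2048M,max:2048M asks Xen to give dom0 an initial
allocation of 2048 MiB and to cap it at 2048 MiB. Below is a minimal
Python sketch of how that value breaks down. The helper names parse_size
and parse_dom0_mem are hypothetical, and the sketch assumes the
documented [min:<size>,][max:<size>,][<size>] form with an explicit
K/M/G/T suffix; it does not handle negative or suffix-less amounts.

```python
# Hypothetical helpers: break a Xen dom0_mem= value into byte counts.
# Assumes the [min:<size>,][max:<size>,][<size>] form with an explicit
# K/M/G/T suffix; negative ("all but N") and suffix-less amounts are rejected.
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '2048M' to bytes."""
    unit = text[-1:].upper()
    digits = text[:-1]
    if unit not in SUFFIXES or not digits.isdigit():
        raise ValueError(f"expected <digits><K|M|G|T>, got {text!r}")
    return int(digits) * SUFFIXES[unit]


def parse_dom0_mem(value: str) -> dict[str, int]:
    """Return byte counts keyed 'initial', 'min' and/or 'max'."""
    result: dict[str, int] = {}
    for part in value.split(","):
        key, sep, amount = part.partition(":")
        if sep:  # 'min:<size>' or 'max:<size>'
            if key not in ("min", "max"):
                raise ValueError(f"unknown dom0_mem key: {key!r}")
            result[key] = parse_size(amount)
        else:  # a bare size is the initial allocation
            result["initial"] = parse_size(key)
    return result


if __name__ == "__main__":
    # Value used on both Node 1 and Node 2.
    print(parse_dom0_mem("2048M,max:2048M"))
```

With the value from the command line above, both the initial allocation
and the maximum come out as 2147483648 bytes (2 GiB).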

About the extra kernel args:

dom0_max_vcpus=1, no-real-mode and edd=off have no effect on this issue;
I have tested with and without those arguments. Furthermore, I have
another node with dom0_mem also set to 2048M that is not experiencing
the problem, with similar hardware and essentially the same OS.

(Node 2) Here is the command line for Xen on this other host that
seems stable so far:

(XEN) Xen version 4.6.0 (Debian 4.6.0-1+nmu2) (ijc@xxxxxxxxxx) (gcc
(Debian 5.3.1-8) 5.3.1 20160205) debug=n Tue Feb  9 17:46:27 UTC 2016
(XEN) Bootloader: GRUB 2.02~beta2-36
(XEN) Command line: placeholder dom0_mem=2048M,max:2048M no-real-mode edd=off

linux /proc/cmdline
placeholder root=UUID=a13f424f-11b2-4195-b82e-bda616ce6a6f ro nomodeset
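Since both nodes use the same dom0_mem value, the difference between the
two Xen command lines lies in the other options. A small Python sketch
comparing them (the strings are copied from the two boot logs above;
options and compare are hypothetical helper names) shows that Node 1
only adds noreboot, dom0_max_vcpus=1, com1=115200,8n1 and console=com2.

```python
# Xen command lines copied from the Node 1 and Node 2 boot logs above.
# "placeholder" is the image path exactly as it appears in the logs.
NODE1_XEN = ("placeholder dom0_mem=2048M,max:2048M noreboot dom0_max_vcpus=1 "
             "com1=115200,8n1 console=com2 no-real-mode edd=off")
NODE2_XEN = "placeholder dom0_mem=2048M,max:2048M no-real-mode edd=off"


def options(cmdline: str) -> set[str]:
    """Split a command line into whitespace-separated options, dropping the image path."""
    return set(cmdline.split()[1:])


def compare(a: str, b: str) -> tuple[set[str], set[str], set[str]]:
    """Return (options only in a, options only in b, options in both)."""
    oa, ob = options(a), options(b)
    return oa - ob, ob - oa, oa & ob


if __name__ == "__main__":
    only1, only2, common = compare(NODE1_XEN, NODE2_XEN)
    print("only on Node 1:", sorted(only1))
    print("only on Node 2:", sorted(only2))
    print("common:", sorted(common))
```

Set operations are used because option order on the command line does
not matter for this comparison.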

//---- kernel panic start ----//

[ 1571.770465] BUG: unable to handle kernel paging request at 00000000000e90f0
[ 1571.770493] IP: [<ffffffff810c28dc>] __pv_queued_spin_lock_slowpath+0x18c/0x260
[ 1571.770501] PGD 6b389067 PUD 6b346067 PMD 0
[ 1571.770504] Oops: 0002 [#1] SMP
[ 1571.770508] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback tun xen_blkback bridge stp llc xen_gntdev xen_evtchn xenfs xen_privcmd nls_ascii nls_cp437 vfat fat evdev iTCO_wdt iTCO_vendor_support intel_rapl sb_edac edac_core sg x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper drm lpc_ich mei_me i2c_i801 pcspkr mei mfd_core shpchp ioatdma drbd lru_cache sunrpc ip_tables x_tables autofs4 ext4 ecb crc16 jbd2 mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear xen_blkfront dm_mod raid1 md_mod sd_mod crc32c_intel xhci_pci aesni_intel ahci libahci aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd xhci_hcd ehci_pci ehci_hcd libata usbcore usb_common scsi_mod igb i2c_algo_bit dca ptp pps_core
[ 1571.770552] CPU: 0 PID: 863 Comm: ganeti-mond Tainted: G        W       4.7.0-1-amd64 #1 Debian 4.7.6-1
[ 1571.770554] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
[ 1571.770556] task: ffff88006a8c4000 ti: ffff88006abb4000 task.ti: ffff88006abb4000
[ 1571.770558] RIP: e030:[<ffffffff810c28dc>]  [<ffffffff810c28dc>] __pv_queued_spin_lock_slowpath+0x18c/0x260
[ 1571.770561] RSP: e02b:ffff88006abb7d00  EFLAGS: 00010206
[ 1571.770563] RAX: 0000000000003ffe RBX: ffffc900403a910c RCX: 0000000000000002
[ 1571.770565] RDX: 0000000000000000 RSI: 00000000ffff8800 RDI: ffffc900403a910c
[ 1571.770566] RBP: ffff8801068178c0 R08: ffff88007616f040 R09: ffffffff81547200
[ 1571.770568] R10: 000000001a99c9f3 R11: ffffc900403a2dfc R12: 00000000000e90f0
[ 1571.770570] R13: ffff880106817904 R14: 0000000000040000 R15: 0000000000000001
[ 1571.770613] FS:  00007f3f8d01cf00(0000) GS:ffff880106800000(0000) knlGS:0000000000000000
[ 1571.770615] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1571.770617] CR2: 00000000000e90f0 CR3: 000000006aaff000 CR4: 0000000000042660
[ 1571.770619] Stack:
[ 1571.770620]  ffffffff815461bc ffffc900403a9100 ffffc900403a910c 00000000e899ac00
[ 1571.770623]  ffffc900403a2df0 ffff88007616f040 ffffffff81b11550 ffffffff815db00d
[ 1571.770625]  ffffffff815477d9 00007ffe0000000a ffffc900403a2dfc ffffffff81547200
[ 1571.770628] Call Trace:
[ 1571.770633]  [<ffffffff815461bc>] ? udp_lib_lport_inuse+0x2c/0xf0
[ 1571.770639]  [<ffffffff815db00d>] ? _raw_spin_lock+0x1d/0x20
[ 1571.770641]  [<ffffffff815477d9>] ? udp_lib_get_port+0x3d9/0x5a0
[ 1571.770644]  [<ffffffff81547200>] ? udp4_seq_show+0x160/0x160
[ 1571.770647]  [<ffffffff81552fa6>] ? inet_autobind+0x26/0x60
[ 1571.770649]  [<ffffffff81554daa>] ? inet_sendmsg+0x7a/0xb0
[ 1571.770653]  [<ffffffff814be8a0>] ? sock_sendmsg+0x30/0x40
[ 1571.770655]  [<ffffffff814bee03>] ? SYSC_sendto+0xd3/0x150
[ 1571.770658]  [<ffffffff8120cfa9>] ? SyS_select+0xc9/0x110
[ 1571.770661]  [<ffffffff815db136>] ? system_call_fast_compare_end+0xc/0x96
[ 1571.770662] Code: 89 c4 c1 e8 12 4c 8d 6d 44 49 c1 ec 0c 83 e8 01 41 bf 01 00 00 00 41 83 e4 30 48 98 49 81 c4 c0 78 01 00 4c 03 24 c5 e0 77 b0 81 <49> 89 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 44
[ 1571.770682] RIP  [<ffffffff810c28dc>] __pv_queued_spin_lock_slowpath+0x18c/0x260
[ 1571.770685]  RSP <ffff88006abb7d00>
[ 1571.770686] CR2: 00000000000e90f0
[ 1571.770691] ---[ end trace 7ea4af2d99a92b5f ]---
[ 1571.770692] Kernel panic - not syncing: Fatal exception in interrupt
[ 1571.770695] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: 'noreboot' set - not rebooting.

//---- kernel panic end ----//
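For anyone reading the trace: the "Oops: 0002" value is the x86
page-fault error code, printed in hex. A minimal Python sketch
(decode_pf_error is a hypothetical helper that interprets only the low
architectural bits) decodes it as a kernel-mode write to a page that is
not present. That agrees with "PMD 0" in the page-table walk, and with
CR2 and R12 both holding 00000000000e90f0: the faulting instruction
(<49> 89 2c 24 in the Code line) is mov %rbp,(%r12).

```python
# Decode the x86 page-fault error code that Linux prints after "Oops:"
# for a page fault. Only the low five architectural bits are interpreted.
def decode_pf_error(code: int) -> list[str]:
    """Return human-readable meanings of the page-fault error-code bits."""
    meanings = [
        # bit 0 (P): 0 = page not present, 1 = protection violation
        "protection violation" if code & 0x1 else "page not present",
        # bit 1 (W/R): 0 = read, 1 = write
        "write access" if code & 0x2 else "read access",
        # bit 2 (U/S): 0 = kernel (supervisor) mode, 1 = user mode
        "user mode" if code & 0x4 else "kernel mode",
    ]
    if code & 0x8:   # bit 3 (RSVD): reserved bit set in a paging-structure entry
        meanings.append("reserved bit set in a paging entry")
    if code & 0x10:  # bit 4 (I/D): the fault was an instruction fetch
        meanings.append("instruction fetch")
    return meanings


if __name__ == "__main__":
    # The kernel prints the error code as hex ("Oops: 0002").
    print(decode_pf_error(int("0002", 16)))
    # ['page not present', 'write access', 'kernel mode']
```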

Any advice would be appreciated.

Attachment: dmesg_dom0_log.txt
Description: Text document

Attachment: xl_info_log.txt
Description: Text document

Attachment: dmesg_xen_log.txt
Description: Text document

Attachment: lspci_log.txt
Description: Text document

Attachment: info.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
