[Xen-users] [BUG?] Frequent Xen domain crashes with 4.1


  • To: xen-users@xxxxxxxxxxxxx
  • From: Wolodja Wentland <lists@xxxxxxxxxxxx>
  • Date: Wed, 24 Jun 2015 11:20:04 +0200
  • Delivery-date: Wed, 24 Jun 2015 09:21:27 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

Hello,

we are seeing frequent domain crashes (roughly once a week) in our
infrastructure, and I would appreciate any input on how to deal with them or
how to gather additional information.

It may well turn out that upgrading to a newer Xen version is the only way to
address them, but if this is a known problem with an easy workaround, we would
prefer that for the time being.

The domains are all running on Debian wheezy hosts with the following Xen
packages installed:

    libxen-4.1                         4.1.4-3+deb7u8
    libxenstore3.0                     4.1.4-3+deb7u8
    xen-hypervisor-4.1-amd64           4.1.4-3+deb7u8
    xen-linux-system-3.2.0-4-amd64     3.2.68-1+deb7u2
    xen-linux-system-amd64             3.2+46
    xen-system-amd64                   4.1.4-3+deb7u8
    xen-tools                          4.3.1-1
    xen-utils-4.1                      4.1.4-3+deb7u8
    xen-utils-common                   4.1.4-3+deb7u8
    xenstore-utils                     4.1.4-3+deb7u8

The hosts are standard Dell R720 and R730 boxes with specs such as the
following (let me know if you need more):

    R720xd

    cpu_model: Intel(R) Xeon(R) CPU E5-2670
    kernelrelease: 3.16.0-0.bpo.4-amd64

    R730

    cpu_model: Intel(R) Xeon(R) CPU E5-2697 v2
    kernelrelease: 3.16.0-0.bpo.4-amd64

The crashes we see are being reported in the xend log as:

    [$TIMESTAMP] WARNING (XendDomainInfo:2061) Domain has crashed: name=$DOMAIN_ID id=$ID
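
To get a picture of how often and for which domains this happens, the warning
lines can be pulled out of the xend log with a small script. The following is
only a sketch: the regular expression is derived from the placeholder line
above, the timestamp is treated as opaque text because its exact format is not
shown here, and the function name `find_crashes` is made up for illustration.

```python
import re

# Line layout assumed from the log excerpt above. The bracketed timestamp is
# captured as opaque text because its exact format is not shown there.
CRASH_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+WARNING\s+\(XendDomainInfo:\d+\)\s+"
    r"Domain has crashed:\s+name=(?P<name>\S+)\s+id=(?P<id>\d+)"
)


def find_crashes(lines):
    """Return a (timestamp, name, id) tuple for every crash warning in lines."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append(
                (match.group("timestamp"), match.group("name"), int(match.group("id")))
            )
    return crashes
```

It could be used as `find_crashes(open(path))`, where `path` points at the
xend log (commonly /var/log/xen/xend.log, but that location is an assumption
about the local setup).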

We also managed to capture some console output from the domains via 'xm console':

    --- snip ---
    [2180673.760082] INFO: rcu_bh detected stall on CPU 7 (t=0 jiffies)
    [2180673.760105] sending NMI to all CPUs:
    [2180673.760130] BUG: unable to handle kernel paging request at ffffffffff5fb310
    [2180673.760143] IP: [<ffffffff81027fd2>] native_apic_mem_write+0x2/0x9
    [2180673.760161] PGD 1607067 PUD 1608067 PMD 172e067 PTE 0
    [2180673.760175] Oops: 0002 1 SMP
    [2180673.760184] CPU 7
    [2180673.760188] Modules linked in: xt_multiport iptable_filter ip_tables
    x_tables nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc evdev coretemp
    snd_pcm snd_page_alloc crc32c_intel ghash_clmulni_intel snd_timer snd
    aesni_intel aes_x86_64 soundcore aes_generic pcspkr cryptd ext4 crc16 jbd2
    mbcache xen_blkfront xen_netfront
    [2180673.760260]
    [2180673.760269] Pid: 0, comm: swapper/7 Not tainted 3.2.0-4-amd64 #1 Debian
    3.2.68-1+deb7u1
    [2180673.760284] RIP: e030:[<ffffffff81027fd2>] [<ffffffff81027fd2>]
    native_apic_mem_write+0x2/0x9
    [2180673.760305] RSP: e02b:ffff8801ffdc3c90 EFLAGS: 00010086
    [2180673.760313] RAX: 0000000000000000 RBX: ffffffff816800e0 RCX:
    00000000000005e1
    [2180673.760323] RDX: 0000000000000000 RSI: 00000000ff000000 RDI:
    0000000000000310
    [2180673.760334] RBP: 0000000000000002 R08: 0000000000000000 R09:
    0000000000000000
    [2180673.760389] R10: 0000000000000000 R11: 7fffffffffffffff R12:
    0000000000000800
    [2180673.760403] R13: 00000000000000ff R14: ffff8801ffdcda78 R15:
    ffffffff8106c53c
    [2180673.760427] FS: 00007f57831a6700(0000) GS:ffff8801ffdc0000(0000)
    knlGS:0000000000000000
    [2180673.760444] CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
    [2180673.760453] CR2: ffffffffff5fb310 CR3: 00000001f3f9c000 CR4:
    0000000000002660
    [2180673.760462] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
    0000000000000000
    [2180673.760476] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
    0000000000000400
    [2180673.760489] Process swapper/7 (pid: 0, threadinfo ffff8801f6c98000, task ffff8801f6c70840)
    [2180673.760499] Stack:
    [2180673.760504] ffffffff8102820b 0000000000000000 0000000000002710
    ffffffff81620080
    [2180673.760529] ffffffff81620180 ffff8801ffdc3df0 ffffffff8102514e
    ffff8801ffdcdcb0
    [2180673.760549] ffffffff8109648c 0000000000000000 0043b339e8537589
    0000000000000000
    [2180673.760569] Call Trace:
    [2180673.760577] <IRQ>
    [2180673.760591] [<ffffffff8102820b>] ? _flat_send_IPI_mask+0x4b/0x78
    [2180673.760608] [<ffffffff8102514e>] ? arch_trigger_all_cpu_backtrace+0x4d/0x7b
    [2180673.760627] [<ffffffff8109648c>] ? __rcu_pending+0x82/0x358
    [2180673.760653] [<ffffffff8106c53c>] ? tick_nohz_handler+0xd0/0xd0
    [2180673.760677] [<ffffffff81096aae>] ? rcu_check_callbacks+0xaf/0xcc
    [2180673.760694] [<ffffffff81052dba>] ? update_process_times+0x31/0x63
    [2180673.760710] [<ffffffff8106c5a6>] ? tick_sched_timer+0x6a/0x90
    [2180673.760722] [<ffffffff81062736>] ? __run_hrtimer+0xac/0x135
    [2180673.760732] [<ffffffff81062e20>] ? hrtimer_interrupt+0xd7/0x1b1
    [2180673.760744] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc
    [2180673.760757] [<ffffffff81095543>] ? arch_local_irq_restore+0x7/0x8
    [2180673.760767] [<ffffffff81095edf>]
    ? check_for_new_grace_period.isra.25+0x98/0xa3
    [2180673.760779] [<ffffffff8109150d>] ? handle_irq_event_percpu+0x50/0x17d
    [2180673.760791] [<ffffffff8121d25e>] ? disable_pirq+0x2/0x2
    [2180673.760799] [<ffffffff8121ce48>] ? info_for_irq+0x7/0x17
    [2180673.760810] [<ffffffff81093b57>] ? handle_percpu_irq+0x3a/0x4f
    [2180673.760820] [<ffffffff8121d08a>] ? __xen_evtchn_do_upcall+0xd3/0x287
    [2180673.760831] [<ffffffff810069aa>] ? xen_clocksource_read+0x1d/0x1f
    [2180673.760842] [<ffffffff81064254>] ? sched_clock_idle_wakeup_event+0xf/0x17
    [2180673.760854] [<ffffffff8121e5bc>] ? xen_evtchn_do_upcall+0x22/0x32
    [2180673.760867] [<ffffffff813583fe>] ? xen_do_hypervisor_callback+0x1e/0x30
    [2180673.760875] <EOI>
    [2180673.760883] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2180673.760892] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2180673.760903] [<ffffffff8100675a>] ? xen_safe_halt+0xc/0x13
    [2180673.760915] [<ffffffff81014938>] ? default_idle+0x47/0x7f
    [2180673.760927] [<ffffffff8100d24c>] ? cpu_idle+0xaf/0xf2
    [2180673.760938] [<ffffffff81006cc9>] ? xen_irq_enable_direct_reloc+0x4/0x4
    [2180673.760946] Code: 00 74 18 48 8d 74 24 0c bf 1b 00 00 00 e8 ab fb ff ff f6 c4 04 0f 95 c0 0f b6 c0 48 83 c4 10 c3 90 ff 14 25 d8 57 61 81 c3 89 ff <89> b7 00 b0 5f ff c3 89 ff 8b 87 00 b0 5f ff c3 48 8b 07 25 ff
    [2180673.761038] RIP [<ffffffff81027fd2>] native_apic_mem_write+0x2/0x9
    [2180673.761050] RSP <ffff8801ffdc3c90>
    [2180673.761056] CR2: ffffffffff5fb310
    [2180673.761068] --[ end trace 1cfa73b4ca2dbc07 ]--
    [2180673.761077] Kernel panic - not syncing: Fatal exception in interrupt
    [2180673.761087] Pid: 0, comm: swapper/7 Tainted: G D 3.2.0-4-amd64 #1 Debian 3.2.68-1+deb7u1
    [2180673.761098] Call Trace:
    [2180673.761103] <IRQ> [<ffffffff8134a661>] ? panic+0x95/0x1a2
    [2180673.761120] [<ffffffff810713b7>] ? arch_local_irq_disable+0x7/0x8
    [2180673.761131] [<ffffffff81351107>] ? _raw_spin_unlock_irqrestore+0xe/0xf
    [2180673.761143] [<ffffffff81351fcc>] ? oops_end+0xa9/0xb6
    [2180673.761153] [<ffffffff81349f8a>] ? no_context+0x1ff/0x20e
    [2180673.761162] [<ffffffff81349818>] ? pmd_val+0x7/0x8
    [2180673.761171] [<ffffffff81349837>] ? pte_offset_kernel+0x16/0x35
    [2180673.761180] [<ffffffff81353fc9>] ? do_page_fault+0x1b6/0x345
    [2180673.761192] [<ffffffff8123abf6>] ? vt_console_print+0x280/0x296
    [2180673.761203] [<ffffffff8102bb5c>] ? pvclock_clocksource_read+0x42/0xb2
    [2180673.761213] [<ffffffff810713b7>] ? arch_local_irq_disable+0x7/0x8
    [2180673.761222] [<ffffffff810713c9>] ? arch_local_irq_save+0x11/0x17
    [2180673.761232] [<ffffffff813510c9>] ? _raw_spin_lock_irqsave+0x9/0x25
    [2180673.761242] [<ffffffff810639ef>] ? up+0xb/0x34
    [2180673.761250] [<ffffffff810713af>] ? arch_local_irq_restore+0x7/0x8
    [2180673.761260] [<ffffffff81351107>] ? _raw_spin_unlock_irqrestore+0xe/0xf
    [2180673.761272] [<ffffffff8104768b>] ? console_unlock+0x1f7/0x206
    [2180673.761283] [<ffffffff81351107>] ? _raw_spin_unlock_irqrestore+0xe/0xf
    [2180673.761294] [<ffffffff8106c53c>] ? tick_nohz_handler+0xd0/0xd0
    [2180673.761304] [<ffffffff813516d5>] ? page_fault+0x25/0x30
    [2180673.761313] [<ffffffff8106c53c>] ? tick_nohz_handler+0xd0/0xd0
    [2180673.761324] [<ffffffff81027fd2>] ? native_apic_mem_write+0x2/0x9
    [2180673.761336] [<ffffffff8102820b>] ? _flat_send_IPI_mask+0x4b/0x78
    [2180673.761347] [<ffffffff8102514e>] ? arch_trigger_all_cpu_backtrace+0x4d/0x7b
    [2180673.761358] [<ffffffff8109648c>] ? __rcu_pending+0x82/0x358
    [2180673.761410] [<ffffffff8106c53c>] ? tick_nohz_handler+0xd0/0xd0
    [2180673.761427] [<ffffffff81096aae>] ? rcu_check_callbacks+0xaf/0xcc
    [2180673.761443] [<ffffffff81052dba>] ? update_process_times+0x31/0x63
    [2180673.761458] [<ffffffff8106c5a6>] ? tick_sched_timer+0x6a/0x90
    [2180673.761470] [<ffffffff81062736>] ? __run_hrtimer+0xac/0x135
    [2180673.761481] [<ffffffff81062e20>] ? hrtimer_interrupt+0xd7/0x1b1
    [2180673.761492] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc
    [2180673.761503] [<ffffffff81095543>] ? arch_local_irq_restore+0x7/0x8
    [2180673.761515] [<ffffffff81095edf>]
    ? check_for_new_grace_period.isra.25+0x98/0xa3
    [2180673.761527] [<ffffffff8109150d>] ? handle_irq_event_percpu+0x50/0x17d
    [2180673.761537] [<ffffffff8121d25e>] ? disable_pirq+0x2/0x2
    [2180673.761547] [<ffffffff8121ce48>] ? info_for_irq+0x7/0x17
    [2180673.761558] [<ffffffff81093b57>] ? handle_percpu_irq+0x3a/0x4f
    [2180673.761567] [<ffffffff8121d08a>] ? __xen_evtchn_do_upcall+0xd3/0x287
    [2180673.761578] [<ffffffff810069aa>] ? xen_clocksource_read+0x1d/0x1f
    [2180673.761589] [<ffffffff81064254>] ? sched_clock_idle_wakeup_event+0xf/0x17
    [2180673.761603] [<ffffffff8121e5bc>] ? xen_evtchn_do_upcall+0x22/0x32
    [2180673.761614] [<ffffffff813583fe>] ? xen_do_hypervisor_callback+0x1e/0x30
    [2180673.761621] <EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2180673.761634] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2180673.761645] [<ffffffff8100675a>] ? xen_safe_halt+0xc/0x13
    [2180673.761655] [<ffffffff81014938>] ? default_idle+0x47/0x7f
    [2180673.761666] [<ffffffff8100d24c>] ? cpu_idle+0xaf/0xf2
    [2180673.761675] [<ffffffff81006cc9>] ? xen_irq_enable_direct_reloc+0x4/0x4
    --- snip ---

or


    --- snip ---
    [2281134.304094] INFO: rcu_bh detected stall on CPU 5 (t=0 jiffies)
    [2281134.304116] sending NMI to all CPUs:
    [2281134.304141] BUG: unable to handle kernel paging request at
    ffffffffff5fb310
    [2281134.304153] IP: [<ffffffff81027fb2>] native_apic_mem_write+0x2/0x9
    [2281134.304172] PGD 1607067 PUD 1608067 PMD 172d067 PTE 0
    [2281134.304185] Oops: 0002 1 SMP 
    [2281134.304195] CPU 5 
    [2281134.304199] Modules linked in: xt_tcpudp xt_multiport iptable_filter
    ip_tables x_tables coretemp evdev crc32c_intel ghash_clmulni_intel snd_pcm
    snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic cryptd
    soundcore pcspkr ext4 crc16 jbd2 mbcache xen_netfront xen_blkfront
    [2281134.304262] 
    [2281134.304272] Pid: 0, comm: swapper/5 Not tainted 3.2.0-4-amd64 #1 Debian
    3.2.65-1+deb7u2 
    [2281134.304291] RIP: e030:[<ffffffff81027fb2>] [<ffffffff81027fb2>]
    native_apic_mem_write+0x2/0x9
    [2281134.304314] RSP: e02b:ffff8801ffd43c90 EFLAGS: 00010086
    [2281134.304326] RAX: 0000000000000000 RBX: ffffffff81680060 RCX:
    000000000000022f
    [2281134.304340] RDX: 0000000000000000 RSI: 00000000ff000000 RDI:
    0000000000000310
    [2281134.304354] RBP: 0000000000000002 R08: 0000000000000000 R09:
    0000000000000000
    [2281134.304369] R10: 0000000000000000 R11: 7fffffffffffffff R12:
    0000000000000800
    [2281134.304382] R13: 00000000000000ff R14: ffff8801ffd4da78 R15:
    ffffffff8106c4b4
    [2281134.304410] FS: 00007f08ed8b8740(0000) GS:ffff8801ffd40000(0000)
    knlGS:0000000000000000
    [2281134.304426] CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
    [2281134.304438] CR2: ffffffffff5fb310 CR3: 00000001583e2000 CR4:
    0000000000002660
    [2281134.304454] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
    0000000000000000
    [2281134.304469] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
    0000000000000400
    [2281134.304486] Process swapper/5 (pid: 0, threadinfo ffff8801f6c90000,
    task ffff8801f6c6f800)
    [2281134.304505] Stack:
    [2281134.304512] ffffffff810281eb 0000000000000000 0000000000002710
    ffffffff81620080
    [2281134.304537] ffffffff81620180 ffff8801ffd43df0 ffffffff8102512e
    ffff8801ffd4dcb0
    [2281134.304562] ffffffff810963b8 ffffffff810069aa 0114b8bb8aa723c9
    ffffffff81013e64
    [2281134.304629] Call Trace:
    [2281134.304638] <IRQ> 
    [2281134.304654] [<ffffffff810281eb>] ? _flat_send_IPI_mask+0x4b/0x78
    [2281134.304690] [<ffffffff8102512e>]
    ? arch_trigger_all_cpu_backtrace+0x4d/0x7b
    [2281134.304711] [<ffffffff810963b8>] ? __rcu_pending+0x82/0x358
    [2281134.304725] [<ffffffff810069aa>] ? xen_clocksource_read+0x1d/0x1f
    [2281134.304737] [<ffffffff81013e64>] ? sched_clock+0x5/0x8
    [2281134.304754] [<ffffffff8106c4b4>] ? tick_nohz_handler+0xd0/0xd0
    [2281134.304765] [<ffffffff810969da>] ? rcu_check_callbacks+0xaf/0xcc
    [2281134.304778] [<ffffffff81052d66>] ? update_process_times+0x31/0x63
    [2281134.304790] [<ffffffff8106c51e>] ? tick_sched_timer+0x6a/0x90
    [2281134.304801] [<ffffffff810626e2>] ? __run_hrtimer+0xac/0x135
    [2281134.304812] [<ffffffff81062dcc>] ? hrtimer_interrupt+0xd7/0x1b1
    [2281134.304823] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc
    [2281134.304837] [<ffffffff81244e09>] ? get_cycles+0x5/0x8
    [2281134.304847] [<ffffffff81245cbb>] ? add_interrupt_randomness+0x38/0x155
    [2281134.304858] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2281134.304870] [<ffffffff81091445>] ? handle_irq_event_percpu+0x50/0x17d
    [2281134.304881] [<ffffffff8121d04a>] ? disable_pirq+0x2/0x2
    [2281134.304894] [<ffffffff8121cc34>] ? info_for_irq+0x7/0x17
    [2281134.304909] [<ffffffff81093a8f>] ? handle_percpu_irq+0x3a/0x4f
    [2281134.304925] [<ffffffff8121ce76>] ? __xen_evtchn_do_upcall+0xd3/0x287
    [2281134.304941] [<ffffffff810069aa>] ? xen_clocksource_read+0x1d/0x1f
    [2281134.304958] [<ffffffff81064200>]
    ? sched_clock_idle_wakeup_event+0xf/0x17
    [2281134.304975] [<ffffffff8121e3a8>] ? xen_evtchn_do_upcall+0x22/0x32
    [2281134.304996] [<ffffffff8135823e>] ? xen_do_hypervisor_callback+0x1e/0x30
    [2281134.305009] <EOI> 
    [2281134.305022] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2281134.305036] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2281134.305053] [<ffffffff8100675a>] ? xen_safe_halt+0xc/0x13
    [2281134.305069] [<ffffffff81014938>] ? default_idle+0x47/0x7f
    [2281134.305086] [<ffffffff8100d24c>] ? cpu_idle+0xaf/0xf2
    [2281134.305099] [<ffffffff81006cc9>] ? xen_irq_enable_direct_reloc+0x4/0x4
    [2281134.305112] Code: 00 74 18 48 8d 74 24 0c bf 1b 00 00 00 e8 ab fb ff ff
    f6 c4 04 0f 95 c0 0f b6 c0 48 83 c4 10 c3 90 ff 14 25 d8 57 61 81 c3 89 ff
    <89> b7 00 b0 5f ff c3 89 ff 8b 87 00 b0 5f ff c3 48 8b 07 25 ff 
    [2281134.305213] RIP [<ffffffff81027fb2>] native_apic_mem_write+0x2/0x9
    [2281134.305226] RSP <ffff8801ffd43c90>
    [2281134.305231] CR2: ffffffffff5fb310
    [2281134.305244] --[ end trace a9674388af60c44e ]--
    [2281134.305252] Kernel panic - not syncing: Fatal exception in interrupt
    [2281134.305263] Pid: 0, comm: swapper/5 Tainted: G D 3.2.0-4-amd64 #1
    Debian 3.2.65-1+deb7u2
    [2281134.305273] Call Trace:
    [2281134.305278] <IRQ> [<ffffffff8134a53c>] ? panic+0x95/0x1a2
    [2281134.305295] [<ffffffff8107132f>] ? arch_local_irq_disable+0x7/0x8
    [2281134.305307] [<ffffffff81350f2f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
    [2281134.305318] [<ffffffff81351dcc>] ? oops_end+0xa9/0xb6
    [2281134.305329] [<ffffffff81349e7b>] ? no_context+0x1ff/0x20e
    [2281134.305339] [<ffffffff81349709>] ? pmd_val+0x7/0x8
    [2281134.305348] [<ffffffff81349728>] ? pte_offset_kernel+0x16/0x35
    [2281134.305358] [<ffffffff81353dca>] ? do_page_fault+0x1b6/0x345
    [2281134.305370] [<ffffffff8123a9ce>] ? vt_console_print+0x280/0x296
    [2281134.305384] [<ffffffff8102bb5c>] ? pvclock_clocksource_read+0x42/0xb2
    [2281134.305393] [<ffffffff8107132f>] ? arch_local_irq_disable+0x7/0x8
    [2281134.305404] [<ffffffff81071341>] ? arch_local_irq_save+0x11/0x17
    [2281134.305416] [<ffffffff81350ef1>] ? _raw_spin_lock_irqsave+0x9/0x25
    [2281134.305427] [<ffffffff8106399b>] ? up+0xb/0x34
    [2281134.305436] [<ffffffff81071327>] ? arch_local_irq_restore+0x7/0x8
    [2281134.305447] [<ffffffff81350f2f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
    [2281134.305460] [<ffffffff8104764f>] ? console_unlock+0x1f7/0x206
    [2281134.305470] [<ffffffff81350f2f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
    [2281134.305482] [<ffffffff8106c4b4>] ? tick_nohz_handler+0xd0/0xd0
    [2281134.305494] [<ffffffff813514d5>] ? page_fault+0x25/0x30
    [2281134.305503] [<ffffffff8106c4b4>] ? tick_nohz_handler+0xd0/0xd0
    [2281134.305514] [<ffffffff81027fb2>] ? native_apic_mem_write+0x2/0x9
    [2281134.305525] [<ffffffff810281eb>] ? _flat_send_IPI_mask+0x4b/0x78
    [2281134.305537] [<ffffffff8102512e>]
    ? arch_trigger_all_cpu_backtrace+0x4d/0x7b
    [2281134.305547] [<ffffffff810963b8>] ? __rcu_pending+0x82/0x358
    [2281134.305557] [<ffffffff810069aa>] ? xen_clocksource_read+0x1d/0x1f
    [2281134.305567] [<ffffffff81013e64>] ? sched_clock+0x5/0x8
    [2281134.305577] [<ffffffff8106c4b4>] ? tick_nohz_handler+0xd0/0xd0
    [2281134.305586] [<ffffffff810969da>] ? rcu_check_callbacks+0xaf/0xcc
    [2281134.305596] [<ffffffff81052d66>] ? update_process_times+0x31/0x63
    [2281134.305606] [<ffffffff8106c51e>] ? tick_sched_timer+0x6a/0x90
    [2281134.305618] [<ffffffff810626e2>] ? __run_hrtimer+0xac/0x135
    [2281134.305629] [<ffffffff81062dcc>] ? hrtimer_interrupt+0xd7/0x1b1
    [2281134.305641] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc
    [2281134.305656] [<ffffffff81244e09>] ? get_cycles+0x5/0x8
    [2281134.305669] [<ffffffff81245cbb>] ? add_interrupt_randomness+0x38/0x155
    [2281134.305682] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2281134.305697] [<ffffffff81091445>] ? handle_irq_event_percpu+0x50/0x17d
    [2281134.305712] [<ffffffff8121d04a>] ? disable_pirq+0x2/0x2
    [2281134.305726] [<ffffffff8121cc34>] ? info_for_irq+0x7/0x17
    [2281134.305741] [<ffffffff81093a8f>] ? handle_percpu_irq+0x3a/0x4f
    [2281134.305756] [<ffffffff8121ce76>] ? __xen_evtchn_do_upcall+0xd3/0x287
    [2281134.305772] [<ffffffff810069aa>] ? xen_clocksource_read+0x1d/0x1f
    [2281134.305788] [<ffffffff81064200>]
    ? sched_clock_idle_wakeup_event+0xf/0x17
    [2281134.305805] [<ffffffff8121e3a8>] ? xen_evtchn_do_upcall+0x22/0x32
    [2281134.305820] [<ffffffff8135823e>] ? xen_do_hypervisor_callback+0x1e/0x30
    [2281134.305830] <EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2281134.305843] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
    [2281134.305855] [<ffffffff8100675a>] ? xen_safe_halt+0xc/0x13
    [2281134.311297] [<ffffffff81014938>] ? default_idle+0x47/0x7f
    [2281134.311297] [<ffffffff8100d24c>] ? cpu_idle+0xaf/0xf2
    [2281134.311297] [<ffffffff81006cc9>] ? xen_irq_enable_direct_reloc+0x4/0x4
    --- snip ---
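
For reference, the "Oops: 0002" line and the register dump in the two traces
above can be decoded mechanically. The following is a minimal sketch that
assumes the standard x86 page-fault error-code layout (bit 0: protection
violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel
mode); the function names are made up for illustration.

```python
# Low three bits of the x86 page-fault error code:
# (mask, meaning when set, meaning when clear)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_page_fault(error_code):
    """Describe the low three bits of an x86 page-fault error code."""
    return [
        set_text if error_code & mask else clear_text
        for mask, set_text, clear_text in PF_BITS
    ]


def fault_base(cr2, rdi):
    """Base address implied by a faulting store of the form base(%rdi)."""
    return cr2 - rdi


# Values copied from the traces above: "Oops: 0002", CR2 and RDI.
print(decode_page_fault(0x0002))
# ['page not present', 'write access', 'kernel mode']
print(hex(fault_base(0xFFFFFFFFFF5FB310, 0x310)))
# 0xffffffffff5fb000
```

Read this way, both oopses are kernel-mode writes to an unmapped page (which
matches "PTE 0" in the dump) at offset 0x310 from a fixed base, reached via
native_apic_mem_write while sending the NMI. That would be consistent with the
guest writing to a local APIC register that is not mapped in these domains,
but this interpretation is an assumption and not something the logs state.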

Most crashes look like the above, but in another instance we saw the following,
which might well be a different bug:

    --- snip ---
    [827708.124149] [sched_delayed] sched: RT throttling activated
    [1225904.780170] general protection fault: 0000 [#1] SMP
    [1225904.780193] Modules linked in: veth bridge stp llc xt_multiport aufs(C) ip6table_filter ip6_tables xt_nat ipt_MASQUERADE xt_addrtype iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables x_tables nf_conntrack x86_pkg_temp_thermal thermal_sys coretemp crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul evdev glue_helper ablk_helper pcspkr cryptd ext4 crc16 mbcache jbd2 xen_blkfront xen_netfront crc32c_intel
    [1225904.780273] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G         C    3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt4-3~bpo70+1
    [1225904.780285] task: ffff8800fa3c13b0 ti: ffff8800fa3c4000 task.ti: ffff8800fa3c4000
    [1225904.780293] RIP: e030:[<ffffffff8107bbb8>]  [<ffffffff8107bbb8>] get_next_timer_interrupt+0x158/0x250
    [1225904.780314] RSP: e02b:ffff8800fa3c7df0  EFLAGS: 00010092
    [1225904.780320] RAX: 6c6261736944202e RBX: 0000000112435014 RCX: ffff8800f9cad130
    [1225904.780327] RDX: 00000001124350f4 RSI: 0000000000000001 RDI: 0000000000000010
    [1225904.780335] RBP: 0000000112434fac R08: 0000000000000010 R09: ffff8800f9cad030
    [1225904.780342] R10: 0000000000000000 R11: 0000000001124350 R12: 0000000152434fab
    [1225904.780350] R13: ffff8800f9cac000 R14: 0000000000000040 R15: 0000000112434fac
    [1225904.780365] FS:  00007f910476d740(0000) GS:ffff8800ff2c0000(0000) knlGS:0000000000000000
    [1225904.780373] CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
    [1225904.780379] CR2: 00007f7503391fd0 CR3: 00000000f847d000 CR4: 0000000000002660
    [1225904.780387] Stack:
    [1225904.780391]  ffffffff8100b430 0000000000000000 ffff8800f9cad030 ffff8800f9cad430
    [1225904.780402]  ffff8800f9cad830 ffff8800f9cadc30 ffff8800ff2cda80 00045af432f2e0c1
    [1225904.780413]  00045af432f0bb00 ffff8800ff2cae40 0000000000000000 ffffffff810db76d
    [1225904.780423] Call Trace:
    [1225904.780435]  [<ffffffff8100b430>] ? xen_clocksource_read+0x20/0x30
    [1225904.780449]  [<ffffffff810db76d>] ? __tick_nohz_idle_enter+0x26d/0x4a0
    [1225904.780459]  [<ffffffff810dbc1d>] ? tick_nohz_idle_enter+0x3d/0x70
    [1225904.780469]  [<ffffffff810b15c2>] ? cpu_startup_entry+0x92/0x4b0
    [1225904.780479]  [<ffffffff8100b139>] ? xen_force_evtchn_callback+0x9/0x10
    [1225904.780486]  [<ffffffff8100ba12>] ? check_events+0x12/0x20
    [1225904.780493] Code: 89 d8 41 83 e0 3f 44 89 c7 66 2e 0f 1f 84 00 00 00 00 00 48 63 cf 48 c1 e1 04 4c 01 c9 48 8b 01 48 39 c8 74 24 66 0f 1f 44 00 00 <f6> 40 18 01 75 10 48 8b 50 10 be 01 00 00 00 48 39 da 48 0f 48
    [1225904.780563] RIP  [<ffffffff8107bbb8>] get_next_timer_interrupt+0x158/0x250
    [1225904.780573]  RSP <ffff8800fa3c7df0>
    [1225904.780583] ---[ end trace 2fcf00f1694e8f18 ]---
    [1225904.780591] Kernel panic - not syncing: Attempted to kill the idle task!
    [1225904.780616] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
    --- snip ---

What information would be required for a proper bug report, and how can we deal
with these instances of "sudden cloud death syndrome"?
-- 
Wolodja Wentland <lists@xxxxxxxxxxxx>

4096R/CAF14EFC
081C B7CD FF04 2BA9 94EA  36B2 8B7F 7D30 CAF1 4EFC

