
Re: [Xen-devel] 4.10.1 Xen crash and reboot


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andy Smith <andy@xxxxxxxxxxxxxx>
  • Date: Tue, 1 Jan 2019 19:46:57 +0000
  • Delivery-date: Tue, 01 Jan 2019 19:47:24 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: id=BF15490B; url=http://strugglers.net/~andy/pubkey.asc

Hello,

On Fri, Dec 21, 2018 at 06:55:38PM +0000, Andy Smith wrote:
> Is it worth me moving this guest to a test host without pcid=0 to
> see if it crashes it, meanwhile keeping production hosts with
> pcid=0? And then putting pcid=0 on the test host to see if it
> survives longer?

I did move the suspect guest to a test host that does not have
pcid=0, and 10 days later that host crashed too:

(XEN) ----[ Xen-4.10.3-pre  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    15
(XEN) RIP:    e008:[<ffff82d08033d5b5>] guest_4.o#shadow_set_l1e+0x75/0x6a0
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor (d7v0)
(XEN) rax: ffff82e07b2e69c0   rbx: 8000003d9734e027   rcx: 0000000000000000
(XEN) rdx: ffff82e000000000   rsi: ffff81c4003dfa70   rdi: 00000000ffffffff
(XEN) rbp: 0000000003d9734e   rsp: ffff83400e2afbd8   r8:  0000000003d93187
(XEN) r9:  0000000000000000   r10: ffff8300789f2000   r11: 0000000000000000
(XEN) r12: 8000003d9734e027   r13: ffff833f5be74000   r14: 0000000003d9734e
(XEN) r15: ffff81c4003dfa70   cr0: 0000000080050033   cr4: 0000000000372660
(XEN) cr3: 0000003f56c31000   cr2: ffff81c4003dfa70
(XEN) fsb: 00007f9de67fc700   gsb: ffff88007f200000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08033d5b5> (guest_4.o#shadow_set_l1e+0x75/0x6a0):
(XEN)  0f 20 0f 85 23 01 00 00 <4d> 8b 37 4c 39 f3 0f 84 97 01 00 00 49 89 da 89
(XEN) Xen stack trace from rsp=ffff83400e2afbd8:
(XEN)    0000003d9734e000 0000000003d93187 0000000000000000 ffff833f00000002
(XEN)    ffff8300789f2000 ffff833f5be74000 ffff81c4003dfa70 ffff83400e2afef8
(XEN)    0000000003d93187 0000000003d9734e ffff8300789f2000 ffff82d08033f6f2
(XEN)    ffff833deb418e08 ffff88007bf4e4d8 ffff833f5be74600 0000000003d9734e
(XEN)    0000000003d9734e 0000000003d9734e ffff83400e2afd70 ffff83400e2afd20
(XEN)    000ffff88007bf4e 0000000000000078 ffff82d0805802c0 000000028033c294
(XEN)    0000000000000880 0000000000000008 0000000000000ef8 ffff82d0805802c0
(XEN)    0000000003d93187 ffff88007bf4e4d8 0000000000000a70 000000000000014e
(XEN)    ffff81c0e2001ef8 01ff82d000000000 8000003d9734e027 ffff82d000000000
(XEN)    ffff833f00000001 00000001789f2000 ffff83400e2affff ffff83400e2afd20
(XEN)    000000000000006f ffff88007bf4e4d8 0000003e11814067 0000003e11706067
(XEN)    0000003d9341f067 8010003d9734e067 0000000003e1310f 0000000003e11814
(XEN)    0000000003e11706 0000000003d9341f 0000000000000005 ffff82d0803265b4
(XEN)    ffff82e07b25e140 ffff833f5be74000 ffff82e07ead8620 0000000100000001
(XEN)    0dff834003e1310f 0000000d00000010 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d0802845d3
(XEN)    000000000000000d ffff82d08032a359 ffff833f5be74000 000000010000000d
(XEN)    ffff82d080354913 ffff82d080354907 ffff82d080354913 ffff82d080354907
(XEN)    ffff82d080354913 ffff82d080354907 ffff82d080354913 ffff82d080354907
(XEN)    ffff82d080354913 ffff82d080354907 ffff82d080354913 ffff82d080354907
(XEN) Xen call trace:
(XEN)    [<ffff82d08033d5b5>] guest_4.o#shadow_set_l1e+0x75/0x6a0
(XEN)    [<ffff82d08033f6f2>] guest_4.o#sh_page_fault__guest_4+0x8f2/0x2060
(XEN)    [<ffff82d0803265b4>] shadow_alloc+0x1d4/0x380
(XEN)    [<ffff82d0802845d3>] get_page+0x13/0xe0
(XEN)    [<ffff82d08032a359>] sh_resync_all+0xb9/0x2b0
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d0802a1842>] do_page_fault+0x1a2/0x4e0
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d080354907>] handle_exception+0x8f/0xf9
(XEN)    [<ffff82d080354913>] handle_exception+0x9b/0xf9
(XEN)    [<ffff82d0803549d9>] x86_64/entry.S#handle_exception_saved+0x68/0x94
(XEN) 
(XEN) Pagetable walk from ffff81c4003dfa70:
(XEN)  L4[0x103] = 8000003f56c31063 ffffffffffffffff
(XEN)  L3[0x110] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff81c4003dfa70
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
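
As a cross-check on the pagetable walk above, here is a minimal Python
sketch that splits the faulting linear address into page-table indices
and decodes the error code. It assumes standard x86-64 4-level paging
with 4 KiB pages; the function names are made up for illustration and
are not anything from the Xen source.

```python
# Split an x86-64 linear address into 4-level page-table indices.
# Assumption: standard 4-level paging with 4 KiB pages (9 index bits per
# level, 12-bit page offset). Helper names are hypothetical, not Xen APIs.


def split_linear_address(addr):
    """Return the L4..L1 table indices and the page offset for addr."""
    return {
        "l4": (addr >> 39) & 0x1FF,
        "l3": (addr >> 30) & 0x1FF,
        "l2": (addr >> 21) & 0x1FF,
        "l1": (addr >> 12) & 0x1FF,
        "offset": addr & 0xFFF,
    }


def decode_pf_error_code(code):
    """Decode the low three architectural bits of a page-fault error code."""
    return {
        "present": bool(code & 0x1),  # False = page not present
        "write": bool(code & 0x2),    # False = read access
        "user": bool(code & 0x4),     # False = supervisor-mode access
    }


if __name__ == "__main__":
    fault = 0xFFFF81C4003DFA70  # "Faulting linear address" from the log
    for name, value in split_linear_address(fault).items():
        print(f"{name} = {value:#x}")
    print(decode_pf_error_code(0x0000))  # error_code=0000 from the log
```

With the address from the log this gives l4 = 0x103 and l3 = 0x110, the
same slots the walk printed, and error code 0000 decodes as a
supervisor-mode read of a not-present page, which fits the empty L3
entry.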

The test host has slightly different hardware from the others: a Xeon
E5-1680v4, rather than the Xeon D-1540 in the previous hosts.

The test host is now running with pcid=0 to see whether that helps.
The longest this guest has been able to run so far without crashing
the host is 14 days.

Cheers,
Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
