
[Xen-users] Xen netback module crash


  • To: xen-users@xxxxxxxxxxxxx
  • From: Wouter de Geus <benv-xensource.com@xxxxxxxxxxxxx>
  • Date: Mon, 30 Jun 2014 15:02:25 +0200
  • Delivery-date: Mon, 30 Jun 2014 13:02:54 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>
  • Mail-followup-to: xen-users@xxxxxxxxxxxxx

Hi folks,

I have a new machine that I just transferred a few domUs to, and after a few
minutes the netback module crashed.
The system is running a freshly compiled Xen 4.3 from git (stable-4.3 branch at
commit 8ce2638a) with a custom 3.15.2 kernel (I also tested 3.15.1 after this
happened, with the same result).
====
[  427.622098] ------------[ cut here ]------------
[  427.658465] kernel BUG at drivers/net/xen-netback/netback.c:629!
[  427.674362] invalid opcode: 0000 [#1] SMP 
[  427.689946] Modules linked in:
[  427.705249] CPU: 0 PID: 1295 Comm: mail0.0-guest-r Not tainted 3.15.2-Desman #1
[  427.735406] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
[  427.764838] task: ffff8801760a4d10 ti: ffff88016f3e0000 task.ti: ffff88016f3e0000
[  427.793956] RIP: e030:[<ffffffff81974ef0>]  [<ffffffff81974ef0>] xenvif_rx_action+0x9d0/0x9e0
[  427.822402] RSP: e02b:ffff88016f3e3da8  EFLAGS: 00010297
[  427.836276] RAX: 0000000000000000 RBX: ffff88016d663a90 RCX: ffffc900114d2200
[  427.849979] RDX: 0000000000000013 RSI: ffff88017435cf00 RDI: 00000000004f8e7c
[  427.863615] RBP: 0000000000002df7 R08: 0000000000000000 R09: 0000000000000001
[  427.877038] R10: ffffea0005af3f00 R11: ffff88016bcfc000 R12: ffff88016f3e3df4
[  427.890328] R13: ffff8800769d0800 R14: ffff88017435cf00 R15: ffff8800769d0800
[  427.903385] FS:  00007f5a9203a900(0000) GS:ffff880181000000(0000) knlGS:0000000000000000
[  427.928937] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  427.941671] CR2: 00007f692d214000 CR3: 000000007ad6d000 CR4: 0000000000042660
[  427.954352] Stack:
[  427.966713]  ffff88016f3e3df4 0000000000000000 00000000ffffffff 0000001200000e78
[  427.991160]  ffff88016d663a90 ffff8800769d0800 00002df700000000 ffff8800769db150
[  428.014661]  ffff88016d663a80 0000000081008249 ffff88016f3e3df8 ffff88016f3e3df8
[  428.037279] Call Trace:
[  428.048184]  [<ffffffff81976da1>] ? xenvif_kthread_guest_rx+0xb1/0x260
[  428.059190]  [<ffffffff8110c620>] ? prepare_to_wait_event+0xf0/0xf0
[  428.070261]  [<ffffffff81976cf0>] ? xenvif_stop_queue+0x60/0x60
[  428.081319]  [<ffffffff810ee8f8>] ? kthread+0xb8/0xd0
[  428.092165]  [<ffffffff81007b8c>] ? xen_clocksource_read+0x1c/0x20
[  428.102883]  [<ffffffff810ee840>] ? kthread_create_on_node+0x180/0x180
[  428.113809]  [<ffffffff81e502cc>] ? ret_from_fork+0x7c/0xb0
[  428.123975]  [<ffffffff810ee840>] ? kthread_create_on_node+0x180/0x180
[  428.134030] Code: ff ff 48 8b 5c 24 20 e9 4f fa ff ff c6 44 24 1c 00 e9 e2 fb ff ff 83 c8 03 e9 8f fd ff ff 45 31 c0 b8 04 00 00 00 e9 67 fd ff ff <0f> 0b 0f 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 44 8b 97 a0 00
[  428.167201] RIP  [<ffffffff81974ef0>] xenvif_rx_action+0x9d0/0x9e0
[  428.177817]  RSP <ffff88016f3e3da8>
[  428.189577] ---[ end trace f98b0fe37eb7e486 ]---
=====
The above crash happened 3 or 4 times while I tested our various kernels, so
it's definitely reproducible.
This started happening after I moved our mailserver/router domain to this
machine. Before that it was only running a Windows 2012r2 HVM instance and
nothing else, and it was stable even under heavy network traffic (I copied
over 60 GB of data through that domU).
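
For anyone triaging a trace like the one above, the two most useful fields
are usually the BUG location and the faulting symbol from the final RIP line.
The following is only a minimal sketch for pulling those out of a pasted
oops. The helper name summarize_oops and the regular expressions are
illustrative assumptions, not part of any Xen or kernel tooling, and the
patterns are only matched against the line formats shown in this trace.

```python
import re

# Hypothetical helper patterns; written against the oops format shown above.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(
    r"RIP\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def summarize_oops(text):
    """Return the BUG source location and faulting symbol found in an oops."""
    summary = {}
    bug = BUG_RE.search(text)
    if bug:
        summary["file"] = bug.group("file")
        summary["line"] = int(bug.group("line"))
    rip = RIP_RE.search(text)
    if rip:
        summary["symbol"] = rip.group("sym")
        # Offset into the function and the function's total size, in bytes.
        summary["offset"] = int(rip.group("off"), 16)
        summary["size"] = int(rip.group("size"), 16)
    return summary


# Two lines copied from the trace above.
SAMPLE = """\
[  427.658465] kernel BUG at drivers/net/xen-netback/netback.c:629!
[  428.167201] RIP  [<ffffffff81974ef0>] xenvif_rx_action+0x9d0/0x9e0
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

The file and line number point at the BUG check that fired in the source
tree, and the symbol plus offset can be used to look up the faulting
instruction in a kernel built with debug info.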

I have moved the router domU back to the old machine for now, and it seems
fine again.

Is this a known Xen bug? Any suggestions on how to fix this?

The mail/router domU was running the same 3.15.2 (and later 3.15.1) kernel as 
the dom0.
Xen is booted via UEFI with the following config:
====
[global]
default=xen

[xen]
options=noreboot dom0_mem=4G,max:4G no-bootscrub loglvl=all guest_loglvl=all console=vga acpi_rsdp_passthrough=1
kernel=vmlinuz-3.15.2 root=/dev/sda2 ro earlyprintk=xen nomodeset acpi_rsdp=0x7deb2000
====

Thanks for reading. If more info is needed just let me know :)

Regards,

Wouter.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users