
Re: [Xen-devel] General protection fault in netback



On Fri, Feb 03, 2012 at 07:32:40PM +0300, Anton Samsonov wrote:
> I was experimenting with DomU redundancy and load balancing,
> and I think this GPF started to show up after a couple of DomUs
> with CARP and HAProxy were added that constantly generate
> a strong flow of network traffic by pinging target machines
> and each other as well. Or maybe it is not related to CARP
> and pinging, but just depends on traffic volume: the more VMs
> are added and running, the higher the chance that Dom0-DomU networking
> will collapse, the critical point being 8 guest domains, while I need 10.
> 
> I can't give exact steps to reproduce, as it happens randomly,
> usually without any correlated user activity, after several hours
> (or several minutes) of normal performance. But sometimes
> it happens not long after a balancer's DomU startup or shutdown.
> After the GPF happens, all VMs lose their network connectivity.
> 
> Dom0 is openSUSE 12.1 on AMD64 (Linux 3.1.0-1.2-xen)

Do you get the same issue with a pv-ops dom0? That is, also 3.1, but
from kernel.org?

> with Xen version 4.1.2_05-1.9, which is patched as described
> in openSUSE bug 727081 (bugzilla.novell.com/show_bug.cgi?id=727081).
> Supposedly "offending" DomU is paravirtualized NetBSD 5.1.1
> for AMD64 with recompiled kernel (CARP enabled, no more changes).

What is CARP?

> Other VMs are openSUSE 11.4 and 12.1 for AMD64.
> 
> 
> Trace log in /var/log/messages always looks similar (varying digits
> replaced with asterisks ***):
> 
> 
> general protection fault: 0000 [#1] SMP
> CPU {core-number}
> Modules linked in: 8250 8250_pnp af_packet asus_wmi ata_generic
> blkback_pagemap blkbk blktap bridge btrfs button cdrom dm_mod
> domctl drm drm_kms_helper edd eeepc_wmi ehci_hcd evtchn fuse
> gntdev hid hwmon i2c_algo_bit i2c_core i2c_i801 i915
> iTCO_vendor_support iTCO_wdt linear llc lzo_compress mei(C)
> microcode netbk parport parport_pc pata_via pci_hotplug pcspkr
> ppdev processor r8169 rfkill serial_core [serio_raw] sg
> snd snd_hda_codec snd_hda_codec_hdmi snd_hda_codec_realtek
> snd_hda_intel snd_hwdep snd_mixer_oss snd_page_alloc snd_pcm
> snd_pcm_oss snd_seq snd_seq_device snd_timer soundcore
> sparse_keymap sr_mod stp thermal_sys uas usbbk usbcore
> usbhid usb_storage video wmi xenblk xenbus_be xennet zlib_deflate
> 
> Pid: {process-id}, comm: netback/{0/1} Tainted: G
>          C  3.1.0-1.2-xen #1 System manufacturer System Product Name/P8H67-M
> RIP: e030:[<ffffffff803e7451>]  [<ffffffff803e7451>]
> skb_release_data.part.47+0x61/0xc0
> RSP: e02b:ffff880******d40  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff880********0 RCX: ffff880******000
> RDX: {..RCX.+.0e80..} RSI: 00000000000000** RDI: 00***c**00000000
> RBP: {.....RBX......} R08: {..RCX.-.cff0..} R09: 0000000*********
> R10: 000000000000000* R11: {.task.+.0470..} R12: ffff880026a51000
> R13: ffff880********0 R14: ffffc900048****0 R15: 0000000000000001
> FS:  00007f*******7*0(0000) GS:ffff880******000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000***********0 CR3: 0000000******000 CR4: 0000000000042660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process netback/{0/1} (pid: {process-id}, threadinfo ffff880******000,
> task ffff880********0)
> Stack:
>  0000000000000000 {.....RBX......} 0000000000000000 ffffffff803e7511
>  {.....RBX......} ffffffffa0***d2c {.....task.....} {thread.+.1e00.}
>  {thread.+.1db0.} {.R14.-.22a40..} ffffc9000000000* 0000000000000000

Hm, that is a pretty neat stack output. Wonder which patch of theirs
does that.

> Call Trace:
>  [<ffffffff803e7511>] __kfree_skb+0x11/0x20
>  [<ffffffffa0***d2c>] net_rx_action+0x66c/0x9c0 [netbk]
>  [<ffffffffa0***72a>] netbk_action_thread+0x5a/0x270 [netbk]
>  [<ffffffff8006438e>] kthread+0x7e/0x90
>  [<ffffffff8050f814>] kernel_thread_helper+0x4/0x10
> Code: 48 8b 7c 02 08 e8 90 69 cf ff 8b 95 d0 00 00 00
>   48 8b 8d d8 00 00 00 48 01 ca 0f b7 02 39 c3 7c
>   d1 f6 42 0c 10 74 1e 48 8b 7a 30
> RIP  [<ffffffff803e7451>] skb_release_data.part.47+0x61/0xc0
>  RSP <ffff880******d40>
> ---[ end trace **************** ]---
> 
> 
> Preceding and subsequent messages don't seem to be related to the GPF;
> the time gap is from minutes to half an hour or even more. But if they could
> give some insight, I will post them, too.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
