
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Dave Hunter <dave@xxxxxxxxxx>
  • Date: Wed, 06 Apr 2011 08:01:55 +1000
  • Delivery-date: Tue, 05 Apr 2011 15:02:09 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi guys,

This thread has gone quiet for a while and I was wondering if a solution
had been found?

I'm currently running the packaged version of Xen 4.0.1 in Debian
Squeeze and everything runs well, except for the random crashing when
using LVM.

I use LVM for the disk partitions, and use live snapshots as part of our
backup routine.  That is, create snapshot -> mount snapshot -> rsync ->
umount snapshot -> remove snapshot.
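
A minimal sketch of one such cycle, for anyone trying to reproduce this.
The volume group, logical volume, mount point, destination and the 1G
snapshot size are all placeholders (assumptions, not taken from any setup
in this thread). By default it only prints the commands; set DRY_RUN=0 to
actually run them.

```shell
#!/bin/sh
# Sketch of one snapshot backup cycle:
#   create snapshot -> mount -> rsync -> umount -> remove snapshot.
# Every name passed in (VG, LV, mount point, destination) is a placeholder.

# Print the command; execute it only when DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# usage: snapshot_backup VG LV MOUNTPOINT DEST [SNAP_SIZE]
snapshot_backup() {
    vg=$1; lv=$2; mnt=$3; dest=$4; size=${5:-1G}
    snap="${lv}-backup-snap"    # hypothetical snapshot name

    run lvcreate --snapshot --size "$size" --name "$snap" "/dev/$vg/$lv" || return 1
    if run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run rsync -a --delete "$mnt/" "$dest"
        rc=$?
        run umount "$mnt"
    else
        rc=1
    fi
    # Always try to remove the snapshot, even if mount or rsync failed.
    run lvremove -f "/dev/$vg/$snap"
    return "$rc"
}
```

Calling `snapshot_backup vg0 root /mnt/snap /backup/root/` in a loop would
approximate the snapshot/mount/umount/remove test discussed further down.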

Cheers,

Dave Hunter.

On Mon, 2011-03-28 at 20:29 +0800, Teck Choon Giam wrote:
> On Mon, Mar 28, 2011 at 7:37 PM, Andreas Olsowski
> <andreas.olsowski@xxxxxxxxxxx> wrote:
> >
> >>  - turn on CONFIG_DEBUG_PAGEALLOC
> >>  - turn on CONFIG_DEBUG_LIST
> >>  - turn on CONFIG_DEBUG_KMEMLEAK
> >>  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
> >>  - turn on CONFIG_SLUB_DEBUG_ON
> >
> > After I enabled those options (I don't use SLUB, I use SLAB) I no longer
> > encounter any errors.
> >
> > I completed 1000 loops of snapshot/mount/umount/remove snapshot.
> 
> Did you try with just CONFIG_DEBUG_PAGEALLOC=y, leaving the rest of
> your config unchanged?  All my testing narrows it down to
> CONFIG_DEBUG_PAGEALLOC=y as the option that prevents this BUG.
> 
> >
> >
> > Without those options in 2.6.32.35 I hit a different bug earlier today:
> >
> > But you really have to be patient to see some output, because lvremove will
> > hang quite a while:
> > (a "while" being roughly the time it takes to: wait 5 min for
> > error, leave office, get coffee, smoke cigarette, go to restroom, return to
> > office, finally see error)
> >
> > kernel: BUG: unable to handle kernel paging request
> > ...
> > kernel: RIP  [<ffffffff8100f2bf>] xen_set_pmd+0x2f/0xb0
> > syslog/dmesg output is attached as crash.2.6.32.35-xen_01 or available at:
> > http://pastebin.com/Ad8MhUzD
> 
> I hit this before:
> 
> # grep 'xen_set_pmd' /var/log/messages*
> /var/log/messages:Mar 27 09:31:14 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:06:10 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 15:18:57 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages.1:Mar 23 11:00:16 xen05 kernel: IP:
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages.1:Mar 23 11:00:16 xen05 kernel: RIP:
> e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> /var/log/messages.1:Mar 23 11:00:17 xen05 kernel: RIP
> [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
> 
> But I am unable to reproduce it with CONFIG_DEBUG_PAGEALLOC=y.
> 
> >
> > After that happened I recompiled the kernel without rebooting the machine
> > first and encountered system_call_fastpath as the last call once more, as
> > shown in crash.2.6.32.35-xen_02 or http://pastebin.com/kB38W5mp
> 
> I hit this at least once, but have been unable to reproduce it with
> CONFIG_DEBUG_PAGEALLOC=y:
> 
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: ------------[ cut here
> ]------------
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: kernel BUG at
> arch/x86/xen/mmu.c:1872!
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: invalid opcode: 0000 [#1] SMP
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: last sysfs file:
> /sys/block/sdd/dev
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: CPU 2
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Modules linked in:
> ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter
> ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6
> cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi
> dm_multipath scsi_dh video backlight output sbs sbshc power_meter
> hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp
> parport tg3 libphy sg ide_cd_mod cdrom serio_raw button tpm_tis tpm
> tpm_bios i2c_i801 i2c_core shpchp iTCO_wdt pcspkr dm_snapshot dm_zero
> dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod
> raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Pid: 5874, comm:
> lvcreate Not tainted 2.6.32.35-4.xen.pvops.choon.centos5 #1 PowerEdge
> 860
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP:
> e030:[<ffffffff8100cb5b>]  [<ffffffff8100cb5b>]
> pin_pagetable_pfn+0x53/0x59
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RSP:
> e02b:ffff8800303d1c28  EFLAGS: 00010282
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RAX: 00000000ffffffea
> RBX: 000000000003032d RCX: 0000000000000181
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RDX: 00000000deadbeef
> RSI: 00000000deadbeef RDI: 00000000deadbeef
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RBP: ffff8800303d1c48
> R08: 0000000000000968 R09: ffff880000000000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: R10: 00000000deadbeef
> R11: ffff8800303d1d08 R12: 0000000000000003
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: R13: 000000000003032d
> R14: ffff880030360000 R15: 00007fd324a00000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: FS:
> 00007fd327d2e710(0000) GS:ffff880028089000(0000)
> knlGS:0000000000000000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: CS:  e033 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: CR2: 00000000004612f0
> CR3: 000000003a025000 CR4: 0000000000002660
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Process lvcreate (pid:
> 5874, threadinfo ffff8800303d0000, task ffff880030360000)
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Stack:
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  0000000000000000
> 00000000002027a9 000000013eb43318 000000000003032d
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c68
> ffffffff8100e07c ffff880032be05c0 ffff880032aa9928
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c78
> ffffffff8100e0af ffff8800303d1cb8 ffffffff810a4433
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Call Trace:
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8100e07c>]
> xen_alloc_ptpage+0x64/0x69
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8100e0af>]
> xen_alloc_pte+0xe/0x10
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a4433>]
> __pte_alloc+0x70/0xce
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a45d1>]
> handle_mm_fault+0x140/0x8b9
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a50c9>]
> __get_user_pages+0x37f/0x479
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a76ca>]
> __mlock_vma_pages_range+0xc0/0x16f
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8131c03f>]
> ? _spin_unlock_irqrestore+0x11/0x13
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a78db>]
> mlock_fixup+0x162/0x199
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a7989>]
> do_mlockall+0x77/0x8d
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff81139016>]
> ? security_capable+0x27/0x29
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a7ce2>]
> sys_mlockall+0x8f/0xb9
> /var/log/messages:Mar 27 17:04:39 xen05 kernel:  [<ffffffff81012ac2>]
> system_call_fastpath+0x16/0x1b
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: Code: 48 b8 ff ff ff
> ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2
> 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40
> f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP
> [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
> /var/log/messages-Mar 27 17:04:39 xen05 kernel:  RSP <ffff8800303d1c28>
> /var/log/messages-Mar 27 17:04:39 xen05 kernel: ---[ end trace
> bf36c55d2ecd52e5 ]---
> 
> >
> >
> > Maybe this helps, but I think, if anything, this makes it worse, as the
> > debug options actually suppressed the problem that needs to be debugged.
> 
> True.  At least now we have narrowed it down to something related to
> CONFIG_DEBUG_PAGEALLOC.  Maybe Konrad or Jeremy can have a closer look
> at the related code...
> 
> Thanks.
> 
> Kindest regards,
> Giam Teck Choon
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel




 

