
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!



On Tue, Dec 28, 2010 at 06:14:16AM +0800, Teck Choon Giam wrote:
>    On Mon, Dec 27, 2010 at 11:53 PM, Konrad Rzeszutek Wilk
>    <konrad.wilk@xxxxxxxxxx> wrote:
> 
>      On Sun, Dec 26, 2010 at 04:16:16PM +0800, Teck Choon Giam wrote:
>      > Hi,
>      >
>      > Information: CentOS 5.5 x86_64
>      > dom0: latest xen/stable-2.6.32.x pvops git commit
>      > 75cc13f5aa29b4f3227d269ca165dfa8937c94fe
>      > xen version: xen-4.0.2-rc1-pre from xen-4.0-testing.hg changeset 21422
>      >
>      > While doing an LVM snapshot for migration, I got the following:
>      >
>      > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
>      > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
>      > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
>      > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
>      > Dec 26 15:58:29 xen01 kernel: CPU 0
>      > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tpm_tis tpm tpm_bios button i2c_i801 i2c_core iTCO_wdt e1000e shpchp pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>      > Dec 26 15:58:29 xen01 kernel: Pid: 27998, comm: udevd Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 S3420GP
>      > Dec 26 15:58:29 xen01 kernel: RIP: e030:[<ffffffff8100cb5b>]  [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
>      > Dec 26 15:58:29 xen01 kernel: RSP: e02b:ffff88003bc3bdc8  EFLAGS: 00010282
>      > Dec 26 15:58:29 xen01 kernel: RAX: 00000000ffffffea RBX: 0000000000017605 RCX: 00000000000000bb
>      > Dec 26 15:58:29 xen01 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
>      > Dec 26 15:58:29 xen01 kernel: RBP: ffff88003bc3bde8 R08: 0000000000000028 R09: ffff880000000000
>      > Dec 26 15:58:29 xen01 kernel: R10: 00000000deadbeef R11: 00007fdb5665e600 R12: 0000000000000003
>      > Dec 26 15:58:30 xen01 kernel: R13: 0000000000017605 R14: ffff880012ee0780 R15: 00007fdb56224268
>      > Dec 26 15:58:30 xen01 kernel: FS:  00007fdb56fed710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
>      > Dec 26 15:58:30 xen01 kernel: CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>      > Dec 26 15:58:30 xen01 kernel: CR2: 00007fdb56224268 CR3: 000000003addb000 CR4: 0000000000002660
>      > Dec 26 15:58:30 xen01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>      > Dec 26 15:58:30 xen01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>      > Dec 26 15:58:30 xen01 kernel: Process udevd (pid: 27998, threadinfo ffff88003bc3a000, task ffff880012ee0780)
>      > Dec 26 15:58:30 xen01 kernel: Stack:
>      > Dec 26 15:58:30 xen01 kernel:  0000000000000000 0000000000424121 000000013f00ae20 0000000000017605
>      > Dec 26 15:58:30 xen01 kernel: <0> ffff88003bc3be08 ffffffff8100e07c ffff88003a3c2580 ffff880034bb6588
>      > Dec 26 15:58:30 xen01 kernel: <0> ffff88003bc3be18 ffffffff8100e0af ffff88003bc3be58 ffffffff810a402f
>      > Dec 26 15:58:31 xen01 kernel: Call Trace:
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff810a402f>] __pte_alloc+0x70/0xce
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff810d2ecc>] ? d_kill+0x3a/0x42
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff810c4cd1>] ? __fput+0x1cb/0x1da
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
>      > Dec 26 15:58:31 xen01 kernel:  [<ffffffff81319dd5>] page_fault+0x25/0x30
>      > Dec 26 15:58:31 xen01 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
>      > Dec 26 15:58:31 xen01 kernel: RIP  [<ffffffff8100cb5b>]
>      > pin_pagetable_pfn+0x53/0x59
>      > Dec 26 15:58:31 xen01 kernel:  RSP <ffff88003bc3bdc8>
>      > Dec 26 15:58:31 xen01 kernel: ---[ end trace 540bcf6f0170242d ]---
>      >
>      > The BUG() triggered at line 1860:
>      >
>      > static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
>      > {
>      >         struct mmuext_op op;
>      >         op.cmd = cmd;
>      >         op.arg1.mfn = pfn_to_mfn(pfn);
>      >         if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
>      >                 BUG(); <<THIS ONE?
> 
>      Yup.
>      > }
>      >
>      > Any idea?
> 
>      Do you get to see this every time you do LVM migrate?
> 
>    My method of migration is to do the steps below, without using xen xm migrate:
> 
>    1. LVM snapshot of domU
>    2. mount LVM snapshot domU
>    3. rsync over to the target host
>    4. umount LVM snapshot domU
>    5. remove LVM snapshot domU
>    6. shutdown domU
>    7. mount LVM domU
>    8. rsync mounted LVM domU over to target host
>    9. start domU in the new target host
> 
>    Actually, the server will also crash when I take daily LVM snapshots to back
>    up domUs, not just during migration.  And this happens almost daily :(
> 
>    Method of backup domU:
> 
>    1. LVM snapshot domU
>    2. mount LVM snapshot domU
>    3. rsync to disk as backup
>    4. umount LVM snapshot domU
>    5. remove LVM snapshot domU
> 
>    Even when I use ionice combined with nice, it still crashes.
> 
>    Maybe it is time to roll back to XenLinux 2.6.18.8... ...
> 

It would be very good to track this down and get it fixed;
hopefully you're able to help a bit and try some things to debug it.
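
One thing that might help narrow it down: the BUG() fires when the MMUEXT pin
hypercall for a freshly allocated PTE page fails (RAX in the oops is
00000000ffffffea, which looks like -22/-EINVAL coming back from the hypervisor).
Below is only a rough, untested sketch of a debug variant of the quoted
pin_pagetable_pfn() from the 2.6.32 pvops tree (the name
pin_pagetable_pfn_debug is made up here), which reports the failing pfn/mfn and
return code instead of dying immediately, so the offending page shows up in
the log:

    /*
     * Debug-only sketch (untested, illustrative): report which pfn/mfn the
     * MMUEXT pin operation rejects instead of calling BUG() straight away.
     */
    static void pin_pagetable_pfn_debug(unsigned cmd, unsigned long pfn)
    {
            struct mmuext_op op;
            int rc;

            op.cmd = cmd;
            op.arg1.mfn = pfn_to_mfn(pfn);

            rc = HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
            if (rc) {
                    printk(KERN_ERR "xen: mmuext_op cmd %u failed: rc=%d "
                           "pfn=%lx mfn=%lx\n", cmd, rc, pfn, op.arg1.mfn);
                    WARN_ON(1);
            }
    }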

Konrad may have some more ideas to try.
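
For what it's worth, one common reason the hypervisor refuses to pin a page as
a page table is that the page still has a writable mapping somewhere.  If that
is the suspicion here, a rough sketch (again only illustrative, and only an
assumption that this is the cause) of dumping the kernel direct-map PTE for
the pfn, e.g. just before the pin in xen_alloc_ptpage(), using the stock
lookup_address() helper:

    /*
     * Illustrative sketch only: print the kernel direct-map PTE covering a
     * pfn, to see whether the page is still mapped writable at the point
     * where the hypervisor rejects the pin.
     */
    #include <linux/mm.h>
    #include <asm/pgtable.h>

    static void dump_direct_map_pte(unsigned long pfn)
    {
            unsigned int level;
            pte_t *pte = lookup_address((unsigned long)__va(pfn << PAGE_SHIFT),
                                        &level);

            if (pte)
                    printk(KERN_INFO "xen: pfn %lx direct-map pte %lx level %u\n",
                           pfn, (unsigned long)pte_val(*pte), level);
    }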

-- Pasi


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

