Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Hi,

> > > > While doing LVM snapshot for migration and get the following:
> > > >
> > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> > [...]
> [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> [<ffffffff8144ff65>] page_fault+0x25/0x30
> [...]
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> (XEN) mm.c:2733:d0 Error while pinning mfn 41114f

Looking into the code, Dom0 is attempting to pin what it thinks is a
"PGT_l1_page_table"; however, the hypervisor returns -EINVAL because the
page actually is a "PGT_writable_page".

After a few hours I managed to catch the crash while the offending process
was being straced.
However, the results were totally inconclusive, because the last lines
before the crash are:

16576 open("/lib/multipath/libcheckdirectio.so", O_RDONLY) = 4
16576 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\v\0\0\0\0\0\0"..., 832) = 832
16576 fstat(4, {st_mode=S_IFREG|0644, st_size=9344, ...}) = 0
16576 mmap(NULL, 2104672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7fa6b36f6000
16576 mprotect(0x7fa6b36f8000, 2093056, PROT_NONE) = 0
16576 mmap(0x7fa6b38f7000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1000) = 0x7fa6b38f7000
16576 close(4) = 0

A non-crashing execution would have continued with:

16667 open("/etc/ld.so.cache", O_RDONLY) = 4
16667 fstat(4, {st_mode=S_IFREG|0644, st_size=21739, ...}) = 0
16667 mmap(NULL, 21739, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f237de56000
16667 close(4) = 0
16667 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
16667 open("/lib/libaio.so.1", O_RDONLY) = 4
[...]

This means that it crashed during the dynamic loading of a plugin shared
library and not while interacting with the device mapper. (Also, the device
being investigated was /dev/sde and not some dm device.)

This leads me to believe that some device-mapper shared library has a
particular memory layout that tends to trigger this crash, and that it has
nothing to do with any device-mapper code at all. The crash also seems to be
timing-sensitive, so it might be a race condition of some sort. (On a
side note: this is a 24-core machine (!) and the kernel happens to have full
preemption enabled.)

I am trying to understand the code a bit. Can someone explain to me what
xen_alloc_ptpage is doing?

> /* This needs to make sure the new pte page is pinned iff its being
> attached to a pinned pagetable. */
> [...]
> if (PagePinned(virt_to_page(mm->pgd))) {
> [...]
> pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);

I must admit I don't know very much about memory handling in Linux (so
please excuse me if I am reading total nonsense into this; still, I'm
intrigued and would like to understand it a bit better), but isn't
`mm->pgd' supposed to point to the L1 page table, and `pfn', being a pte
page, a 3rd/4th level page?

Is this a code path that is exercised a lot?

Thanks,
Christophe

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
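
Footnote on the "Bad type (saw ... != exp ...)" line quoted in the message: the reading of "PGT_writable_page" versus "PGT_l1_page_table" can be reproduced by decoding the two 64-bit values by hand. The sketch below is only an illustration. It assumes the x86-64 type_info layout of Xen releases from around that time (type in bits 60-63, pinned flag in bit 59, validated flag in bit 58); the macro and helper names are made up for this example and are not Xen's own.

```c
#include <stdint.h>

/* Assumed layout of a 64-bit Xen page type_info word (not copied from Xen). */
#define EX_TYPE_SHIFT 60
#define EX_TYPE_MASK  (UINT64_C(0xf) << EX_TYPE_SHIFT)
#define EX_PINNED     (UINT64_C(1) << 59)
#define EX_VALIDATED  (UINT64_C(1) << 58)

/* Extract the 4-bit page type number from the top nibble. */
static unsigned ex_pgt_type(uint64_t type_info)
{
    return (unsigned)((type_info & EX_TYPE_MASK) >> EX_TYPE_SHIFT);
}

/* Map the type number to the PGT_* name it is assumed to correspond to. */
static const char *ex_pgt_type_name(uint64_t type_info)
{
    switch (ex_pgt_type(type_info)) {
    case 0: return "PGT_none";
    case 1: return "PGT_l1_page_table";
    case 2: return "PGT_l2_page_table";
    case 3: return "PGT_l3_page_table";
    case 4: return "PGT_l4_page_table";
    case 7: return "PGT_writable_page";
    default: return "unknown";
    }
}

/* Non-zero if the guest has pinned the page to its current type. */
static int ex_pgt_pinned(uint64_t type_info)
{
    return (type_info & EX_PINNED) != 0;
}

/* Non-zero if the hypervisor has validated the page for its current type. */
static int ex_pgt_validated(uint64_t type_info)
{
    return (type_info & EX_VALIDATED) != 0;
}
```

Under these assumptions, the "saw" value 0x7400000000000001 decodes to type 7 (writable page), validated, not pinned, while the "exp" value 0x1000000000000000 decodes to type 1 (L1 page table). That is consistent with the hypervisor refusing the MMUEXT_PIN_L1_TABLE request because the frame was still typed as writable.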