Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143
>>> On 11.06.13 at 17:30, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote:
> I think this is a more subtle bug.
> I applied a debug patch (see attached) and with the help of it and the logs:
>
> (XEN) domain_page.c:160:d1 mfn (1ebe96) -> 6 idx: 32(i:1,j:0), branch:1
> (XEN) domain_page.c:166:d1 [0] idx=26, mfn=0x1ebcd8, refcnt: 0
> (XEN) domain_page.c:166:d1 [1] idx=12, mfn=0x1ebcd9, refcnt: 0
> (XEN) domain_page.c:166:d1 [2] idx=2, mfn=0x210e9a, refcnt: 0
> (XEN) domain_page.c:166:d1 [3] idx=14, mfn=0x210e9b, refcnt: 0
> (XEN) domain_page.c:166:d1 [4] idx=7, mfn=0x210e9c, refcnt: 0
> (XEN) domain_page.c:166:d1 [5] idx=10, mfn=0x210e9d, refcnt: 0
> (XEN) domain_page.c:166:d1 [6] idx=5, mfn=0x210e9e, refcnt: 0
> (XEN) domain_page.c:166:d1 [7] idx=13, mfn=0x1ebe97, refcnt: 0
> (XEN) Xen BUG at domain_page.c:169
> (XEN) ----[ Xen-4.3-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 3
> (XEN) RIP: e008:[<ffff82c4c01606a7>] map_domain_page+0x61d/0x6e1
> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: ffff8300c68f9000 rcx: 0000000000000000
> (XEN) rdx: ffff8302125b2020 rsi: 000000000000000a rdi: ffff82c4c027a6e8
> (XEN) rbp: ffff8302125afcc8 rsp: ffff8302125afc48 r8: 0000000000000004
> (XEN) r9: 0000000000000004 r10: 0000000000000004 r11: 0000000000000001
> (XEN) r12: ffff83022e2ef000 r13: 00000000001ebe96 r14: 0000000000000020
> (XEN) r15: ffff8300c68f9080 cr0: 0000000080050033 cr4: 00000000000426f0
> (XEN) cr3: 0000000209541000 cr2: ffffffffff600400
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff8302125afc48:
> (XEN) 00000000001ebe97 0000000000000000 0000000000000000 ffff830200000001
> (XEN) ffff8302125afcc8 ffff82c400000000 00000000001ebe97 000000080000000d
> (XEN) ffff83022e2ef2d8 0000000000000286 ffff82c4c0127b6b ffff83022e2ef000
> (XEN) ffff82e003d7d2c0 ffff8302125afd60 00000000001ebe96 0000000000000000
> (XEN) ffff8302125afd38 ffff82c4c01373de 0000000000000000 ffffffffffffffff
> (XEN) 0000000000000001 ffff8302125afd58 ffff83022e2ef2d8 0000000000000286
> (XEN) 0000000000000027 0000000000000000 0000000000001000 0000000000000000
> (XEN) 0000000000000000 00000000001ebe96 ffff8302125afd98 ffff82c4c01377c4
> (XEN) 0000000000000000 ffff820040017000 ffff82e003d7d2c0 00000000001ebe96
> (XEN) ffff8302125afd98 ffff830210ecf390 00000000fffffff4 ffff820040009010
> (XEN) ffff820040000f50 ffff83022e2f0c90 ffff8302125afe18 ffff82c4c0135929
> (XEN) 000000160000001e ffff820040000f50 0000000000000000 00000000001ebe96
> (XEN) 0000000000000000 0000000000000000 0000a2f6125afe28 ffff8302125afe00
> (XEN) 0000001675f02b51 ffff83022e2f0c90 ffff830210ecf390 0000000000000000
> (XEN) 0000000000000001 0000000000000065 ffff8302125afef8 ffff82c4c0136510
> (XEN) ffff830200001000 0000000000000000 ffff8302125afe90 255ece02125b2040
> (XEN) 00000003125afe68 00000016742667d1 ffff8302125b2100 0000003d52299000
> (XEN) ffff8300c68f9000 0000000001c9c380 ffff8302125b2100 ffff8302125b1808
> (XEN) 0000000000000004 0000000000000004 0000000000000000 0000000000000000
> (XEN) 000000000000a2f6 0000000000000000 00000000001ebe96 ffff82c4c0126e77
> (XEN) Xen call trace:
> (XEN) [<ffff82c4c01606a7>] map_domain_page+0x61d/0x6e1
> (XEN) [<ffff82c4c01373de>] cli_get_page+0x15e/0x17b
> (XEN) [<ffff82c4c01377c4>] tmh_copy_from_client+0x150/0x284
> (XEN) [<ffff82c4c0135929>] do_tmem_put+0x323/0x5c4
> (XEN) [<ffff82c4c0136510>] do_tmem_op+0x5a0/0xbd0
> (XEN) [<ffff82c4c022391b>] syscall_enter+0xeb/0x145
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Xen BUG at domain_page.c:169
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> It looks as if the path that is taken is:
>
> 110     idx = find_next_zero_bit(dcache->inuse, dcache->entries, dcache->cursor);
> 111     if ( unlikely(idx >= dcache->entries) )
> 112     {
>
> 115         /* /First/, clean the garbage map and update the inuse list. */
> 116         for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ )
> 117         {
> 118             dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0);
> 119             accum |= ~dcache->inuse[i];
>
> Here computes the accum
> 120         }
> 121
> 122         if ( accum )
> 123             idx = find_first_zero_bit(dcache->inuse, dcache->entries)
>
> Ok, finds the idx (32),
> 124         else
> 125         {
> .. does not go here.
> 142         }
> 143         BUG_ON(idx >= dcache->entries);
>
> And hits the BUG_ON().
>
> But I am not sure if that is appropriate. Perhaps the BUG_ON was meant
> as a check for the loop (lines 128 -> 141) - in case it looped around
> and never found an empty place.
> But if that is the condition then that would also look suspect as it
> might have found an empty hash entry and the idx would still end up
> being 32.

The BUG_ON() here is definitely valid - a few lines down, after the
enclosing if(), we use it in ways that require this to not have
triggered. It basically tells you whether an in-range idx was found,
which apparently isn't the case here.

As I think George already pointed out - printing accum here would be
quite useful: It should have at least one of the low 32 bits set,
given that dcache->entries must be at most 32 according to the data
you already got logged.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
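
The check suggested above (print accum and see whether any of its low 32 bits is set) can be tried on a small standalone model. The sketch below is not Xen code: it assumes a single 64-bit bitmap word, and `first_zero_bit`, `reclaim_garbage`, `entries_mask` and `accum_has_free_slot` are hypothetical helpers named only for this illustration. It mimics the quoted lines 116-123 and then tests accum against the in-range bits. The scenario where bits above `entries` make accum non-zero is an assumption chosen for illustration, not a confirmed diagnosis of this crash.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for find_first_zero_bit() on one 64-bit word:
 * returns the index of the first clear bit among the low `nbits` bits,
 * or `nbits` itself if all of them are set.
 */
static unsigned int first_zero_bit(uint64_t map, unsigned int nbits)
{
    assert(nbits <= 64);
    for (unsigned int i = 0; i < nbits; i++)
        if (!(map & (UINT64_C(1) << i)))
            return i;
    return nbits;
}

/*
 * Model of the cleanup step in the quoted lines 116-120 for a single
 * word: drop garbage bits from inuse, clear garbage, return ~inuse.
 */
static uint64_t reclaim_garbage(uint64_t *inuse, uint64_t *garbage)
{
    *inuse &= ~*garbage;
    *garbage = 0;
    return ~*inuse;
}

/* Mask covering the low `entries` bits (the only valid slot indices). */
static uint64_t entries_mask(unsigned int entries)
{
    assert(entries >= 1 && entries <= 64);
    return entries == 64 ? ~UINT64_C(0) : (UINT64_C(1) << entries) - 1;
}

/* The suggested check: does accum have any bit set within range? */
static int accum_has_free_slot(uint64_t accum, unsigned int entries)
{
    return (accum & entries_mask(entries)) != 0;
}
```

In this model, with `entries` at 32 and all 32 low bits of `inuse` set, `reclaim_garbage` still returns a non-zero value because of the clear bits above bit 31, while `first_zero_bit(inuse, 32)` returns 32. Whether that is what happens in the hypervisor would need the actual accum value from the debug patch.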