
RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT



Thank you for the details.
 
There is no "PFN compression on bits" line in the Xen boot output. I added some extra
logging and found that pfn_pdx_hole_setup() returns early at line 183 of
xen/arch/x86/x86_64/mm.c. Please refer to the boot log below.
 
I can add some assertions on the page addresses after chunk merging.
Thank you for the mails you forwarded. I will go through all of them later.
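
One possible shape for such an assertion, written as a user-space sketch rather than actual Xen code. The struct and all function names here are hypothetical, and the two conversion helpers are only my assumption of how the masks computed at the end of pfn_pdx_hole_setup() would be applied. The check walks a range in the compressed index space and verifies that it still maps to adjacent MFNs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the compression parameters; not Xen's real globals. */
struct pdx_params {
    uint64_t bottom_mask;    /* low pfn bits that stay in place */
    uint64_t top_mask;       /* pfn bits above the removed hole */
    unsigned int hole_shift; /* width of the removed bit range */
};

/* Derive the masks the same way the listing does (both shifts must be < 64). */
static struct pdx_params pdx_params_make(unsigned int bottom_shift,
                                         unsigned int hole_shift)
{
    struct pdx_params p;
    uint64_t hole_mask = ((UINT64_C(1) << hole_shift) - 1) << bottom_shift;

    p.hole_shift = hole_shift;
    p.bottom_mask = (UINT64_C(1) << bottom_shift) - 1;
    p.top_mask = ~(p.bottom_mask | hole_mask);
    return p;
}

/* Assumed conversions: squeeze the hole out of a pfn, and put it back. */
static uint64_t pfn_to_pdx(const struct pdx_params *p, uint64_t pfn)
{
    return (pfn & p->bottom_mask) | ((pfn & p->top_mask) >> p->hole_shift);
}

static uint64_t pdx_to_pfn(const struct pdx_params *p, uint64_t pdx)
{
    return (pdx & p->bottom_mask) | ((pdx << p->hole_shift) & p->top_mask);
}

/* Returns 1 if 'count' consecutive indexes starting at mfn map to mfn+y. */
static int range_is_contiguous(const struct pdx_params *p,
                               uint64_t mfn, uint64_t count)
{
    uint64_t base = pfn_to_pdx(p, mfn);

    for (uint64_t y = 0; y < count; y++)
        if (pdx_to_pfn(p, base + y) != mfn + y)
            return 0;
    return 1;
}

/* The assertion form that could sit after a merge step. */
static void assert_range_contiguous(const struct pdx_params *p,
                                    uint64_t mfn, uint64_t count)
{
    assert(range_is_contiguous(p, mfn, count));
}
```

With hole_shift equal to zero (the case reported by this boot log) both conversions reduce to the identity, so the check can only fail when compression is active and a range crosses the hole boundary.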
 
--------------------------pfn_pdx_hole_setup-----------------
164 void __init pfn_pdx_hole_setup(unsigned long mask)
 165 {
 166     unsigned int i, j, bottom_shift, hole_shift;
 167     printk("-------in pfn\n");
 168
 169     for ( hole_shift = bottom_shift = j = 0; ; )
 170     {
 171         i = find_next_zero_bit(&mask, BITS_PER_LONG, j);
 172         j = find_next_bit(&mask, BITS_PER_LONG, i);
 173         if ( j >= BITS_PER_LONG )
 174             break;
 175         if ( j - i > hole_shift )
 176         {
 177             hole_shift = j - i;
 178             bottom_shift = i;
 179         }
 180     }
 181     if ( !hole_shift ){
 182         printk("-------hole shift returned\n");
 183         return;
 184     }
 185     printk("-------in pfn middle \n");
 186
 187     printk(KERN_INFO "PFN compression on bits %u...%u\n",
 188            bottom_shift, bottom_shift + hole_shift - 1);
 189     printk("----PFN compression on bits %u...%u\n",
 190            bottom_shift, bottom_shift + hole_shift - 1);
 191
 192     pfn_pdx_hole_shift  = hole_shift;
 193     pfn_pdx_bottom_mask = (1UL << bottom_shift) - 1;
 194     ma_va_bottom_mask   = (PAGE_SIZE << bottom_shift) - 1;
 195     pfn_hole_mask       = ((1UL << hole_shift) - 1) << bottom_shift;
 196     pfn_top_mask        = ~(pfn_pdx_bottom_mask | pfn_hole_mask);
 197     ma_top_mask         = pfn_top_mask << PAGE_SHIFT;
 198 }
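
To make the early return at line 183 easier to reason about, here is a stand-alone sketch of the same scan. It is not the Xen source: next_zero_bit(), next_set_bit(), struct hole and find_largest_hole() are hypothetical portable stand-ins, and the masks used to exercise it are made up, since the mask Xen actually passed in is not shown in this log. The loop only counts a run of zero bits as a hole when a set bit follows it, so hole_shift stays zero whenever all set bits in the mask form one unbroken run starting at bit 0.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_BITS 64u

/* Stand-in for find_next_zero_bit() on one 64-bit word; MASK_BITS if none. */
static unsigned int next_zero_bit(uint64_t mask, unsigned int from)
{
    for (; from < MASK_BITS; from++)
        if (!((mask >> from) & 1))
            return from;
    return MASK_BITS;
}

/* Stand-in for find_next_bit() on one 64-bit word; MASK_BITS if none. */
static unsigned int next_set_bit(uint64_t mask, unsigned int from)
{
    for (; from < MASK_BITS; from++)
        if ((mask >> from) & 1)
            return from;
    return MASK_BITS;
}

struct hole {
    unsigned int bottom_shift; /* first bit of the widest zero run */
    unsigned int hole_shift;   /* width of that run, 0 if there is none */
};

/* Same loop structure as the listing: widest zero run below a set bit. */
static struct hole find_largest_hole(uint64_t mask)
{
    struct hole h = { 0, 0 };
    unsigned int i, j = 0;

    for (;;) {
        i = next_zero_bit(mask, j);
        j = next_set_bit(mask, i);
        if (j >= MASK_BITS)
            break;              /* no set bit above this zero run */
        if (j - i > h.hole_shift) {
            h.hole_shift = j - i;
            h.bottom_shift = i;
        }
    }
    return h;
}
```

For example, a made-up mask of 0x7ff yields hole_shift 0 (the early-return case), while 0x10000ff yields a 16-bit hole starting at bit 8.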
 
------------------------------------------xen boot log---------------------
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e4bb0 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000bf790000 (usable)
(XEN)  00000000bf790000 - 00000000bf79e000 (ACPI data)
(XEN)  00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
(XEN)  00000000bf7d0000 - 00000000bf7e0000 (reserved)
(XEN)  00000000bf7ec000 - 00000000c0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000640000000 (usable)
(XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT       97)
(XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT       97)
(XEN) ACPI: DSDT BF7904B0, 4D6A (r2  CTSAV CTSAV122      122 INTL 20051117)
(XEN) ACPI: FACS BF79E000, 0040
(XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT       97)
(XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG  20091123 MSFT       97)
(XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT       97)
(XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT         1 INTL        1)
(XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET  20091123 MSFT       97)
(XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm    CpuPm       12 INTL 20051117)
(XEN) --------------844
(XEN) ---------srat enter
(XEN) ---------prepare enter into pfn
(XEN) -------in pfn
(XEN) -------hole shift returned
(XEN) --------------849
(XEN) System RAM: 24542MB (25131224kB)
(XEN) Domain heap initialised DMA width 31 bits
 
 
> Date: Tue, 31 Aug 2010 15:49:29 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> CC: JBeulich@xxxxxxxxxx
>
> Do you have a line in Xen boot output that starts "PFN compression on bits"?
> If so what does it say?
>
> My suspicion is that Jan Beulich's patches to implement a consolidated page
> array for sparse memory maps have broken the assumption in some Xen code
> that:
> page_to_mfn(mfn_to_page(x)+y) == x+y, for all valid mfns x, and all y up to
> some pretty big limit.
>
> Looking in free_heap_pages() I see we do a whole bunch of chunk merging in
> our buddy allocator, doing arithmetic on variable 'pg' to find neighbouring
> chunks. It's a bit dodgy I suspect.
>
> I'm cc'ing Jan to see what we can get away with in doing arithmetic on
> page_info pointers. What's the guaranteed smallest aligned contiguous ranges
> of mfn in the frame_table now, Jan? (i.e., ranges in which adjacent
> page_info structs relate to adjacent MFNs)
>
> If this is the problem I'm pretty sure we can come up with a patch quite
> easily, but depending on the answer to my above question to Jan, we may need
> to do some code auditing.
>
> -- Keir
>
> On 31/08/2010 14:49, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Hi Keir:
> >
> > Thank you for correcting my mistakes.
> > Here is the latest panic and its objdump.
> > I am not familiar with assembly language or register usage.
> > I will try to spend more time on understanding them.
> > What's your opinion?
> > btw, the memtest is still running, so far so good, thanks.
> >
> > ------------------objdump-----------------------------------------------------
> > -------------------
> > 177 ffff82c480115396:<++48 c1 e1 04 <++shl $0x4,%rcx
> > 178 ffff82c48011539a:<++4a 03 0c f8 <++add (%rax,%r15,8),%rcx
> > 179 }
> > 180 static inline void
> > 181 page_list_del(struct page_info *page, struct page_list_head *head)
> > 182 {
> > 183 struct page_info *next = pdx_to_page(page->list.next);
> > 184 ffff82c48011539e:<++8b 03 <++mov (%rbx),%eax
> > 185 ffff82c4801153a0:<++48 c1 e0 05 <++shl $0x5,%rax
> > 186 ffff82c4801153a4:<++48 29 e8 <++sub %rbp,%rax
> > 187 ffff82c4801153a7:<++48 3b 19 <++cmp (%rcx),%rbx
> > 188 ffff82c4801153aa:<++0f 84 95 01 00 00 <++je ffff82c480115545
> > <free_heap_pages+0x405>
> > 189 struct page_info *prev = pdx_to_page(page->list.prev);
> > 190 ffff82c4801153b0:<++89 f2 <++mov %esi,%edx
> > 191 ffff82c4801153b2:<++48 c1 e2 05 <++shl $0x5,%rdx
> > 192 ffff82c4801153b6:<++48 29 ea <++sub %rbp,%rdx
> > 193 ffff82c4801153b9:<++48 3b 59 08 <++cmp 0x8(%rcx),%rbx
> > 194 ffff82c4801153bd:<++0f 84 bd 01 00 00 <++je ffff82c480115580
> > <free_heap_pages+0x440>
> > 195
> > 196 if ( !__page_list_del_head(page, head, next, prev) )
> > 197 {
> > 198 next->list.prev = page->list.prev;
> > 199 ffff82c4801153c3:<++89 70 04 <++mov %esi,0x4(%rax)
> > 200 prev->list.next = page->list.next;
> > 201 ffff82c4801153c6:<++8b 03 <++mov (%rbx),%eax
> > 202 ffff82c4801153c8:<++89 02 <++mov %eax,(%rdx)
> > 203 ffff82c4801153ca:<++49 89 dd <++mov %rbx,%r13
> > 204 ffff82c4801153cd:<++41 83 c4 01 <++add $0x1,%r12d
> > 205 ffff82c4801153d1:<++41 83 fc 12 <++cmp $0x12,%r12d
> > 206 ffff82c4801153d5:<++0f 84 e3 00 00 00 <++je ffff82c4801154be
> > <free_heap_pages+0x37e>
> > 207 ffff82c4801153db:<++48 bd 00 00 00 00 0a <++mov $0x7d0a00000000,%rbp
> > 208 ffff82c4801153e2:<++7d 00 00
> > 209 ffff82c4801153e5:<++44 89 e1 <++mov %r12d,%ecx
> > 210 ffff82c4801153e8:<++be 01 00 00 00 <++mov $0x1,%esi
> >
> >
> > ------------------------------------------------------------------------------
> > ---------------------
> > blktap_sysfs_create: adding attributes for dev ffff880239496c00
> > (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> > (XEN) CPU: 2
> > (XEN) RIP: e008:[<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> > (XEN) rax: ffff8315ffffffe0 rbx: ffff82f6093b0040 rcx: ffff83063fc01a20
> > (XEN) rdx: ffff8315ffffffe0 rsi: 00000000ffffffff rdi: 000000000049d802
> > (XEN) rbp: 00007d0a00000000 rsp: ffff83023ff37cb8 r8: 0000000000000000
> > (XEN) r9: ffffffffffffffff r10: ffff83060a3c0018 r11: 0000000000000282
> > (XEN) r12: 0000000000000000 r13: ffff82f6093b0060 r14: 00000000000001a2
> > (XEN) r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000008da54000 cr2: ffff8315ffffffe4
> > (XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff83023ff37cb8:
> > (XEN) ffff82f6093b7f80 00000000ffffffe0 00000000000001a2 ffff83060a3c0000
> > (XEN) 0000000000000000 0000000000000001 ffff82f6093b0060 0000000000000000
> > (XEN) ffff82f6093b0080 ffff82c480115732 00000001093b7cc0 ffff82f6093b0060
> > (XEN) ffff83060a3c0018 0000000000000000 ffff83060a3c0000 ffff83060a3c0fa8
> > (XEN) 0000000000000000 ffff82c48014aaa6 ffff83060a3c0fa8 ffff83060a3c0fa8
> > (XEN) ffff83060a3c0014 4000000000000000 ffff83023ff37f28 ffff83060a3c0018
> > (XEN) 0000000000000000 ffff83060a3c0000 0000000000305000 0000000000000009
> > (XEN) 0000000000000009 ffff82c48014b2fd 00ffffffffffffff ffff83060a3c0000
> > (XEN) 0000000000000000 ffff83023ff37e28 0000000000305000 ffff82c480105fe0
> > (XEN) ffff82c480255240 fffffffffffffff3 0000000002599000 ffff82c4801043ce
> > (XEN) ffff82c4801447da 0000000000000080 ffff83023ff37f28 0000000000000096
> > (XEN) ffff83023ff37f28 00000000000000fc 0000000600000002 00000000023c0031
> > (XEN) 0000000000000001 00000039890a8e2a 0000003000000018 000000004523af30
> > (XEN) 000000004523ae70 0000000000000000 00007fc608ea8a70 000000398903c8a4
> > (XEN) 000000004523af44 0000000000000000 000000004523b158 0000000000000000
> > (XEN) 0000007f024f6d20 00007fc60a094750 000000000255ff40 00007fc607be5ea8
> > (XEN) fffffffffffffff5 0000000000000246 00000039880cc557 0000000000000100
> > (XEN) 00000039880cc557 0000000000000033 0000000000000246 ffff8300bf562000
> > (XEN) ffff8801db8d3e78 000000004523aec0 0000000000305000 0000000000000009
> > (XEN) 0000000000000009 ffff82c4801e3169 0000000000000009 0000000000000009
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> > (XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
> > (XEN) [<ffff82c48014aaa6>] relinquish_memory+0x186/0x530
> > (XEN) [<ffff82c48014b2fd>] domain_relinquish_resources+0x1ad/0x280
> > (XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
> > (XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN) Pagetable walk from ffff8315ffffffe4:
> > (XEN) L4[0x106] = 00000000bf569027 5555555555555555
> > (XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 2:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0002]
> > (XEN) Faulting linear address: ffff8315ffffffe4
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > ------------------------------------------------------------------------------
> > ---------------------
> >> Date: Mon, 30 Aug 2010 14:16:09 +0100
> >> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >> From: keir.fraser@xxxxxxxxxxxxx
> >> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>
> >> On 30/08/2010 14:03, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>
> >>> I appreciate the quick response.
> >>>
> >>> Actually, I decoded the backtrace last Friday.
> >>> Following the RIP ffff82c4801153c3, I extracted the relevant part of
> >>> "objdump -dS xen-syms" (please see below). It looks like the bug happened on the domain page list
> >>
> >> ffff82c4801153c3 isn't the start of an instruction in your below
> >> disassembly. Hence you didn't disassemble exactly the build of Xen which
> >> crashed. It needs to be exactly the same image.
> >>
> >> -- keir
> >>
> >>> traversal, which is beyond my understanding. As far as I understand,
> >>> those domain pages come from the kernel memory zone, so they always
> >>> reside in physical memory, and their addresses shouldn't have a chance
> >>> to change, right?
> >>> If so, what is the relationship between all those panics and free_heap_pages?
> >>>
> >>> Several servers (at least 3) experienced the same panic on the same test.
> >>> Those servers have identical hardware, kernel and Xen configuration.
> >>> Right now memtest is running on one server; it should finish in a few
> >>> hours.
> >>> (24G memory)
> >>>
> >>> ----------------------------------------------------------------------------
> >>> --
> >>> ------
> >>> 169 static inline void
> >>> 170 page_list_del(struct page_info *page, struct page_list_head *head)
> >>> 171 {
> >>> 172 struct page_info *next = pdx_to_page(page->list.next);
> >>> 173 struct page_info *prev = pdx_to_page(page->list.prev);
> >>> 174 ffff82c4801153b8:<++8b 73 04 <++mov 0x4(%rbx),%esi
> >>> 175 ffff82c4801153bb:<++49 8d 0c 06 <++lea (%r14,%rax,1),%rcx
> >>> 176 ffff82c4801153bf:<++48 8d 05 fa 10 26 00 <++lea 2494714(%rip),%rax
> >>> # ffff82c4803764c0 <_heap>
> >>> 177 ffff82c4801153c6:<++48 c1 e1 04 <++shl $0x4,%rcx
> >>> 178 ffff82c4801153ca:<++4a 03 0c f8 <++add (%rax,%r15,8),%rcx
> >>> 179 }
> >>> 180 static inline void
> >>> 181 page_list_del(struct page_info *page, struct page_list_head *head)
> >>> 182 {
> >>> 183 struct page_info *next = pdx_to_page(page->list.next);
> >>> 184 ffff82c4801153ce:<++8b 03 <++mov (%rbx),%eax
> >>> 185 ffff82c4801153d0:<++48 c1 e0 05 <++shl $0x5,%rax
> >>> 186 ffff82c4801153d4:<++48 29 e8 <++sub %rbp,%rax
> >>> 187 ffff82c4801153d7:<++48 3b 19 <++cmp (%rcx),%rbx
> >>> 188 ffff82c4801153da:<++0f 84 95 01 00 00 <++je ffff82c480115575
> >>> <free_heap_pages+0x405>
> >>> 189 struct page_info *prev = pdx_to_page(page->list.prev);
> >>> 190 ffff82c4801153e0:<++89 f2 <++mov %esi,%edx
> >>> 191 ffff82c4801153e2:<++48 c1 e2 05 <++shl $0x5,%rdx
> >>> 192 ffff82c4801153e6:<++48 29 ea <++sub %rbp,%rdx
> >>> 193 ffff82c4801153e9:<++48 3b 59 08 <++cmp 0x8(%rcx),%rbx
> >>> 194 ffff82c4801153ed:<++0f 84 bd 01 00 00 <++je ffff82c4801155b0
> >>> <free_heap_pages+0x440>
> >>> 195
> >>> 196 if ( !__page_list_del_head(page, head, next, prev) )
> >>> 197 {
> >>> 198
> >>> ----------------------------------------------------------------------------
> >>> --
> >>> ------
> >>>
> >>>> Date: Mon, 30 Aug 2010 10:02:05 +0100
> >>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >>>> From: keir.fraser@xxxxxxxxxxxxx
> >>>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>
> >>>> On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>>>
> >>>>> 3) Every panic points to the same address, ffff8315ffffffe4, which is
> >>>>> not a valid page address.
> >>>>> I printed the pages of the domain in assign_pages, which all look like
> >>>>> ffff82f60bd64000; at least
> >>>>> the ffff82f60 prefix is the same.
> >>>>
> >>>> Yes, well you may not be crashing on a supposed page address. Certainly the
> >>>> page pointer that relinquish_memory() is working on, and passed to
> >>>> put_page->free_domheap_pages is valid enough to not cause any of those
> >>>> functions to crash when dereferencing it. At the moment you really have no
> >>>> idea what is causing free_heap_pages() to crash.
> >>>>
> >>>>> I am a bit lost about where to go next. Thanks.
> >>>>
> >>>> You need to find out which line of code in free_heap_pages() is crashing,
> >>>> and what variable it is trying to dereference when it crashes. You have a
> >>>> nice backtrace with an EIP value, so you can 'objdump -d xen-syms' and
> >>>> search for the EIP in the disassembly. If you have a debug build of Xen you
> >>>> can even do 'objdump -S xen-syms' and have the disassembly annotated with
> >>>> corresponding source lines.
> >>>>
> >>>> Have you seen this on more than one physical machine? If not, have you run
> >>>> memtest on the offending machine?
> >>>>
> >>>> -- Keir
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
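
As a cross-check of the quoted register dump, the arithmetic in the disassembly can be replayed outside Xen. The sketch below is only a model: index_to_entry() and entry_to_index() are hypothetical names, and the shift by 5 and the subtraction of rbp are taken directly from the "shl $0x5" / "sub %rbp" pair in the objdump, with rbp = 00007d0a00000000 from the dump. Feeding in an all-ones 32-bit index reproduces rax and rdx (ffff8315ffffffe0), and adding the 0x4 offset of the faulting "mov %esi,0x4(%rax)" gives the faulting address ffff8315ffffffe4. That would be consistent with page->list.next and page->list.prev both reading as 0xffffffff (rsi holds the same value), though the dump alone does not say how they got that way.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "shl $0x5,%rax; sub %rbp,%rax": 32-bit list index -> entry address. */
static uint64_t index_to_entry(uint32_t index, uint64_t rbp)
{
    return ((uint64_t)index << 5) - rbp;   /* wraps modulo 2^64, like the CPU */
}

/* Inverse mapping, valid for addresses produced by index_to_entry(). */
static uint64_t entry_to_index(uint64_t entry, uint64_t rbp)
{
    return (entry + rbp) >> 5;
}
```

Running the inverse on rbx (ffff82f6093b0040) gives 0x49d802, which is the value held in rdi in the same dump.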
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

