Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
On 02/11/2011 11:00 AM, Kay, Allen M wrote:
> The code for memblock_x86_reserve_range() does not exist in 2.6.32.27
> pvops dom0.

No, the function changed name, but the concept is the same.

> I did find it in Konrad's tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.
>
> So is this a problem for the 2.6.32.27 stable tree? If so, which pvops
> dom0 tree should I be using?

I *just* pushed .32.27 and haven't had a chance to test it. The
xen/stable-2.6.32.x branch contains the version of xen/next-2.6.32 which
has at least passed some amount of testing (i.e., it boots on something,
at the very least).

    J

> Allen
>
> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
> Sent: Friday, February 11, 2011 9:07 AM
> To: Kay, Allen M
> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>
> On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>>> That "extra memory" stuff is reserving some physical address space for
>>> ballooning. It should be completely unused (and unbacked by any pages)
>>> until the balloon driver populates it; it is reserved memory in the
>>> meantime.
>> On my system, the entire chunk is marked as usable memory:
>>
>> 0000000100000000 - 000000023a6f4000 (usable)
>>
>> When you said it is reserved memory, are you saying it should be marked
>> as "reserved", or is there somewhere else in the code that keeps track
>> of which portion of this e820 chunk is backed by real memory and which
>> portion is "extra memory"?
> Yes, it is marked as usable in the E820 so that the kernel will allocate
> page structures for it. But then the extra part is reserved with
> memblock_x86_reserve_range(), which should prevent the kernel from ever
> trying to use that memory (i.e., it will never get added to the pools of
> memory the allocator allocates from). The balloon driver backs these
> pseudo-physical pageframes with real memory pages, and then releases
> them into the pool for allocation.
>
> J
>
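For reference, the mechanism described in the reply above looks roughly like
this in the pvops trees of that era. The sketch below is simplified from the
2.6.37-era arch/x86/xen/setup.c; the xen_extra_mem_start/xen_extra_mem_size
variables and the "XEN EXTRA" label are approximations of that tree's names,
and 2.6.32-based trees perform the equivalent reservation through the older
reserve_early() interface rather than memblock_x86_reserve_range():

/*
 * Simplified sketch, not the exact code of any one branch.  The extra
 * region is appended to the E820 as usable RAM, so the kernel allocates
 * struct pages for it, and is then immediately reserved in memblock so
 * the page allocator never hands it out.  The balloon driver later backs
 * these pseudo-physical frames with real pages.
 */
static void __init xen_add_extra_mem(unsigned long pages)
{
	u64 size = (u64)pages * PAGE_SIZE;
	u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;

	if (!pages)
		return;

	e820_add_region(extra_start, size, E820_RAM);
	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);

	/* Keep the region out of the allocator until it is ballooned in. */
	memblock_x86_reserve_range(extra_start, extra_start + size,
				   "XEN EXTRA");

	xen_extra_mem_size += size;
}
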
>> -----Original Message-----
>> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
>> Sent: Thursday, February 10, 2011 6:56 PM
>> To: Kay, Allen M
>> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>
>> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
>>> Konrad/Stefano,
>>>
>>> Getting back to the xen/dom0 boot failure on my Sandybridge SDP that I
>>> reported a few weeks ago.
>>>
>>> I finally got around to narrowing the problem down to the call to
>>> xen_add_extra_mem() in arch/x86/xen/setup.c:xen_memory_setup(). This
>>> call increases the top of E820 memory in dom0 beyond what is actually
>>> available.
>>>
>>> Before xen_add_extra_mem() is called, the last entry of the dom0 e820
>>> table is:
>>>
>>> 0000000100000000 - 000000016b45a000 (usable)
>>>
>>> After xen_add_extra_mem() is called, the last entry of the dom0 e820
>>> table becomes:
>>>
>>> 0000000100000000 - 000000023a6f4000 (usable)
>>>
>>> This pushes the top of RAM beyond what was reported by Xen's e820
>>> table, which is:
>>>
>>> (XEN) 0000000100000000 - 00000001de600000 (usable)
>>>
>>> AFAICT, the failure is caused by dom0 accessing non-existent physical
>>> memory. The failure went away after I removed the call to
>>> xen_add_extra_mem().
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning. It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
>>
>> How is that memory getting referenced in your case?
>>
>>> Another potential problem I noticed with e820 processing is that there
>>> is a discrepancy between how Xen processes the e820 and how dom0 does
>>> it. In Xen (arch/x86/setup.c, start_xen()), e820 entries are aligned
>>> on an L2_PAGETABLE_SHIFT boundary, while the dom0 e820 code does not do
>>> this. As a result, one of my e820 entries, which is 1 page in size, got
>>> dropped by Xen but got picked up by dom0. This does not cause a problem
>>> in my case, but the inconsistency in how memory is used by Xen and dom0
>>> can potentially be a problem.
>> I don't think that matters. Xen can choose not to use non-2M-aligned
>> pieces of memory if it wants, but that doesn't really affect the dom0
>> kernel's use of the host E820, because dom0 is only looking for possible
>> device memory, rather than RAM.
>>
>> J
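To make the alignment point concrete: clamping an E820 range inward to 2MB
boundaries (L2_PAGETABLE_SHIFT is 21 on x86-64) discards any entry smaller
than 2MB, such as the single-page entry mentioned above. The addresses below
are invented for illustration, and the rounding only mirrors the spirit of
the hypervisor's early setup code rather than quoting it:

#include <stdio.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 21  /* 2MB superpage on x86-64 */

int main(void)
{
    uint64_t start = 0xcf7fb000, end = 0xcf7fc000;   /* one 4KB page */

    /* Round the start up and the end down to 2MB boundaries. */
    uint64_t s = (start + (1ULL << L2_PAGETABLE_SHIFT) - 1)
                 >> L2_PAGETABLE_SHIFT << L2_PAGETABLE_SHIFT;
    uint64_t e = end >> L2_PAGETABLE_SHIFT << L2_PAGETABLE_SHIFT;

    if (s >= e)
        printf("entry %#llx-%#llx dropped after 2MB alignment\n",
               (unsigned long long)start, (unsigned long long)end);
    return 0;
}

With these numbers the rounded-up start already lies past the rounded-down
end, so the hypervisor skips the range entirely, while dom0, which applies no
such rounding to the host map, still sees it.
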
>>> Allen
>>>
>>> -----Original Message-----
>>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
>>> Sent: Friday, January 28, 2011 7:48 AM
>>> To: Kay, Allen M
>>> Cc: xen-devel; Stefano Stabellini
>>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>>
>>> On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>>> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
>>>>> Following are the brief error messages from the serial console log.
>>>>> I have also attached the full serial console log and the dom0 system
>>>>> map.
>>>>>
>>>>> (XEN) mm.c:802:d0 Bad L1 flags 400000
>>>> On a second look, this is a different issue than the one I had
>>>> encountered.
>>>>
>>>> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but
>>>> that is not right. Googling for this shows that I had fixed this with
>>>> a Xorg server at some point, but I can't remember the details, so that
>>>> is not that useful :-(
>>>>
>>>> You said it works if you give the domain 1024MB, but I wonder if it
>>>> also works if you disable the IOMMU? What happens then?
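As an aside on the value itself: 0x400000 is bit 22 of the 24-bit flag word
Xen keeps for a PTE, and on x86-64 the high software-defined bits of a PTE
are folded down next to the twelve low architectural bits, so flag bit 22
corresponds to PTE bit 62, the bit the hypervisor reserves for its own
grant-table mappings and never accepts in a guest-supplied L1 entry. A tiny
standalone decode of the arithmetic (the 0x400000 value is from the log
above; the bit-folding detail is as remembered from the x86-64 Xen headers
of that era, so treat it as approximate):

#include <stdio.h>

int main(void)
{
    unsigned int bad = 0x400000;            /* from "Bad L1 flags 400000" */

    for (unsigned int bit = 0; bit < 24; bit++) {
        if (!(bad & (1u << bit)))
            continue;
        /* Flag bits 12-23 map to the high PTE bits 52-63; 22 -> 62. */
        unsigned int pte_bit = bit < 12 ? bit : bit + 40;
        printf("flag bit %u (PTE bit %u) is set\n", bit, pte_bit);
    }
    return 0;
}
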
>>> Can you also patch your Xen hypervisor with this patch? It will print
>>> out the other 89 entries so we can see what type of values they have.
>>> You might need to move it a bit, as this is for xen-unstable.
>>>
>>> diff -r 003acf02d416 xen/arch/x86/mm.c
>>> --- a/xen/arch/x86/mm.c Thu Jan 20 17:04:06 2011 +0000
>>> +++ b/xen/arch/x86/mm.c Fri Jan 28 10:46:13 2011 -0500
>>> @@ -1201,11 +1201,12 @@
>>>          return 0;
>>>
>>>  fail:
>>> -    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>>> +    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
>>>      while ( i-- > 0 )
>>> -        if ( is_guest_l1_slot(i) )
>>> +        if ( is_guest_l1_slot(i) ) {
>>> +            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
>>>              put_page_from_l1e(pl1e[i], d);
>>> -
>>> +        }
>>>      unmap_domain_page(pl1e);
>>>      return -EINVAL;
>>>  }
>>>
>>>>> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
>>>>> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
>>>>> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
>>>>> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>>>>> (XEN) domain_crash_sync called from entry.S
>>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel