
Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure



On 02/11/2011 11:00 AM, Kay, Allen M wrote:
> The code for memblock_x86_reserve_range() does not exist in 2.6.32.27 pvops 
> dom0.

No, the function changed name, but the concept is the same.

>   I did find it in Konrad's tree at 
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git.
>
> So is this a problem for the 2.6.32.27 stable tree?  If so, which pvops dom0 tree 
> should I be using?

I *just* pushed .32.27 and haven't had a chance to test it.  The
xen/stable-2.6.32.x branch contains the version of xen/next-2.6.32 that
has at least passed some amount of testing (i.e., it boots on something at
the very least).

    J

> Allen
>
> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx] 
> Sent: Friday, February 11, 2011 9:07 AM
> To: Kay, Allen M
> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>
> On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>>> That "extra memory" stuff is reserving some physical address space for
>>> ballooning.  It should be completely unused (and unbacked by any pages)
>>> until the balloon driver populates it; it is reserved memory in the
>>> meantime.
>> On my system, the entire chunk is marked as usable memory:
>>
>>     0000000100000000 - 000000023a6f4000 (usable)
>>
>> When you said it is reserved memory, are you saying it should be marked as 
>> "reserved" or is there somewhere else in the code that keeps track of which 
>> portion of this e820 chunk is backed by real memory and which chunk is "extra 
>> memory"?
> Yes, it is marked as usable in the E820 so that the kernel will allocate
> page structures for it.  But then the extra part is reserved with
> memblock_x86_reserve_range(), which should prevent the kernel from ever
> trying to use that memory (i.e., it will never get added to the pools of
> memory the allocator allocates from).  The balloon driver backs these
> pseudo-physical pageframes with real memory pages, and then releases them
> into the pool for allocation.
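
To make that mechanism concrete, here is a condensed sketch of the pattern
being described.  It is illustrative only, not the exact xen_add_extra_mem()
from any of the trees in this thread; the names e820_add_region() and
memblock_x86_reserve_range() follow the later kernels mentioned above.

/*
 * Sketch: add the extra region to the E820 as usable RAM so the kernel
 * builds struct page entries for it, then reserve it immediately so the
 * page allocator never hands it out.  The balloon driver later backs
 * these pseudo-physical frames with real pages and frees them for use.
 */
static void __init add_extra_mem_sketch(u64 start, u64 size)
{
    e820_add_region(start, size, E820_RAM);
    memblock_x86_reserve_range(start, start + size, "XEN EXTRA");
}
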
>
>     J
>
>> -----Original Message-----
>> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx] 
>> Sent: Thursday, February 10, 2011 6:56 PM
>> To: Kay, Allen M
>> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>
>> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
>>> Konrad/Stefano,
>>>
>>> Getting back to the xen/dom0 boot failure on my Sandybridge SDP I reported 
>>> a few weeks ago.
>>>
>>> I finally got around to narrowing the problem down to the call to 
>>> xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup().  This call 
>>> increases the top of E820 memory in dom0 beyond what is actually available.
>>>
>>> Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:
>>>
>>>     0000000100000000 - 000000016b45a000 (usable)
>>>
>>> After xen_add_extra_mem() is called, the last entry of dom0 e820 table 
>>> becomes:
>>>
>>>     0000000100000000 - 000000023a6f4000 (usable)
>>>
>>> This pushes the top of RAM beyond what was reported by Xen's e820 table, 
>>> which is:
>>>
>>> (XEN)  0000000100000000 - 00000001de600000 (usable)
>>>
>>> AFAICT, the failure is caused by dom0 accessing non-existent physical 
>>> memory.  The failure went away after I removed the call to 
>>> xen_add_extra_mem().
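
For scale, simple hex arithmetic on the two tops quoted above gives the size
of the region in question:

    0x23a6f4000 - 0x1de600000 = 0x5c0f4000   (~1.4 GB of pseudo-physical
    space above the host's real top of RAM)
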
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning.  It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
>>
>> How is that memory getting referenced in your case?
>>
>>> Another potential problem I noticed with e820 processing is that there is a 
>>> discrepancy between how Xen processes the e820 and how dom0 does it.  In Xen 
>>> (arch/x86/setup.c/start_xen()), e820 entries are aligned on an 
>>> L2_PAGETABLE_SHIFT boundary, while the dom0 e820 code does no such 
>>> alignment.  As a result, one of my e820 entries, which is 1 page in size, 
>>> got dropped by Xen but picked up by dom0.  This does not cause a problem in 
>>> my case, but the inconsistency in how memory is used by Xen and dom0 can 
>>> potentially be a problem.
>> I don't think that matters.  Xen can choose not to use non-2M aligned
>> pieces of memory if it wants, but that doesn't really affect the dom0
>> kernel's use of the host E820, because dom0 is only looking for possible
>> device memory, rather than RAM.
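
The alignment effect described above is easy to reproduce in isolation:
clipping a range inward to 2MB boundaries discards any entry smaller than
2MB.  A standalone sketch (the entry's addresses are made up; Xen's actual
setup code is more involved):

#include <stdio.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 21                 /* 2MB superpages on x86-64 */
#define L2_SIZE            (1ULL << L2_PAGETABLE_SHIFT)

int main(void)
{
    /* Hypothetical one-page E820 entry, like the one described above. */
    uint64_t start = 0x9d000, end = 0x9e000;

    /* Clip inward to 2MB boundaries. */
    uint64_t cs = (start + L2_SIZE - 1) & ~(L2_SIZE - 1);
    uint64_t ce = end & ~(L2_SIZE - 1);

    if (cs >= ce)
        printf("%#llx-%#llx: dropped entirely by 2MB clipping\n",
               (unsigned long long)start, (unsigned long long)end);
    else
        printf("%#llx-%#llx: kept as %#llx-%#llx\n",
               (unsigned long long)start, (unsigned long long)end,
               (unsigned long long)cs, (unsigned long long)ce);
    return 0;
}
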
>>
>>     J
>>> Allen
>>>
>>> -----Original Message-----
>>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx] 
>>> Sent: Friday, January 28, 2011 7:48 AM
>>> To: Kay, Allen M
>>> Cc: xen-devel; Stefano Stabellini
>>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>>
>>> On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>>> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
>>>>> Following are the brief error messages from the serial console log.  I 
>>>>> have also attached the full serial console log and dom0 system map.
>>>>>
>>>>> (XEN) mm.c:802:d0 Bad L1 flags 400000
>>>> On a second look, this is a different issue than I had encountered.
>>>>
>>>> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
>>>> is not right. Googling for this shows that I had fixed this with a
>>>> Xorg server at some point, but I can't remember the details so that is not
>>>> that useful :-(
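
For reference, the arithmetic behind that reading: 0x400000 is 1 << 22, so
the claim is that flag bit 22 was set in the offending L1 entry.  A trivial
standalone check; the _PAGE_GNTTAB position below is assumed for
illustration, not copied from Xen's headers.

#include <stdio.h>

#define _PAGE_GNTTAB (1u << 22)    /* assumed: bit 22 == 0x400000 */

int main(void)
{
    unsigned int bad = 0x400000;   /* from "Bad L1 flags 400000" above */

    printf("flags %#x: _PAGE_GNTTAB %s\n", bad,
           (bad & _PAGE_GNTTAB) ? "set" : "clear");
    return 0;
}
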
>>>>
>>>> You said it works if you give the domain 1024MB, but I wonder if
>>>> it also works if you disable the IOMMU? What happens then?
>>> Can you also patch your Xen hypervisor with this patch? It will print out 
>>> the other 89 entries so we can see what type of values they have.  You 
>>> might need to move it a bit, as this is for xen-unstable.
>>>
>>> diff -r 003acf02d416 xen/arch/x86/mm.c
>>> --- a/xen/arch/x86/mm.c     Thu Jan 20 17:04:06 2011 +0000
>>> +++ b/xen/arch/x86/mm.c     Fri Jan 28 10:46:13 2011 -0500
>>> @@ -1201,11 +1201,12 @@
>>>      return 0;
>>>  
>>>   fail:
>>> -    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>>> +    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
>>>      while ( i-- > 0 )
>>> -        if ( is_guest_l1_slot(i) )
>>> +        if ( is_guest_l1_slot(i) ) {
>>> +            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
>>>              put_page_from_l1e(pl1e[i], d);
>>> -
>>> +        }
>>>      unmap_domain_page(pl1e);
>>>      return -EINVAL;
>>>  }
>>>
>>>>> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
>>>>> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
>>>>> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
>>>>> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>>>>> (XEN) domain_crash_sync called from entry.S
>>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:

