Re: [Xen-devel] pvgrub "Error 9: Unknown boot failure" booting Debian Jessie kernel (Was: Re: [PATCH v5 6/9] libxc: create unmapped initrd in domain builder if supported)

On 01/12/15 11:01, Ian Campbell wrote:
> On Tue, 2015-12-01 at 09:53 +0100, Juergen Gross wrote:
>> On 01/12/15 09:30, Ian Campbell wrote:
>>> On Tue, 2015-12-01 at 08:41 +0100, Juergen Gross wrote:
>>>>>> I'm not quite sure what to make of this, in particular I don't see
>>>>>> anything in kexec.c which obviously looks after unmapping the heap or
>>>>>> brk areas.
>>>>> Nah, this backtrace shows a normal allocation path while
>>>>> uncompressing the kernel image. I'd expect something like that.
>>>>> Why shouldn't mini-os make use of pfn 4d81 somewhere?
>>> That pfn ends up right in the middle of the next kernel's vaddr mapping,
>>> so at best it indicates some sort of disconnect/overlap between the
>>> mini-os memory allocator and the domain-builder memory allocator.
>> I don't think so.
>> mini-os just allocates single pages and keeps the relation to the
>> (future) pfn of that page. The p2m list is adjusted later to move the
>> allocated page to that pfn before activating the new kernel.
> Ah, I was wondering how it could possibly work so I half expected I must be
> missing something.
>>> Since it seems to be in the middle of the padding area (which might
>>> have been new in ea7c8a3d0e82, I'm not sure, it seems to be more
>>> explicit at the least) it occurred to me on the way home last night
>>> that maybe we need to unmap the padding area as well.
>> We do. The page tables need to be unmapped independently as they
>> have been mapped explicitly during setup_pgtables(dom). All the
>> mini-os mappings are removed in a loop just after that.
> "a loop" is this:
>     /* Unmap day0 pages to avoid having a r/w mapping of the future page table */
>     for (pfn = 0; pfn < allocated; pfn++)
>         munmap((void*) pages[pfn], PAGE_SIZE);
> In my debugging this extends only to the end of the actual mappings, not to
> the end of the padding, e.g. for me it is extending to "Unmap pfns 0 .. 0x4c0f"
> while the unexpected PT pfn is at 0x4d80 and the padding area extends to
> pfn 0x5000.
>>> I'll try that and your suggested patch below as well once I get to the
>>> office this morning.
>> Thanks.
> The BUG_ON doesn't seem to be triggering. I'm not seeing pfn==0x4d80 going
> anywhere near kexec_allocate, the highest is 0x4c0f.
> Maybe the issue is that the ->allocate hook (==kexec_allocate) isn't called
> from xc_dom_alloc_pad?

OMG! How could I miss that? Thanks for finding this!


