
Re: E820 memory allocation issue on Threadripper platforms





On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
On 17.01.2024 07:12, Patrick Plenefisch wrote:
> On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
>> On 16.01.2024 01:22, Patrick Plenefisch wrote:
>>> I managed to set up serial access and saved the output with the requested
>>> flags as the attached logs
>>
>> Thanks. While you didn't ...
>>
>>
>> ... fiddle with the Linux message,  ...
>>
>
> I last built the kernel over a decade ago, and so was hoping to not have to
> look up how to do that again, but I can research how to go about that again
> if it would help?
>

The nice thing about Threadripper is the fast kernel build times. I added that patch to the kernel and confirmed:

about to get started...
Xen hypervisor allocated kernel memory conflicts with E820 map: 0x1000000 - 0x4400000
(XEN) Hardware Dom0 halted: halting machine
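To make the failure above concrete: the kernel's load range (here 0x1000000 - 0x4400000) has to sit in usable RAM according to the E820 map. The sketch below is a simplified model, not the actual Linux or Xen code. It assumes the range must fit inside a single usable-RAM (type 1) entry, and it treats the printed bounds as a half-open interval. The memory map in it is hypothetical, not taken from the attached logs.

```python
from dataclasses import dataclass
from typing import List, Sequence

# E820 type codes: 1 = usable RAM, 2 = reserved.
E820_RAM = 1
E820_RESERVED = 2


@dataclass(frozen=True)
class E820Entry:
    start: int  # inclusive physical address
    end: int    # exclusive physical address (assumption: half-open)
    type: int


def is_fully_ram(start: int, end: int, e820: Sequence[E820Entry]) -> bool:
    """True if [start, end) lies entirely inside one usable-RAM entry."""
    return any(e.type == E820_RAM and e.start <= start and end <= e.end
               for e in e820)


def blocking_entries(start: int, end: int,
                     e820: Sequence[E820Entry]) -> List[E820Entry]:
    """Non-RAM entries that intersect [start, end), for diagnostics."""
    return [e for e in e820
            if e.type != E820_RAM and e.start < end and start < e.end]


# Hypothetical memory map, NOT taken from the attached logs.
EXAMPLE_MAP = [
    E820Entry(0x0000000, 0x009f000, E820_RAM),
    E820Entry(0x0100000, 0x3000000, E820_RAM),
    E820Entry(0x3000000, 0x3100000, E820_RESERVED),
    E820Entry(0x3100000, 0x80000000, E820_RAM),
]

# The range reported in the log above.
KERNEL_START, KERNEL_END = 0x1000000, 0x4400000

if not is_fully_ram(KERNEL_START, KERNEL_END, EXAMPLE_MAP):
    for e in blocking_entries(KERNEL_START, KERNEL_END, EXAMPLE_MAP):
        print(f"conflict: {KERNEL_START:#x} - {KERNEL_END:#x} overlaps "
              f"type {e.type} entry {e.start:#x} - {e.end:#x}")
```

With this made-up map the script prints a single line naming the reserved entry at 0x3000000 - 0x3100000, i.e. a firmware-reserved hole landing inside the range the kernel wants.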


 
>
> I'm currently talking to the vendor's support team and testing a beta BIOS
> for unrelated reasons, is there something specific I should forward to
> them, either as a question or as a request for a fix?

Well, first it would need figuring out whether the "interesting" regions
are being put in place by the firmware or the boot loader. If it's firmware
(pretty likely at least for the region you're having trouble with), you
may want to ask them to re-do where they place that specific data.

This region changes from boot to boot, and between GRUB and direct EFI loading, but my untrained eyes don't see an obvious pattern. I've attached several logs, named in this format:

xen-XENVERSION_LOADER_KERNELNAME_TYPE.log

where:

- XENVERSION is 4.17 (packaged in Debian 12), 4.18 (built from source), or 4.18p (built from source with the patch you mention below applied)
- LOADER is grub (GRUB 2 from Debian 12) or UEFI (direct boot via a UEFI entry configured with efibootmgr)
- KERNELNAME is empty (PVH failure; the segment is then omitted, e.g. xen-4.18_UEFI_pvh.log), linuxpatch (Linux with the patch requested above), linuxoffset (built with PHYSICAL_START=2MiB), or linux6 (the Debian 12 kernel)
- TYPE is pvh or pv
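The naming scheme can also be read mechanically. Below is a hypothetical helper (parse_log_name is not part of any tool mentioned in this thread) that splits an attachment name into its four fields, returning None for an empty KERNELNAME.

```python
import re
from typing import NamedTuple, Optional


class LogName(NamedTuple):
    xen_version: str       # 4.17, 4.18 or 4.18p
    loader: str            # grub or UEFI
    kernel: Optional[str]  # None when KERNELNAME is empty (PVH failure)
    dom0_type: str         # pvh or pv


# The KERNELNAME segment, including its leading underscore, is optional.
_PATTERN = re.compile(
    r"^xen-(?P<xen>4\.17|4\.18p?)"
    r"_(?P<loader>grub|UEFI)"
    r"(?:_(?P<kernel>linuxpatch|linuxoffset|linux6))?"
    r"_(?P<type>pvh|pv)\.log$"
)


def parse_log_name(name: str) -> LogName:
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized log name: {name}")
    # m["kernel"] is None when the optional group did not participate.
    return LogName(m["xen"], m["loader"], m["kernel"], m["type"])
```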

For the two logs that actually booted (linuxoffset), I truncated them during PCIe initialization, but both boots went all the way to a login screen.

 

> As someone who hasn't built a kernel in over a decade, should I figure out
> how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and report
> back?

That was largely a suggestion to perhaps allow you to gain some
workable setup. It would be of interest to us largely for completeness.

Typo aside, setting CONFIG_PHYSICAL_START to 2MiB works! It works better for PV; PVH has some graphics card issues: I have to interact over serial, and dmesg shows some concerning radeon errors.
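For reference, here are the hex values involved, converted to MiB. The 0x1000000 entry assumes the usual x86_64 Kconfig default for CONFIG_PHYSICAL_START; it matches the start of the conflicting range reported earlier, which would be consistent with a 2MiB load address sidestepping the collision.

```python
MiB = 1 << 20


def describe(value: int) -> str:
    """Format a physical address as hex plus its size in MiB."""
    return f"{value:#x} = {value / MiB:g} MiB"


# CONFIG_PHYSICAL_START values relevant to this thread.
VALUES = {
    "x86_64 default (assumed)": 0x1000000,
    "value in the quoted question": 0x2000000,
    "2MiB (linuxoffset build)": 0x200000,
}

for label, value in VALUES.items():
    print(f"{label}: {describe(value)}")
```

This prints 16 MiB, 32 MiB and 2 MiB respectively, which shows why 0x2000000 in the quoted question is the typo.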
 


Hmm, that's sad, all the more so as the error messages aren't really
informative. You did check, though, that your kernel is PVH-capable?
(With a debug build of Xen, and with a suitably high logging level,
various ELF properties would be logged. Such output may or
may not give further hints towards what's actually wrong. Since
you are using 4.17, this would further require you to pull in commit
ea3dabfb80d7 ["x86/PVH: allow Dom0 ELF parsing to be verbose"].)

That commit is applied in the "4.18p" logs described above.
 

But wait - aren't you running into the same collision there with
that memory region? I think that explains the unhelpful output.
Whereas I assume the native kernel can deal with that as long as
it's built with CONFIG_RELOCATABLE=y. I don't think we want to
get into the business of interpreting the kernel's internal
representation of the relocations needed, so it's not really
clear to me what we might do in such a case. Perhaps the only way
is to signal to the kernel that it needs to apply relocations
itself (which in turn would require the kernel to signal to us
that it's capable of doing so). Cc-ing Roger in case he has any
neat idea.

Yes, PVH, PV, and RELOCATABLE are all enabled in the Debian kernel I was using, which I also based my own kernel config on.

That kernel, with its config file, can be found at https://packages.debian.org/bookworm/linux-image-6.1.0-17-amd64
 

Jan

Attachment: xen-4.18p_grub_linux6_pvh.log
Description: Text Data

Attachment: xen-4.18p_grub_linuxoffset_pvh.log
Description: Text Data

Attachment: xen-4.18_UEFI_pvh.log
Description: Text Data

Attachment: xen-4.17_UEFI_linux6_pv.log
Description: Text Data

Attachment: xen-4.17_grub_linuxoffset_pv.log
Description: Text Data

Attachment: xen-4.17_UEFI_pvh.log
Description: Text Data

Attachment: xen-4.18p_grub_linuxpatch_pv.log
Description: Text Data

Attachment: xen-4.17_grub_linux6_pv.log
Description: Text Data
