
Re: E820 memory allocation issue on Threadripper platforms





On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
On 16.01.2024 01:22, Patrick Plenefisch wrote:
> I managed to set up serial access and saved the output with the requested
> flags as the attached logs

Thanks. While you didn't ...


... fiddle with the Linux message, ...

I last built a kernel over a decade ago and was hoping not to have to relearn how, but I can look into it again if that would help.
 

... as per

(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000

there's an overlap with not exactly a hole, but with an
EfiACPIMemoryNVS region:

(XEN)  0000000100000-0000003159fff type=2 attr=000000000000000f
(XEN)  000000315a000-0000003ffffff type=7 attr=000000000000000f
(XEN)  0000004000000-0000004045fff type=10 attr=000000000000000f
(XEN)  0000004046000-0000009afefff type=7 attr=000000000000000f

(the 3rd of the 4 lines). Considering there's another region higher
up:

(XEN)  00000a747f000-00000a947efff type=10 attr=000000000000000f

I'm inclined to say it is poor firmware (or, far less likely, boot
loader) behavior to clobber a rather low and entirely arbitrary RAM
 
Bootloader is Grub 2.06 EFI platform as packaged by Debian 12

 
range, rather than consolidating all such regions near the top of
RAM below 4Gb. There are further such odd regions, btw:

(XEN)  0000009aff000-0000009ffffff type=0 attr=000000000000000f
...
(XEN)  000000b000000-000000b020fff type=0 attr=000000000000000f
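To illustrate the overlap being described, here is a minimal Python sketch that parses the memory map lines quoted above and reports which non-RAM regions intersect the Dom0 kernel's load range. The helper names (`parse_regions`, `conflicts`) are hypothetical and not part of Xen, and the set of EFI memory types treated as usable RAM is an assumption made for this illustration only.

```python
import re

# Lines copied from the Xen boot log quoted above.
EFI_MAP = """\
(XEN)  0000000100000-0000003159fff type=2 attr=000000000000000f
(XEN)  000000315a000-0000003ffffff type=7 attr=000000000000000f
(XEN)  0000004000000-0000004045fff type=10 attr=000000000000000f
(XEN)  0000004046000-0000009afefff type=7 attr=000000000000000f
(XEN)  0000009aff000-0000009ffffff type=0 attr=000000000000000f
(XEN)  000000b000000-000000b020fff type=0 attr=000000000000000f
"""

# From "Dom0 kernel: ... paddr 0x1000000 -> 0x4a00000"; end treated as exclusive.
KERNEL_START, KERNEL_END = 0x1000000, 0x4A00000

# Assumption for this sketch: loader and boot-services code/data (types 1-4)
# and conventional memory (type 7) count as usable RAM; everything else
# (0 = reserved, 10 = ACPI NVS, ...) must not be overlapped.
USABLE_TYPES = {1, 2, 3, 4, 7}

LINE_RE = re.compile(r"\(XEN\)\s+([0-9a-f]+)-([0-9a-f]+) type=(\d+)")


def parse_regions(text):
    """Return (start, end_exclusive, efi_type) tuples from the log text."""
    regions = []
    for m in LINE_RE.finditer(text):
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) + 1  # the log prints an inclusive end address
        regions.append((start, end, int(m.group(3))))
    return regions


def conflicts(regions, lo, hi):
    """Non-usable regions that intersect the half-open range [lo, hi)."""
    return [(s, e, t) for (s, e, t) in regions
            if t not in USABLE_TYPES and s < hi and lo < e]


for s, e, t in conflicts(parse_regions(EFI_MAP), KERNEL_START, KERNEL_END):
    print(f"kernel overlaps {s:#x}-{e - 1:#x} (type={t})")
```

With the lines quoted here, only the type=10 region at 0x4000000 intersects the kernel range; the two type=0 regions lie above the current end of the image.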

If the kernel image were sufficiently larger, these could become
a problem as well. OTOH if the kernel wasn't built with
CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16Mb, but at, say,
2Mb, things should apparently work even with this unusual memory
layout (until the kernel grows enough to run into that very region
again).
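The arithmetic behind the 16Mb versus 2Mb comparison can be sketched as follows. This is only an illustration: it assumes the image occupies the same amount of physical memory wherever it is placed, and that everything below the type=10 region is usable; `headroom` is a hypothetical helper.

```python
MIB = 1 << 20

# From "Dom0 kernel: ... paddr 0x1000000 -> 0x4a00000".
KERNEL_START = 0x1000000
KERNEL_END = 0x4A00000
# Start of the type=10 (EfiACPIMemoryNVS) region from the memory map.
NVS_START = 0x4000000

# Assumption: the image footprint does not change when it is relocated.
image_size = KERNEL_END - KERNEL_START


def headroom(phys_start):
    """Bytes between the end of the image and the NVS region.

    A negative value means the image would overlap the region.
    """
    return NVS_START - (phys_start + image_size)


print(f"image size: {image_size // MIB} MiB")
for start in (16 * MIB, 2 * MIB):
    print(f"start {start:#x}: headroom {headroom(start) // MIB} MiB")
```

Under these assumptions the image is 58 MiB: starting at 16 MiB it runs 10 MiB past the start of the NVS region, while starting at 2 MiB it would end 4 MiB short of it.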

I'm currently talking to the vendor's support team and testing a beta BIOS for unrelated reasons. Is there something specific I should forward to them, either as a question or as a request for a fix?

Given that I haven't built a kernel in over a decade, should I figure out how to do a kernel build with CONFIG_PHYSICAL_START=0x200000 (i.e. 2Mb) and report back?


It remains to be seen how far it is reasonably possible to work
around this in the kernel. While (sadly) still unsupported, in the
meantime you may want to consider running Dom0 in PVH mode.

I tried this by adding dom0=pvh, and instead got this boot error:

(XEN) xenoprof: Initialization failed. AMD processor family 25 is not supported
(XEN) NX (Execute Disable) protection active
(XEN) Dom0 has maximum 1400 PIRQs
(XEN) *** Building a PVH Dom0 ***
(XEN) Failed to load kernel: -1
(XEN) Xen dom0 kernel broken ELF: <NULL>
(XEN) Failed to load Dom0 kernel
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...


 

Jan

 

