
Re: E820 memory allocation issue on Threadripper platforms


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Patrick Plenefisch <simonpatp@xxxxxxxxx>
  • From: Xenia Ragiadakou <xenia.ragiadakou@xxxxxxx>
  • Date: Thu, 11 Jan 2024 11:19:41 +0200
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 11 Jan 2024 09:20:14 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 11/1/24 10:37, Jan Beulich wrote:
> On 11.01.2024 03:29, Patrick Plenefisch wrote:
>> Hi,
>>
>> I think I ran into a memory allocation issue. It is the same as
>> https://github.com/QubesOS/qubes-issues/issues/8791, and at the end of that
>> issue it was recommended (by marmarek) that the reporter forward the issue to
>> this list. I searched the list and, as I didn't find it there already,
>> I'm doing that now.
>>
>> Hardware:
>> I have an AMD Threadripper 7960X on an ASRock TRX50 WS motherboard. The
>> Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
>> motherboard. I saw a third report of a similar issue on another
>> Threadripper, so I think this may be Threadripper-specific.
>>
>> Setup:
>> The QubesOS reporter was using the Qubes installer.
>> My setup was a fresh install of Debian 12 (no GUI); I then ran
>> `apt install xen-system-amd64` and rebooted.
>>
>> The issue:
>> Every boot of Xen on this hardware results in a halted machine. When
>> monitoring the logs with `vga=,keep`, we get:
>>
>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>> (XEN) Freed 644kB init memory
>> mapping kernel into physical memory
>> about to get started…
>> xen hypervisor allocated kernel memory conflicts with E820
>
> So first of all (the title doesn't say it), this is a Linux Dom0 issue.
> Whether it also needs addressing in Xen is unknown at this point.
>
>> (XEN) Hardware Dom0 halted: halting machine
>>
>> None of the settings the Qubes reporter or I have tried have gotten
>> past this failure.
>>
>> I am happy to provide debugging support.
>
> Well, the crucial piece of data initially is going to be: What's the
> E820 map Xen gets to see, what's the E820 map Dom0 gets to see, and
> what address range is the conflict detected for? The first question
> can be answered by supplying a serial log. The second question
> likely means adding some debugging code to either Xen or Linux. The
> answer to the third question may be possible to infer from the other
> data, but would likely be better obtained explicitly by adjusting /
> amending the message Linux emits.
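
To make the third question concrete, here is a small user-space sketch (not code from Xen or Linux) of how a kernel address range can be checked against an E820 map and the offending entry identified. It is modeled on the rule used by Linux's xen_is_e820_reserved() as I understand it: the range [_text, __bss_stop) is acceptable only if it lies entirely inside a single RAM entry. The struct layout, the type codes kept here (1 = RAM, 2 = reserved), and the helper names dump_e820() and find_conflict() are assumptions for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* E820 type codes: 1 = usable RAM, 2 = reserved (other types omitted). */
enum { E820_RAM = 1, E820_RESERVED = 2 };

struct e820_entry {
    uint64_t addr; /* physical start address */
    uint64_t size; /* length in bytes */
    uint32_t type; /* E820_RAM, E820_RESERVED, ... */
};

/* Print one line per entry as [start, end) plus a type label. */
static void dump_e820(FILE *out, const struct e820_entry *map, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(out, "[%016" PRIx64 ", %016" PRIx64 ") %s\n",
                map[i].addr, map[i].addr + map[i].size,
                map[i].type == E820_RAM ? "RAM" :
                map[i].type == E820_RESERVED ? "reserved" : "other");
}

/*
 * Modeled on Linux's xen_is_e820_reserved(): [start, start + size) is fine
 * only if it lies entirely inside a single RAM entry.
 * Returns -1 if the range is fine, otherwise the index of the first non-RAM
 * entry overlapping it, or -2 if no single RAM entry contains it but no
 * non-RAM entry overlaps it either (e.g. it runs into a hole in the map).
 */
static int find_conflict(const struct e820_entry *map, int n,
                         uint64_t start, uint64_t size)
{
    uint64_t end = start + size;

    if (size == 0)
        return -1;

    for (int i = 0; i < n; i++)
        if (map[i].type == E820_RAM && map[i].addr <= start &&
            map[i].addr + map[i].size >= end)
            return -1;

    for (int i = 0; i < n; i++)
        if (map[i].type != E820_RAM && map[i].addr < end &&
            start < map[i].addr + map[i].size)
            return i;

    return -2;
}
```

Feeding it the map Dom0 sees, together with the kernel range printed by a patched message, would show which entry is in the way.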

We've already hit a similar issue, because Xen doesn't take the reserved memory regions into account when loading the Dom0 kernel (even if the kernel is relocatable). It can be worked around by changing CONFIG_PHYSICAL_START in the kernel config accordingly, so that the kernel image lands entirely in RAM instead of overlapping a reserved region.
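
A rough sketch of how one might pick a value for that workaround: scan the RAM entries of the E820 map for the lowest suitably aligned address, at or above some minimum, where the whole kernel image fits. The helper name pick_physical_start(), the map, and the kernel size are hypothetical. The usual x86-64 Kconfig defaults are, as far as I know, CONFIG_PHYSICAL_START=0x1000000 and CONFIG_PHYSICAL_ALIGN=0x200000, but check your own kernel config.

```c
#include <stdint.h>

/* E820 type codes: 1 = usable RAM, 2 = reserved (other types omitted). */
enum { E820_RAM = 1, E820_RESERVED = 2 };

struct e820_entry {
    uint64_t addr; /* physical start address */
    uint64_t size; /* length in bytes */
    uint32_t type; /* E820_RAM, E820_RESERVED, ... */
};

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper, not from Xen or Linux: return the lowest address
 * >= min_start, aligned to `align`, at which `size` bytes fit entirely
 * inside one RAM entry, or 0 if there is none (min_start assumed non-zero).
 */
static uint64_t pick_physical_start(const struct e820_entry *map, int n,
                                    uint64_t min_start, uint64_t size,
                                    uint64_t align)
{
    uint64_t best = 0;

    for (int i = 0; i < n; i++) {
        if (map[i].type != E820_RAM)
            continue;

        uint64_t lo = map[i].addr > min_start ? map[i].addr : min_start;
        uint64_t cand = align_up(lo, align);
        uint64_t end = map[i].addr + map[i].size;

        /* The candidate must start inside this entry and leave room for size. */
        if (cand < end && end - cand >= size && (best == 0 || cand < best))
            best = cand;
    }
    return best;
}
```

With a made-up map that has a reserved range at 16 to 20 MiB, a 36 MiB image and the defaults above, this returns 0x1400000. Candidates are aligned because, as I understand it, x86 rounds its load address up to CONFIG_PHYSICAL_ALIGN anyway; the size to use is the real _text to __bss_stop span of your kernel.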

Let me provide more details on how to get the info Jan requested:

1) On the Xen command line, add: e820-verbose=true console_to_ring

2) On the Dom0 kernel command line, add: earlyprintk=xen

3) Change the message emitted by the Linux kernel so that it also prints the conflicting address range, as below:

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index cfa99e8f054b..ad88b700d58e 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -717,7 +717,7 @@ static void __init xen_reserve_xen_mfnlist(void)
        xen_relocate_p2m();
        memblock_phys_free(start, size);
}
-
+void xen_raw_printk(const char *fmt, ...);
/**
  * xen_memory_setup - Hook for machine specific memory setup.
  **/
@@ -853,7 +853,8 @@ char * __init xen_memory_setup(void)
         */
        if (xen_is_e820_reserved(__pa_symbol(_text),
                        __pa_symbol(__bss_stop) - __pa_symbol(_text))) {
-               xen_raw_console_write("Xen hypervisor allocated kernel memory conflicts with E820 map\n");
+               xen_raw_printk("Xen hypervisor allocated kernel memory conflicts with E820 map: %#lx - %#lx\n",
+                              __pa_symbol(_text), __pa_symbol(__bss_stop));
                BUG();
        }
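
The new message uses %#lx, which prefixes nonzero values with 0x. A user-space sketch with the same format string (the helper name format_conflict() is hypothetical and the addresses are made up) shows what the output line will look like:

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a conflicting range the way the patched
 * message does ("%#lx - %#lx"). Returns the snprintf() result, i.e. the
 * length the full message would have.
 */
static int format_conflict(char *buf, size_t len,
                           unsigned long start, unsigned long end)
{
    return snprintf(buf, len,
                    "Xen hypervisor allocated kernel memory conflicts with E820 map: %#lx - %#lx\n",
                    start, end);
}
```

For a kernel image spanning 0x1000000 to 0x3400000 this produces "Xen hypervisor allocated kernel memory conflicts with E820 map: 0x1000000 - 0x3400000".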


> Jan
