
Re: E820 memory allocation issue on Threadripper platforms


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Patrick Plenefisch <simonpatp@xxxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Thu, 11 Jan 2024 11:13:29 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 11 Jan 2024 10:13:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.01.24 09:37, Jan Beulich wrote:
> On 11.01.2024 03:29, Patrick Plenefisch wrote:
>> Hi,
>>
>> I ran into a memory allocation issue, I think. It is the same as
>> https://github.com/QubesOS/qubes-issues/issues/8791, where at the end it
>> was recommended (by marmarek) that the issue reporter forward the issue to
>> this list. I searched the list and didn't find it there already, so
>> I'm doing that now.
>>
>> Hardware:
>> I have an AMD Threadripper 7960X on an ASRock TRX50 WS motherboard. The
>> Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
>> motherboard. I saw a third report of a similar issue on another
>> Threadripper, so I think this may be Threadripper-specific.
>>
>> Setup:
>> The QubesOS reporter was using the Qubes installer.
>> I started from a fresh install of Debian 12 (no GUI), then
>> ran `apt install xen-system-amd64` and rebooted.
>>
>> The issue:
>> Any boot of Xen on the hardware results in a halted machine. When
>> monitoring the logs with `vga=,keep`, we get:
>>
>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>> (XEN) Freed 644kB init memory
>> mapping kernel into physical memory
>> about to get started…
>> xen hypervisor allocated kernel memory conflicts with E820
>
> So first of all (the title doesn't say it) this is a Linux Dom0 issue.
> Whether or not it needs addressing in Xen is unknown at this point.
>
>> (XEN) Hardware Dom0 halted: halting machine
>>
>> None of the settings I or the Qubes reporter have tried have been able to
>> get past this failure.
>>
>> I am happy to provide debugging support.
>
> Well, the crucial piece of data initially is going to be: What's the
> E820 map Xen gets to see, what's the E820 map Dom0 gets to see, and
> what address range is the conflict detected for? The first question
> is possible to answer by supplying a serial log. The second question
> likely means adding some debugging code to either Xen or Linux. The
> answer to the third question may be possible to infer from the other
> data, but would likely be better to obtain explicitly by adjusting /
> amending the message Linux emits.

The needed information should all be in the hypervisor messages.

The hypervisor is initially presenting a memory map to dom0 which is not the
same as the native memory map. Dom0 tries to rearrange its memory layout to
be compatible with the native memory map.

The message seen ("xen hypervisor allocated kernel memory conflicts with E820")
tells us that the kernel position conflicts with the native memory map: after
the rearrangement of memory, at least one guest pfn occupied by the kernel would
be at a location not populated with RAM.
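
As an illustration only, the check described above can be sketched in Python. This is not the Linux implementation; the names (`E820Entry`, `pfn_is_ram`, `first_conflicting_pfn`) and the example map are hypothetical, and 4 KiB pages are assumed.

```python
from typing import NamedTuple, Optional, Sequence

PAGE_SHIFT = 12      # assumption: 4 KiB pages, as on x86
E820_RAM = 1         # E820 type code for usable RAM
E820_RESERVED = 2    # E820 type code for reserved ranges


class E820Entry(NamedTuple):
    addr: int   # start physical address in bytes
    size: int   # length in bytes
    type: int   # E820 type code


def pfn_is_ram(pfn: int, e820: Sequence[E820Entry]) -> bool:
    """Return True if the whole page at `pfn` lies inside one RAM entry."""
    start = pfn << PAGE_SHIFT
    end = start + (1 << PAGE_SHIFT)
    return any(
        e.type == E820_RAM and e.addr <= start and end <= e.addr + e.size
        for e in e820
    )


def first_conflicting_pfn(
    first_pfn: int, nr_pages: int, e820: Sequence[E820Entry]
) -> Optional[int]:
    """Return the first pfn of the range that is not RAM, or None."""
    for pfn in range(first_pfn, first_pfn + nr_pages):
        if not pfn_is_ram(pfn, e820):
            return pfn
    return None


# Hypothetical native map with a hole at 64 MiB..80 MiB.
EXAMPLE_MAP = [
    E820Entry(0x0100000, 0x3F00000, E820_RAM),
    E820Entry(0x4000000, 0x1000000, E820_RESERVED),
    E820Entry(0x5000000, 0x1000000, E820_RAM),
]
```

With this made-up map, a kernel occupying pfns 0x3000 to 0x4fff would run into the hole, and `first_conflicting_pfn(0x3000, 0x2000, EXAMPLE_MAP)` reports pfn 0x4000 as the first page that is not RAM.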

In theory it would be possible to cover this case, too, but it would be quite
cumbersome. Right now only the initrd is allowed to conflict with the memory map
(it will be moved in this case); kernel and initial page table conflicts are not
handled.
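
The policy just described (a conflicting initrd is moved, a conflicting kernel or page table is fatal) can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual Linux code: `Region`, `BootItem`, `find_free_ram` and `resolve_conflicts` are hypothetical names, all addresses are assumed to be page-aligned, and the first-fit search is just one possible placement strategy.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple

E820_RAM = 1  # E820 type code for usable RAM


class Region(NamedTuple):
    start: int
    size: int
    type: int = E820_RAM


class BootItem(NamedTuple):
    name: str
    start: int
    size: int
    relocatable: bool  # True only for the initrd in this model


def in_ram(start: int, size: int, e820: Sequence[Region]) -> bool:
    """True if [start, start + size) lies fully inside one RAM region."""
    end = start + size
    return any(
        r.type == E820_RAM and r.start <= start and end <= r.start + r.size
        for r in e820
    )


def find_free_ram(
    size: int, e820: Sequence[Region], busy: List[Tuple[int, int]]
) -> Optional[int]:
    """First-fit search for `size` bytes of RAM not covered by `busy`."""
    for r in sorted(e for e in e820 if e.type == E820_RAM):
        addr = r.start
        for b_start, b_size in sorted(busy):
            if addr + size <= b_start:
                break                      # candidate fits before this range
            if b_start + b_size > addr:
                addr = b_start + b_size    # skip past the busy range
        if addr + size <= r.start + r.size:
            return addr
    return None


def resolve_conflicts(
    items: Sequence[BootItem], e820: Sequence[Region]
) -> Dict[str, int]:
    """Return final start addresses; raise if a fixed item conflicts."""
    busy = [(i.start, i.size) for i in items if in_ram(i.start, i.size, e820)]
    placement: Dict[str, int] = {}
    for item in items:
        if in_ram(item.start, item.size, e820):
            placement[item.name] = item.start
        elif item.relocatable:
            new_start = find_free_ram(item.size, e820, busy)
            if new_start is None:
                raise MemoryError(f"no room to move {item.name}")
            busy.append((new_start, item.size))
            placement[item.name] = new_start
        else:
            raise RuntimeError(f"{item.name} conflicts with E820")
    return placement
```

In this model, an initrd that straddles a hole is simply given a new address, while a kernel in the same position raises an error, which mirrors the halt seen in the report.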

When I added the conflict handling nearly 10 years ago, there was no hardware
known to have memory holes at addresses which would conflict with Xen's initial
idea of dom0 memory layout.

I can look into this later, but right now I'm just about to go offline, probably
until the end of January.


Juergen



 

