Re: [Xen-devel] [PATCH v3 24/35] OvmfPkg/XenPlatformPei: Rework memory detection



On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > When running as a Xen PVH guest, there is no CMOS to read the memory
> > > size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
> > > work without CMOS by reading the e820 table.
> > > 
> > > Rework XenPublishRamRegions for PVH: handle the Reserved type and
> > > explain the handling of the ACPI type. MTRR settings aren't modified
> > > anymore; on HVM, it's already done by hvmloader, and on PVH the
> > > defaults are supposed to be sane.
> > > 
> > > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
> > > Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> > > Acked-by: Laszlo Ersek <lersek@xxxxxxxxxx>
> > > ---
> > > 
> > > Notes:
> > >     Comment for Xen people:
> > >     About MTRR, should we redo the setting in OVMF? Even though in
> > >     both the PVH and HVM cases, something has already set up the
> > >     default type to write-back and handled a few other ranges like
> > >     the PCI hole: hvmloader for HVM and, I think, libxc for PVH.
> > 
> > That's a tricky question. Ideally we would like the firmware (OVMF) to
> > take care of that, because it already has code to do so. Problem here
> > is that PVH can also be booted without firmware, in which case it
> > needs the hypervisor to have setup some sane initial MTRR state.
> > 
> > The statement in the PVH document about the initial MTRR state is
> > vague enough to allow Xen to boot into the guest with a minimal MTRR
> > state, which may, for example, not contain UC regions for the MMIO
> > regions of passed-through devices, hence I think OVMF should be in
> > charge of creating a more complete MTRR state if possible.
> > 
> > Is this something OVMF already has logic for?
> 
> Well, there is some logic, but it's for QEMU (and it uses an interface
> that isn't available when running on Xen, fwcfg).
> 
> The logic that was there for Xen HVM was very simple, a single setting
> of cache write-back for the RAM; that's why I removed it (and because
> I'm not yet sure I've figured out how to run the MTRR functions
> correctly in OVMF).
> 
> I'm probably going to have to write new logic that rewrites the MTRRs
> from scratch instead of relying on the existing setup.
> 
> > Also accounting for the MMIO regions of devices?
> 
> I'll have to dig deeper into the OVMF code base and its PCI device
> handling. On HVM, we have different logic from the one for QEMU: OVMF
> only scans what hvmloader has done instead of setting up the PCI
> devices again. I'm probably missing other stuff.

MTRR setup is always a PITA. I was hoping OVMF could manage to do
that based on the memory map plus scanning for devices and positioning
BARs, but if it gets the information from other side channels it's
going to be more complicated.

Anyway, something to improve in the future in order to get rid of
hvmloader.

> > > diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
> > > index cb7dd93ad6..3e33e7f414 100644
> > > --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> > > +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> > > @@ -96,6 +96,47 @@ Q35TsegMbytesInitialization (
> > >    mQ35TsegMbytes = ExtendedTsegMbytes;
> > >  }
> > >  
> > > +STATIC
> > > +UINT64
> > > +GetHighestSystemMemoryAddress (
> > > +  BOOLEAN       Below4gb
> > > +  )
> > > +{
> > > +  EFI_E820_ENTRY64    *E820Map;
> > > +  UINT32              E820EntriesCount;
> > > +  EFI_E820_ENTRY64    *Entry;
> > > +  EFI_STATUS          Status;
> > > +  UINT32              Loop;
> > > +  UINT64              HighestAddress;
> > > +  UINT64              EntryEnd;
> > > +
> > > +  HighestAddress = 0;
> > > +
> > > +  Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > 
> > You could maybe initialize this as a global to avoid having to issue
> > a hypercall each time you need to get something from the memory map.
> 
> That function does that; it only makes the hypercall once. (The
> hypercall can only be made once anyway; the second time, Xen doesn't
> return the map.)

Why? I'm looking at the implementation of XENMEM_memory_map in Xen and
I'm not sure I see how/why the hypercall can only be made once. AFAICT
you should be able to call XENMEM_memory_map multiple times without
issues, or else it's a bug somewhere.

> > > +  ASSERT_EFI_ERROR (Status);
> > > +
> > > +  for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > > +    Entry = E820Map + Loop;
> > > +    EntryEnd = Entry->BaseAddr + Entry->Length;
> > > +
> > > +    if (Entry->Type == EfiAcpiAddressRangeMemory &&
> > > +        EntryEnd > HighestAddress) {
> > > +
> > > +      if (Below4gb && (EntryEnd <= BASE_4GB)) {
> > > +        HighestAddress = EntryEnd;
> > > +      } else if (!Below4gb && (EntryEnd >= BASE_4GB)) {
> > > +        HighestAddress = EntryEnd;
> > > +      }
> > > +    }
> > > +  }
> > > +
> > > +  //
> > > +  // Round down the end address.
> > > +  //
> > > +  HighestAddress &= ~(UINT64)EFI_PAGE_MASK;
> > > +
> > > +  return HighestAddress;
> > 
> > You could do the rounding on the return statement.
> 
> Yes, I think that can be done.
> 
> > > +}
> > >  
> > >  UINT32
> > >  GetSystemMemorySizeBelow4gb (
> > > @@ -105,6 +146,19 @@ GetSystemMemorySizeBelow4gb (
> > >    UINT8 Cmos0x34;
> > >    UINT8 Cmos0x35;
> > >  
> > > +  //
> > > +  // In PVH case, there is no CMOS, we have to calculate the memory size
> > > +  // from parsing the E820
> > > +  //
> > > +  if (XenPvhDetected ()) {
> > 
> > IIRC on HVM you can also get the memory map from the hypercall, in
> > which case you could use the same code path for both HVM and PVH.
> 
> I think that wouldn't work, because in my experiments the hypercall
> would only return the map the first time (at least on PVH). hvmloader
> already makes the hypercall, so OVMF can't.

OK, I'm not sure the reason for this, as I said above I think this is
a bug somewhere. You should be able to call XENMEM_memory_map multiple
times.

> On the other hand, XenGetE820Map() returns an E820 map; it doesn't
> matter if it's the one passed by hvmloader or the one we've got
> directly from Xen. So I guess we could ignore what hvmloader has
> written in the CMOS

Hm, I'm not sure hvmloader uploads a new memory map to Xen (using
XENMEM_set_memory_map) if it does any modifications to it. It should
certainly do it, so that the guest OS gets the same memory map from
the hypercall or from the firmware.

> and use the information in the e820 directly. But I think I'll leave
> that change for a future patch.
> 
> > > +    UINT64  HighestAddress;
> > > +
> > > +    HighestAddress = GetHighestSystemMemoryAddress (TRUE);
> > > +    ASSERT (HighestAddress > 0 && HighestAddress <= BASE_4GB);
> > > +
> > > +    return HighestAddress;
> > 
> > The name of the function here is GetSystemMemorySizeBelow4gb, but you
> > are returning the highest memory address in the range, is this
> > expected?
> > 
> > ie: highest address != memory size
> > 
> > On HVM there are quite some holes in the memory map, and nothing
> > guarantees there are no memory regions after the holes or non-RAM
> > regions.
> 
> I think that's what is expected by the callers of the function.

OK, the naming is slightly misleading, just wanted to make sure this
was the expected value.

> > > +  }
> > > +
> > >    //
> > >    // CMOS 0x34/0x35 specifies the system memory above 16 MB.
> > >    // * CMOS(0x35) is the high byte
> > > diff --git a/OvmfPkg/XenPlatformPei/Xen.c b/OvmfPkg/XenPlatformPei/Xen.c
> > > index cbfd8058fc..62a2c3ed93 100644
> > > --- a/OvmfPkg/XenPlatformPei/Xen.c
> > > +++ b/OvmfPkg/XenPlatformPei/Xen.c
> > > @@ -279,6 +279,8 @@ XenPublishRamRegions (
> > >    EFI_E820_ENTRY64  *E820Map;
> > >    UINT32            E820EntriesCount;
> > >    EFI_STATUS        Status;
> > > +  EFI_E820_ENTRY64 *Entry;
> > > +  UINTN Index;
> > >  
> > >    DEBUG ((EFI_D_INFO, "Using memory map provided by Xen\n"));
> > >  
> > > @@ -287,26 +289,45 @@ XenPublishRamRegions (
> > >    //
> > >    E820EntriesCount = 0;
> > >    Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > > -
> > >    ASSERT_EFI_ERROR (Status);
> > >  
> > > -  if (E820EntriesCount > 0) {
> > > -    EFI_E820_ENTRY64 *Entry;
> > > -    UINT32 Loop;
> > > +  for (Index = 0; Index < E820EntriesCount; Index++) {
> > > +    UINT64 Base;
> > > +    UINT64 End;
> > >  
> > > -    for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > > -      Entry = E820Map + Loop;
> > > +    Entry = &E820Map[Index];
> > >  
> > > +
> > > +    //
> > > +    // Round up the start address, and round down the end address.
> > > +    //
> > > +    Base = ALIGN_VALUE (Entry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
> > > +    End = (Entry->BaseAddr + Entry->Length) & ~(UINT64)EFI_PAGE_MASK;
> > > +
> > > +    switch (Entry->Type) {
> > > +    case EfiAcpiAddressRangeMemory:
> > > +      AddMemoryRangeHob (Base, End);
> > > +      break;
> > > +    case EfiAcpiAddressRangeACPI:
> > > +      //
> > > +      // Ignore, OVMF should read the ACPI tables and provide them
> > > +      // to Linux from a different location.
> > 
> > Will OVMF also parse dynamic tables to check for references there?
> 
> I haven't looked at what OVMF does with the ACPI tables, but Linux
> seems fine. I've compared the boot output of Linux running as PVH vs.
> booted via OVMF. Besides the location of the tables being different,
> the number of tables was the same; I don't remember any other
> differences.

OK. What I find weird is that you seem to discard quite a lot of stuff
from the original memory map, and then reconstruct it afterwards, I
assume?

It would seem safer to not discard regions from the memory map
provided to OVMF, and instead just build on top of it. I would expect
for example that OVMF will use some of the RAM regions on the memory
map, and it should likely turn those areas from RAM into reserved
regions.

> > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > >        //
> > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > -        continue;
> > > +      if (!XenHvmloaderDetected ()) {
> > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > 
> > This special casing for PVH looks weird, ideally we would like to use
> > the same code path, or else it should be explicitly mentioned why PVH
> > has diverging behaviour.
> 
> I think hvmloader is the issue rather than PVH. Here is part of the
> "memory map" as found in hvmloader/config.h:
> 
>   /* Special BIOS mappings, etc. are allocated from here upwards... */
>   #define RESERVED_MEMBASE              0xFC000000
>   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
>   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
>   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
>   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> 
> and hvmloader simply creates a single e820 reserved entry, from
> RESERVED_MEMBASE to the top of 4GB. It's probably too much.

But isn't this kind of dangerous? How can you ensure future versions
of hvmloader won't use this space?

> If hvmloader only reserved
> ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not
> have to special-case hvmloader.

Could we look into getting this fixed in hvmloader then?

I think it's dangerous for OVMF to play such tricks with the memory
map.

> As far as I know, 0xfee00000 isn't a special BIOS mapping, but
> something the hardware provides.

Yes, that's used by the lapic, so it's not specific to hvmloader.

> Anyway, thanks for the feedback; there's probably quite a bit to do to
> clean up the memory handling. I do think about one day running OVMF
> without running hvmloader first :-), but there's a bit more to do.

Sorry if some of my comments are out of place; I certainly know very
little about the OVMF code base, or the Xen bits in it.

Running OVMF without hvmloader would be my end goal also if possible
:).

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
