Re: [Xen-devel] [PATCH v3 24/35] OvmfPkg/XenPlatformPei: Rework memory detection



On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > When running as a Xen PVH guest, there is no CMOS to read the memory
> > > size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
> > > work without CMOS by reading the e820 table.
> > > 
> > > Rework XenPublishRamRegions for PVH: handle the Reserved type and
> > > explain the handling of the ACPI type. MTRR settings aren't modified
> > > anymore; on HVM, it's already done by hvmloader, and on PVH the
> > > defaults are supposed to be sane.
> > > 
> > > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
> > > Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> > > Acked-by: Laszlo Ersek <lersek@xxxxxxxxxx>
> > > ---
> > > 
> > > Notes:
> > >     Comment for Xen people:
> > >     About MTRR, should we redo the setting in OVMF? Even though in
> > >     both the PVH and HVM cases, something has already set up the
> > >     default type to write-back and handled a few other ranges like
> > >     the PCI hole: hvmloader for HVM and, I think, libxc for PVH.
> > 
> > That's a tricky question. Ideally we would like the firmware (OVMF) to
> > take care of that, because it already has code to do so. Problem here
> > is that PVH can also be booted without firmware, in which case it
> > needs the hypervisor to have setup some sane initial MTRR state.
> > 
> > The statement in the PVH document about the initial MTRR state is
> > vague enough to allow Xen to boot into the guest with a minimal MTRR
> > state, which may, for example, not contain UC regions for the MMIO
> > regions of passed-through devices, hence I think OVMF should be in
> > charge of creating a more complete MTRR state if possible.
> > 
> > Is this something OVMF already has logic for?
> 
> Well, there is some logic, but it's for QEMU (and it uses an interface
> that isn't available when running on Xen, fwcfg).
> 
> The logic that was there for Xen HVM was very simple, a single setting
> of cache write-back for the RAM; that's why I removed it (and because
> I'm not yet sure I've figured out how to run the MTRR functions
> correctly in OVMF).
> 
> I'm probably going to have to write new logic that rewrites the MTRRs
> from scratch instead of relying on the existing setup.
> 
> > Also accounting for the MMIO regions of devices?
> 
> I'll have to dig deeper into the OVMF code base and its PCI device
> handling. On HVM, we have different logic from the one for QEMU: OVMF
> only scans what hvmloader has done instead of setting up the PCI
> devices again. I'm probably missing other stuff.

MTRR setup is always a PITA. I was hoping OVMF could manage to do
that based on the memory map plus scanning for devices and positioning
BARs, but if it gets the information from other side channels it's
going to be more complicated.

Anyway, something to improve in the future in order to get rid of
hvmloader.

> > > diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
> > > index cb7dd93ad6..3e33e7f414 100644
> > > --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> > > +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> > > @@ -96,6 +96,47 @@ Q35TsegMbytesInitialization (
> > >    mQ35TsegMbytes = ExtendedTsegMbytes;
> > >  }
> > >  
> > > +STATIC
> > > +UINT64
> > > +GetHighestSystemMemoryAddress (
> > > +  BOOLEAN       Below4gb
> > > +  )
> > > +{
> > > +  EFI_E820_ENTRY64    *E820Map;
> > > +  UINT32              E820EntriesCount;
> > > +  EFI_E820_ENTRY64    *Entry;
> > > +  EFI_STATUS          Status;
> > > +  UINT32              Loop;
> > > +  UINT64              HighestAddress;
> > > +  UINT64              EntryEnd;
> > > +
> > > +  HighestAddress = 0;
> > > +
> > > +  Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > 
> > You could maybe initialize this as a global to avoid having to issue
> > a hypercall each time you need to get something from the memory map.
> 
> That function does that; it only makes the hypercall once. (The
> hypercall can only be made once anyway; the second time, Xen doesn't
> return the map.)

Why? I'm looking at the implementation of XENMEM_memory_map in Xen and
I'm not sure I see how/why the hypercall can only be made once. AFAICT
you should be able to call XENMEM_memory_map multiple times without
issues, or else it's a bug somewhere.

> > > +  ASSERT_EFI_ERROR (Status);
> > > +
> > > +  for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > > +    Entry = E820Map + Loop;
> > > +    EntryEnd = Entry->BaseAddr + Entry->Length;
> > > +
> > > +    if (Entry->Type == EfiAcpiAddressRangeMemory &&
> > > +        EntryEnd > HighestAddress) {
> > > +
> > > +      if (Below4gb && (EntryEnd <= BASE_4GB)) {
> > > +        HighestAddress = EntryEnd;
> > > +      } else if (!Below4gb && (EntryEnd >= BASE_4GB)) {
> > > +        HighestAddress = EntryEnd;
> > > +      }
> > > +    }
> > > +  }
> > > +
> > > +  //
> > > +  // Round down the end address.
> > > +  //
> > > +  HighestAddress &= ~(UINT64)EFI_PAGE_MASK;
> > > +
> > > +  return HighestAddress;
> > 
> > You could do the rounding on the return statement.
> 
> Yes, I think that can be done.
> 
> > > +}
> > >  
> > >  UINT32
> > >  GetSystemMemorySizeBelow4gb (
> > > @@ -105,6 +146,19 @@ GetSystemMemorySizeBelow4gb (
> > >    UINT8 Cmos0x34;
> > >    UINT8 Cmos0x35;
> > >  
> > > +  //
> > > +  // In PVH case, there is no CMOS, we have to calculate the memory size
> > > +  // from parsing the E820
> > > +  //
> > > +  if (XenPvhDetected ()) {
> > 
> > IIRC on HVM you can also get the memory map from the hypercall, in
> > which case you could use the same code path for both HVM and PVH.
> 
> I think that wouldn't work, because in my experiments the hypercall
> would only return the map the first time (at least on PVH). hvmloader
> already makes the hypercall, so OVMF can't.

OK, I'm not sure the reason for this, as I said above I think this is
a bug somewhere. You should be able to call XENMEM_memory_map multiple
times.

> On the other hand, XenGetE820Map() returns an E820 map; it doesn't
> matter if it's the one passed by hvmloader or the one we've got
> directly from Xen. So I guess we could ignore what hvmloader has
> written in the CMOS

Hm, I'm not sure hvmloader uploads a new memory map to Xen (using
XENMEM_set_memory_map) if it does any modifications to it. It should
certainly do it, so that the guest OS gets the same memory map from
the hypercall or from the firmware.

> and use the information in the e820 directly. But I think I'll leave
> that change for a future patch.
> 
> > > +    UINT64  HighestAddress;
> > > +
> > > +    HighestAddress = GetHighestSystemMemoryAddress (TRUE);
> > > +    ASSERT (HighestAddress > 0 && HighestAddress <= BASE_4GB);
> > > +
> > > +    return HighestAddress;
> > 
> > The name of the function here is GetSystemMemorySizeBelow4gb, but you
> > are returning the highest memory address in the range, is this
> > expected?
> > 
> > ie: highest address != memory size
> > 
> > On HVM there are quite some holes in the memory map, and nothing
> > guarantees there are no memory regions after the holes or non-RAM
> > regions.
> 
> I think that's what is expected by the callers of the function.

OK, the naming is slightly misleading, just wanted to make sure this
was the expected value.

> > > +  }
> > > +
> > >    //
> > >    // CMOS 0x34/0x35 specifies the system memory above 16 MB.
> > >    // * CMOS(0x35) is the high byte
> > > diff --git a/OvmfPkg/XenPlatformPei/Xen.c b/OvmfPkg/XenPlatformPei/Xen.c
> > > index cbfd8058fc..62a2c3ed93 100644
> > > --- a/OvmfPkg/XenPlatformPei/Xen.c
> > > +++ b/OvmfPkg/XenPlatformPei/Xen.c
> > > @@ -279,6 +279,8 @@ XenPublishRamRegions (
> > >    EFI_E820_ENTRY64  *E820Map;
> > >    UINT32            E820EntriesCount;
> > >    EFI_STATUS        Status;
> > > +  EFI_E820_ENTRY64 *Entry;
> > > +  UINTN Index;
> > >  
> > >    DEBUG ((EFI_D_INFO, "Using memory map provided by Xen\n"));
> > >  
> > > @@ -287,26 +289,45 @@ XenPublishRamRegions (
> > >    //
> > >    E820EntriesCount = 0;
> > >    Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > > -
> > >    ASSERT_EFI_ERROR (Status);
> > >  
> > > -  if (E820EntriesCount > 0) {
> > > -    EFI_E820_ENTRY64 *Entry;
> > > -    UINT32 Loop;
> > > +  for (Index = 0; Index < E820EntriesCount; Index++) {
> > > +    UINT64 Base;
> > > +    UINT64 End;
> > >  
> > > -    for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > > -      Entry = E820Map + Loop;
> > > +    Entry = &E820Map[Index];
> > >  
> > > +
> > > +    //
> > > +    // Round up the start address, and round down the end address.
> > > +    //
> > > +    Base = ALIGN_VALUE (Entry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
> > > +    End = (Entry->BaseAddr + Entry->Length) & ~(UINT64)EFI_PAGE_MASK;
> > > +
> > > +    switch (Entry->Type) {
> > > +    case EfiAcpiAddressRangeMemory:
> > > +      AddMemoryRangeHob (Base, End);
> > > +      break;
> > > +    case EfiAcpiAddressRangeACPI:
> > > +      //
> > > +      // Ignore, OVMF should read the ACPI tables and provide them
> > > +      // to Linux from a different location.
> > 
> > Will OVMF also parse dynamic tables to check for references there?
> 
> I haven't looked at what OVMF does with the ACPI tables, but Linux
> seems fine. I've compared the boot output of Linux running as PVH vs.
> booted via OVMF. Besides the location of the tables being different,
> the number of tables was the same; I don't remember any other
> differences.

OK. What I find weird is that you seem to discard quite a lot of stuff
from the original memory map, and then reconstruct it afterwards, I
assume?

It would seem safer to not discard regions from the memory map
provided to OVMF, and instead just build on top of it. I would expect
for example that OVMF will use some of the RAM regions on the memory
map, and it should likely turn those areas from RAM into reserved
regions.

> > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > >        //
> > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > -        continue;
> > > +      if (!XenHvmloaderDetected ()) {
> > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > 
> > This special casing for PVH looks weird, ideally we would like to use
> > the same code path, or else it should be explicitly mentioned why PVH
> > has diverging behaviour.
> 
> I think hvmloader is the issue rather than PVH. Here is part of the
> "memory map" as found in hvmloader/config.h:
> 
>   /* Special BIOS mappings, etc. are allocated from here upwards... */
>   #define RESERVED_MEMBASE              0xFC000000
>   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
>   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
>   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
>   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> 
> and hvmloader simply creates a single e820 reserved entry, from
> RESERVED_MEMBASE to the top of 4GB. It's probably too much.

But isn't this kind of dangerous? How can you ensure future versions
of hvmloader won't use this space?

> If hvmloader only reserved
> ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not
> have to special-case hvmloader.

Could we look into getting this fixed in hvmloader then?

I think it's dangerous for OVMF to play such tricks with the memory
map.

> As far as I know, 0xfee00000 isn't a special BIOS mapping, but
> something the hardware provides.

Yes, that's used by the lapic, so it's not specific to hvmloader.

> Anyway, thanks for the feedback; there's probably quite a bit to do to
> clean up the memory handling. I do think about one day running OVMF
> without running hvmloader first :-), but there's a bit more to do.

Sorry if some of my comments are out of place; I certainly know very
little about the OVMF code base, or the Xen bits in it.

Running OVMF without hvmloader would be my end goal also if possible
:).

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
