[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] OvmfPkg/XenPlatformPei: Use CPUID to get physical address width on Xen



On 01/13/21 04:42, Igor Druzhinin wrote:
> We faced a problem with passing through a PCI device with 64GB BAR to
> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
> 64G address which pushes physical address space to 37 bits. That is above
> 36-bit width that OVMF exposes currently to a guest without tweaking
> PcdPciMmio64Size knob.
> 
> The reverse calculation using this knob was inhereted from QEMU-KVM platform
> code where it serves the purpose of finding max accessible physical
> address without necessary trusting emulated CPUID physbits value (that could
> be different from host physbits). On Xen we expect to use CPUID policy
> to level the data correctly to prevent situations with guest physbits >
> host physbits e.g. across migrations.
> 
> The next aspect raising concern - resource consumption for DXE IPL page tables
> and time required to map the whole address space in case of using CPUID
> bits directly. That could be mitigated by enabling support for 1G pages
> in DXE IPL configuration. 1G pages are available on most CPUs produced in
> the last 10 years and those without don't have many phys bits.
> 
> Remove all the redundant code now (including PcdPciMmio64.. handling that's
> not used on Xen anyway) and grab physbits directly from CPUID that should
> be what baremetal UEFI systems do.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> ---
>  OvmfPkg/OvmfXen.dsc                |   3 +
>  OvmfPkg/XenPlatformPei/MemDetect.c | 166 
> +++----------------------------------
>  2 files changed, 15 insertions(+), 154 deletions(-)

Acked-by: Laszlo Ersek <lersek@xxxxxxxxxx>

Anthony and/or Julien should also approve this patch (and anyone from
the Xen community is welcome to comment of course, as always).

Thanks
Laszlo


> diff --git a/OvmfPkg/OvmfXen.dsc b/OvmfPkg/OvmfXen.dsc
> index 7d31e88..8ae6ed0 100644
> --- a/OvmfPkg/OvmfXen.dsc
> +++ b/OvmfPkg/OvmfXen.dsc
> @@ -444,6 +444,9 @@
>    ## Xen vlapic's frequence is 100 MHz
>    gEfiMdePkgTokenSpaceGuid.PcdFSBClock|100000000
>  
> +  # We populate DXE IPL tables with 1G pages preferably on Xen
> +  gEfiMdeModulePkgTokenSpaceGuid.PcdUse1GPageTable|TRUE
> +
>  
> ################################################################################
>  #
>  # Pcd Dynamic Section - list of all EDK II PCD Entries defined by this 
> Platform
> diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c 
> b/OvmfPkg/XenPlatformPei/MemDetect.c
> index 1f81eee..1970b63 100644
> --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> @@ -172,175 +172,33 @@ GetSystemMemorySizeBelow4gb (
>    return (UINT32) (((UINTN)((Cmos0x35 << 8) + Cmos0x34) << 16) + SIZE_16MB);
>  }
>  
> -
> -STATIC
> -UINT64
> -GetSystemMemorySizeAbove4gb (
> -  )
> -{
> -  UINT32 Size;
> -  UINTN  CmosIndex;
> -
> -  //
> -  // In PVH case, there is no CMOS, we have to calculate the memory size
> -  // from parsing the E820
> -  //
> -  if (XenPvhDetected ()) {
> -    UINT64  HighestAddress;
> -
> -    HighestAddress = GetHighestSystemMemoryAddress (FALSE);
> -    ASSERT (HighestAddress == 0 || HighestAddress >= BASE_4GB);
> -
> -    if (HighestAddress >= BASE_4GB) {
> -      HighestAddress -= BASE_4GB;
> -    }
> -
> -    return HighestAddress;
> -  }
> -
> -  //
> -  // CMOS 0x5b-0x5d specifies the system memory above 4GB MB.
> -  // * CMOS(0x5d) is the most significant size byte
> -  // * CMOS(0x5c) is the middle size byte
> -  // * CMOS(0x5b) is the least significant size byte
> -  // * The size is specified in 64kb chunks
> -  //
> -
> -  Size = 0;
> -  for (CmosIndex = 0x5d; CmosIndex >= 0x5b; CmosIndex--) {
> -    Size = (UINT32) (Size << 8) + (UINT32) CmosRead8 (CmosIndex);
> -  }
> -
> -  return LShiftU64 (Size, 16);
> -}
> -
> -
> -/**
> -  Return the highest address that DXE could possibly use, plus one.
> -**/
> -STATIC
> -UINT64
> -GetFirstNonAddress (
> -  VOID
> -  )
> -{
> -  UINT64               FirstNonAddress;
> -  UINT64               Pci64Base, Pci64Size;
> -  RETURN_STATUS        PcdStatus;
> -
> -  FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
> -
> -  //
> -  // If DXE is 32-bit, then we're done; PciBusDxe will degrade 64-bit MMIO
> -  // resources to 32-bit anyway. See DegradeResource() in
> -  // "PciResourceSupport.c".
> -  //
> -#ifdef MDE_CPU_IA32
> -  if (!FeaturePcdGet (PcdDxeIplSwitchToLongMode)) {
> -    return FirstNonAddress;
> -  }
> -#endif
> -
> -  //
> -  // Otherwise, in order to calculate the highest address plus one, we must
> -  // consider the 64-bit PCI host aperture too. Fetch the default size.
> -  //
> -  Pci64Size = PcdGet64 (PcdPciMmio64Size);
> -
> -  if (Pci64Size == 0) {
> -    if (mBootMode != BOOT_ON_S3_RESUME) {
> -      DEBUG ((DEBUG_INFO, "%a: disabling 64-bit PCI host aperture\n",
> -        __FUNCTION__));
> -      PcdStatus = PcdSet64S (PcdPciMmio64Size, 0);
> -      ASSERT_RETURN_ERROR (PcdStatus);
> -    }
> -
> -    //
> -    // There's nothing more to do; the amount of memory above 4GB fully
> -    // determines the highest address plus one. The memory hotplug area (see
> -    // below) plays no role for the firmware in this case.
> -    //
> -    return FirstNonAddress;
> -  }
> -
> -  //
> -  // SeaBIOS aligns both boundaries of the 64-bit PCI host aperture to 1GB, 
> so
> -  // that the host can map it with 1GB hugepages. Follow suit.
> -  //
> -  Pci64Base = ALIGN_VALUE (FirstNonAddress, (UINT64)SIZE_1GB);
> -  Pci64Size = ALIGN_VALUE (Pci64Size, (UINT64)SIZE_1GB);
> -
> -  //
> -  // The 64-bit PCI host aperture should also be "naturally" aligned. The
> -  // alignment is determined by rounding the size of the aperture down to the
> -  // next smaller or equal power of two. That is, align the aperture by the
> -  // largest BAR size that can fit into it.
> -  //
> -  Pci64Base = ALIGN_VALUE (Pci64Base, GetPowerOfTwo64 (Pci64Size));
> -
> -  if (mBootMode != BOOT_ON_S3_RESUME) {
> -    //
> -    // The core PciHostBridgeDxe driver will automatically add this range to
> -    // the GCD memory space map through our PciHostBridgeLib instance; here 
> we
> -    // only need to set the PCDs.
> -    //
> -    PcdStatus = PcdSet64S (PcdPciMmio64Base, Pci64Base);
> -    ASSERT_RETURN_ERROR (PcdStatus);
> -    PcdStatus = PcdSet64S (PcdPciMmio64Size, Pci64Size);
> -    ASSERT_RETURN_ERROR (PcdStatus);
> -
> -    DEBUG ((DEBUG_INFO, "%a: Pci64Base=0x%Lx Pci64Size=0x%Lx\n",
> -      __FUNCTION__, Pci64Base, Pci64Size));
> -  }
> -
> -  //
> -  // The useful address space ends with the 64-bit PCI host aperture.
> -  //
> -  FirstNonAddress = Pci64Base + Pci64Size;
> -  return FirstNonAddress;
> -}
> -
> -
>  /**
> -  Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
> +  Initialize the mPhysMemAddressWidth variable, based on CPUID data.
>  **/
>  VOID
>  AddressWidthInitialization (
>    VOID
>    )
>  {
> -  UINT64 FirstNonAddress;
> +  UINT32 RegEax;
>  
> -  //
> -  // As guest-physical memory size grows, the permanent PEI RAM requirements
> -  // are dominated by the identity-mapping page tables built by the DXE IPL.
> -  // The DXL IPL keys off of the physical address bits advertized in the CPU
> -  // HOB. To conserve memory, we calculate the minimum address width here.
> -  //
> -  FirstNonAddress      = GetFirstNonAddress ();
> -  mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
> -
> -  //
> -  // If FirstNonAddress is not an integral power of two, then we need an
> -  // additional bit.
> -  //
> -  if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
> -    ++mPhysMemAddressWidth;
> +  AsmCpuid (0x80000000, &RegEax, NULL, NULL, NULL);
> +  if (RegEax >= 0x80000008) {
> +    AsmCpuid (0x80000008, &RegEax, NULL, NULL, NULL);
> +    mPhysMemAddressWidth = (UINT8) RegEax;
> +  } else {
> +    mPhysMemAddressWidth = 36;
>    }
>  
>    //
> -  // The minimum address width is 36 (covers up to and excluding 64 GB, which
> -  // is the maximum for Ia32 + PAE). The theoretical architecture maximum for
> -  // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. 
> We
> -  // can simply assert that here, since 48 bits are good enough for 256 TB.
> +  // IA-32e paging translates 48-bit linear addresses to 52-bit physical 
> addresses.
>    //
> -  if (mPhysMemAddressWidth <= 36) {
> -    mPhysMemAddressWidth = 36;
> +  ASSERT (mPhysMemAddressWidth <= 52);
> +  if (mPhysMemAddressWidth > 48) {
> +    mPhysMemAddressWidth = 48;
>    }
> -  ASSERT (mPhysMemAddressWidth <= 48);
>  }
>  
> -
>  /**
>    Calculate the cap for the permanent PEI memory.
>  **/
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.