[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/pci: Correct ECS handling with CF8/CFC emulation


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 3 Apr 2023 15:26:42 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=iuyfu2/BsHus3kHZGId7MNczCmHstVKYik7fQ1vMBSI=; b=mXe2PDdASLPCtoeViNadZDC16EsdXq7ZDpcQ+uMHM2G66RP/wLaxWwsAA9DLVBetQIFq8T2BSUSdtUOP5v/fbKMbEC7Y7Zg6ZzBkTlSs74TSOsDYTbnUwW55WOCf8inEOT49Cyb1R7TwKXKv7uNKNzpve7DwBsVVbd7+Dwp24KtP6aGETQ2l16q6pMBATlHkGdyDDmC/runZNmPQek2N1L506zrcyBUYACPgCngpY1T8aL9VT5HDKGyHzK0n/qGAcok2RferIDq030L1hhtO7POwwga0ax8RWeqrJdOcDevM5VcX7FGCs7i/sBVRl8YgD8Vd8zOyw4KHfzto9AHP0A==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mB1mmXF+aylgBGjOWaPSQlAFMio6He+ps3MaWmk1+PqKcp02EeGbaWQoLXrovAtw41RC0MDy1cB0Yf5hCJWlkTLYMYeSntea2+xhKXqeVVASI1ohy3noBUUwggMQS/lpvP/ftCO0XHl4/OCDav7Vc7YNqhbn8E45YCBE7SK//Yr/zhGc0EzRoGG3aopIn8zf6UtchctwFt8Bev4RIeCGREXbRG+pOhyf4fH3Cw7Z2NJMkTPiuaMeZpcYmUbHTIu7dxKsiqVjOsWhRF434E6QfcqXL9zzR0ejPX3kkSkWIgK6fhazsesOeJmDtKwcOfYTnpyXQIAFqRegIMdAJOxeUw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 03 Apr 2023 13:27:11 +0000
  • Ironport-data: A9a23:lVpx7KibkeP8VPf9HdoBkkubX161RxEKZh0ujC45NGQN5FlHY01je htvUGnUbPnca2Dwe4olOYy0p0MBu8fTy9A3QFM++ysyRn8b9cadCdqndUqhZCn6wu8v7q5Ex 55HNoSfdpBcolv0/ErF3m3J9CEkvU2wbuOgTrWCYmYpHlUMpB4J0XpLg/Q+jpNjne+3CgaMv cKai8DEMRqu1iUc3lg8sspvkzsy+qWi0N8klgZmP6sT4AeFzyB94K83fsldEVOpGuG4IcbiL wrz5OnR1n/U+R4rFuSknt7TGqHdauePVeQmoiM+t5mK2nCulARrukoIHKN0hXNsoyeIh7hMJ OBl7vRcf+uL0prkw4zxWzEAe8130DYvFLXveRBTuuTLp6HKnueFL1yDwyjaMKVBktubD12i+ tQcLQoMNjCcwNipnr/jUttmmYcdN+j0adZ3VnFIlVk1DN4AaLWaGeDv2oUd2z09wMdTAfzZe swVLyJ1awjNaAFOPVFRD48imOCvhT/0dDgwRFC9/PJrpTSMilIvluSyWDbWUoXiqcF9hEGXq 3iA523kKhobKMae2XyO9XfEaurnxHulBd5NROflnhJsqFSP61VQIQUPbGuygN2mt2fhSu0OC VNBr0LCqoB3riRHVOLVTxC+5XKJoBMYc95RCPEhrhGAzLLO5ASUDXRCSSROAPQEnsIrQT0h1 neSgsjkQzdotdW9Vna15rqS6zSoNkAowXQqYCYFSU4A/IPlqYRq1BbXFI4/T+iyk8H/Hiz2z 3aSti8iir4PjMkNkaKm4VTAhDHqrZ/MJuIo2jjqsquexlsRTOaYi0aAsjA3Md4owF6lc2S8
  • Ironport-hdrordr: A9a23:MY017a18yJU3K80J7x1G2wqjBAkkLtp133Aq2lEZdPWaSK2lfq eV7ZImPH7P+VEssRQb8+xoV5PsfZqxz/JICMwqTNSftOePghrVEGgg1/qe/9XYcxeOidK1rJ 0QDZSWaueRMbEKt7ef3ODiKadY/DDvysnB7ts2jU0dLz2CDZsO0+4TMHf/LqQZfmd77LMCZe uhz/sCiTq8WGgdKv+2DmMCWIH41qf2vaOjTx4aJgItrDKDhzOw6LL8DnGjr2wjegIK77c+0H TP1zf07KW7s/2911v12mLJ445N8eGRuudrNYijitU1Nj6psAquaYh7MofyxAwInA==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Apr 03, 2023 at 11:16:52AM +0100, Andrew Cooper wrote:
> On 03/04/2023 9:57 am, Roger Pau Monné wrote:
> > On Fri, Mar 31, 2023 at 06:57:19PM +0100, Andrew Cooper wrote:
> >> This started by noticing the dubious Fam17h check in
> >> arch_ioreq_server_get_type_addr(), and then realising that checking the 
> >> host
> >> CF8_EXT setting is utterly bogus for the guest <-> qemu emulation path.
> >>
> >> What should be consulted here is the guest's MSR_AMD64_NB_CFG setting, but 
> >> a)
> >> there isn't one, and b) the vestigial remnants of the cross-vendor 
> >> migration
> >> logic cause MSR_AMD64_NB_CFG to be unconditionally read-as-zero, making the
> >> CF8_EXT path unused by any suitably-written OS in the first place.
> >>
> >> MSR_AMD64_NB_CFG really has been removed on Fam17h (it's now just a 
> >> read-zero,
> >> write-discard stub), and the ECS extension is unconditionally active, 
> >> meaning
> >> it is not correct for Xen to ignore the ExtRegNo field on newer AMD CPUs.
> >>
> >> It turns out that Xen did even have this behaviour in 4.5 and earlier, with
> >> this problematic CF8_EXT checking being added in 4.6.  Therefore, revert 
> >> back
> >> to Xen's older behaviour - it is objectively less wrong than the current
> >> logic.
> >>
> >> While fixing this, get rid of hvm_pci_decode_addr() - it is more 
> >> complicated
> >> to follow (and to call) than using the CF8* macros in the calling context.
> >> Rename CF8_ADDR() to CF8_REG() to better describe what it does, and write a
> >> comment explaining all about CF8/CFC accesses.
> >>
> >> There's one rare case when CF8_EXT is visible to guests, and that is for a
> >> pinned hwdom.  Right now, we permit such a dom0 to modify the CF8_EXT bit, 
> >> but
> >> this seems like a very unwise idea.  Leave a TODO for people to consider.
> > One weirdness I've noticed is that for vPCI we decode the accesses
> > taking the extended CF8 bit after this change, but then if the access
> > is relayed to the hardware using vpci_{read,write}_hw it will be
> > forwarded to the hardware using pci_conf_{read,write}<size> which
> > doesn't have support for CF8_EXT.  So if the underlying hardware
> > doesn't have MMCFG support and the reg is > 255 it will be dropped.
> 
> It is important to stress that this change does not influence whether
> the guest issues ECS accesses or not.  All it does is change Xen's
> handling of such accesses.
> 
> Previously vPCI blindly ignored ECS accesses, so the vPCI layer
> effectively truncated them to BCS accesses.
> 
> Now, from your analysis, when MMCFG isn't active, Xen's PCI layer will
> effectively terminate ECS accesses with default behaviour, even on
> systems where IO ECS is available.
> 
> So we've changed one valid behaviour for a different valid behaviour.

Given vPCI is currently limited to PVH dom0 I doubt there are many
systems capable of running PVH dom0 but not having MMCFG support.

Best way to fix would be to implement ECS accesses in
pci_conf_{read,write}<size>() if the register is > 255 and there's no
MMCFG.

> 
> (Quick tangent...  Our PCI handling is currently very dumb. 
> pci_mmcfg_read() returns its value by pointer but the callers never
> check.  Swapping it to return by value would improve code gen quite a
> lot.  Also, when MMCFG is active we still pass BCS accesses to IO ports.)

I wonder if it's really preferred to access registers below 255 using
the IO ports, as Linux seems to do the same (prefer IO port access if
possible).

> So I think we do want to improve Xen's general behaviour too, but this
> difference here doesn't concern me.
> 
> >
> >> Fixes: e0fbf3bf9871 ("x86/AMD: correct certain Fam17 checks")
> >> Fixes: 2d67a7a4d37a ("x86: synchronize PCI config space access decoding")
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >> ---
> >> CC: Jan Beulich <JBeulich@xxxxxxxx>
> >> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> CC: Wei Liu <wl@xxxxxxx>
> >>
> >> Whoever reviewed those two patches originally was clearly a fool...
> >> ---
> >>  xen/arch/x86/hvm/io.c             | 24 ++++++------------------
> >>  xen/arch/x86/hvm/ioreq.c          | 19 ++-----------------
> >>  xen/arch/x86/include/asm/hvm/io.h |  4 ----
> >>  xen/arch/x86/include/asm/pci.h    | 26 ++++++++++++++++++++++++--
> >>  xen/arch/x86/pv/emul-priv-op.c    | 19 ++++++-------------
> >>  5 files changed, 38 insertions(+), 54 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> >> index 5ae209d3b6b3..b0d3c236e985 100644
> >> --- a/xen/arch/x86/hvm/io.c
> >> +++ b/xen/arch/x86/hvm/io.c
> >> @@ -248,20 +248,6 @@ void register_g2m_portio_handler(struct domain *d)
> >>      handler->ops = &g2m_portio_ops;
> >>  }
> >>  
> >> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> >> -                                 pci_sbdf_t *sbdf)
> >> -{
> >> -    ASSERT(CF8_ENABLED(cf8));
> >> -
> >> -    sbdf->bdf = CF8_BDF(cf8);
> >> -    sbdf->seg = 0;
> >> -    /*
> >> -     * NB: the lower 2 bits of the register address are fetched from the
> >> -     * offset into the 0xcfc register when reading/writing to it.
> >> -     */
> >> -    return CF8_ADDR_LO(cf8) | (addr & 3);
> >> -}
> >> -
> >>  /* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> >>  static bool cf_check vpci_portio_accept(
> >>      const struct hvm_io_handler *handler, const ioreq_t *p)
> >> @@ -275,7 +261,7 @@ static int cf_check vpci_portio_read(
> >>  {
> >>      const struct domain *d = current->domain;
> >>      unsigned int reg;
> >> -    pci_sbdf_t sbdf;
> >> +    pci_sbdf_t sbdf = {};
> >>      uint32_t cf8;
> >>  
> >>      *data = ~(uint64_t)0;
> >> @@ -292,7 +278,8 @@ static int cf_check vpci_portio_read(
> >>      if ( !CF8_ENABLED(cf8) )
> >>          return X86EMUL_UNHANDLEABLE;
> >>  
> >> -    reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
> >> +    sbdf.bdf = CF8_BDF(cf8);
> >> +    reg = CF8_REG(cf8) | (addr & 3);
> >>  
> >>      if ( !vpci_access_allowed(reg, size) )
> >>          return X86EMUL_OKAY;
> >> @@ -308,7 +295,7 @@ static int cf_check vpci_portio_write(
> >>  {
> >>      struct domain *d = current->domain;
> >>      unsigned int reg;
> >> -    pci_sbdf_t sbdf;
> >> +    pci_sbdf_t sbdf = {};
> >>      uint32_t cf8;
> >>  
> >>      if ( addr == 0xcf8 )
> >> @@ -323,7 +310,8 @@ static int cf_check vpci_portio_write(
> >>      if ( !CF8_ENABLED(cf8) )
> >>          return X86EMUL_UNHANDLEABLE;
> >>  
> >> -    reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
> >> +    sbdf.bdf = CF8_BDF(cf8);
> >> +    reg = CF8_REG(cf8) | (addr & 3);
> >>  
> >>      if ( !vpci_access_allowed(reg, size) )
> >>          return X86EMUL_OKAY;
> >> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> >> index 0bdcca1e1a5f..325a9d118e52 100644
> >> --- a/xen/arch/x86/hvm/ioreq.c
> >> +++ b/xen/arch/x86/hvm/ioreq.c
> >> @@ -285,27 +285,12 @@ bool arch_ioreq_server_get_type_addr(const struct 
> >> domain *d,
> >>           (p->addr & ~3) == 0xcfc &&
> >>           CF8_ENABLED(cf8) )
> >>      {
> >> -        unsigned int x86_fam, reg;
> >> -        pci_sbdf_t sbdf;
> >> -
> >> -        reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
> >> +        pci_sbdf_t sbdf = { .bdf = CF8_BDF(cf8) };
> >> +        unsigned int reg = CF8_REG(cf8) | (p->addr & 3);
> >>  
> >>          /* PCI config data cycle */
> >>          *type = XEN_DMOP_IO_RANGE_PCI;
> >>          *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
> >> -        /* AMD extended configuration space access? */
> >> -        if ( CF8_ADDR_HI(cf8) &&
> >> -             d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
> >> -             (x86_fam = get_cpu_family(
> >> -                 d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
> >> -             x86_fam < 0x17 )
> >> -        {
> >> -            uint64_t msr_val;
> >> -
> >> -            if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
> >> -                 (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
> >> -                *addr |= CF8_ADDR_HI(cf8);
> >> -        }
> >>      }
> >>      else
> >>      {
> >> diff --git a/xen/arch/x86/include/asm/hvm/io.h 
> >> b/xen/arch/x86/include/asm/hvm/io.h
> >> index 54e0161b492c..3f3fb6403ccb 100644
> >> --- a/xen/arch/x86/include/asm/hvm/io.h
> >> +++ b/xen/arch/x86/include/asm/hvm/io.h
> >> @@ -144,10 +144,6 @@ void stdvga_deinit(struct domain *d);
> >>  
> >>  extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
> >>  
> >> -/* Decode a PCI port IO access into a bus/slot/func/reg. */
> >> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> >> -                                 pci_sbdf_t *sbdf);
> >> -
> >>  /*
> >>   * HVM port IO handler that performs forwarding of guest IO ports into 
> >> machine
> >>   * IO ports.
> >> diff --git a/xen/arch/x86/include/asm/pci.h 
> >> b/xen/arch/x86/include/asm/pci.h
> >> index f4a58c8acf13..3b814f4ebacf 100644
> >> --- a/xen/arch/x86/include/asm/pci.h
> >> +++ b/xen/arch/x86/include/asm/pci.h
> >> @@ -3,10 +3,32 @@
> >>  
> >>  #include <xen/mm.h>
> >>  
> >> +/*
> >> + * PCI config space accesses with CF8/CFC:
> >> + *
> >> + * 1) Write {Enable | BDF | Reg} to CF8 to set an address
> >> + * 2) Read or write CF{C..F} to access the register
> >> + *
> >> + * For sub-dword register accesses, the bottom two register address bits 
> >> come
> >> + * from the CF{C..F} address, not from CF8.
> >> + *
> >> + * AMD have extention to this protocol to access PCIe Extend Config Space 
> >> by
> >> + * storing the 4 extra register address bits in the penultimate nibble of 
> >> CF8.
> >> + * This extention:
> >> + *  - Is unconditionally active on Fam17h and later
> >> + *  - Has model specific enablement on Fam10h thru Fam16h
> >> + *  - Has reserved behaviour in all other cases, including other vendors
> >> + *
> >> + * For simplicity and because we are permitted to, given "reserved", Xen
> >> + * always treats ECS as active when emulating guest PCI config space 
> >> accesses.
> >> + */
> >>  #define CF8_BDF(cf8)     (  ((cf8) & 0x00ffff00) >> 8)
> >> -#define CF8_ADDR_LO(cf8) (   (cf8) & 0x000000fc)
> >> -#define CF8_ADDR_HI(cf8) (  ((cf8) & 0x0f000000) >> 16)
> >>  #define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
> >> +#define CF8_REG(cf8)                                    \
> >> +    ({                                                  \
> >> +        unsigned int _c = cf8;                          \
> >> +        ((_c & 0x0f000000) >> 16) | (_c & 0xfc);        \
> >> +    })
> > What happens on Intel when the bit is set, is it just ignored?
> 
> "the bit" => the ECS nibble, or the CF8_EXT bit?

The ECS nibble.

> 
> The ECS nibble is ignored on Intel AFAICT, while the CF8_EXT bit is in a
> very AMD-only MSR, so won't exist on Intel.
> 
> > We only allow such accesses for dom0 anyway.
> 
> And guests running on AMD hardware where CF8_EXT is active on the
> northbridge of the core we are instantaneously scheduled on.
> 
> >>  
> >>  #define IS_SNB_GFX(id) (id == 0x01068086 || id == 0x01168086 \
> >>                          || id == 0x01268086 || id == 0x01028086 \
> >> diff --git a/xen/arch/x86/pv/emul-priv-op.c 
> >> b/xen/arch/x86/pv/emul-priv-op.c
> >> index 5da00e24e4ff..008367195c78 100644
> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -245,19 +245,7 @@ static bool pci_cfg_ok(struct domain *currd, unsigned 
> >> int start,
> >>          if ( ro_map && test_bit(machine_bdf, ro_map) )
> >>              return false;
> >>      }
> >> -    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
> >> -    /* AMD extended configuration space access? */
> >> -    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
> >> -         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >> -         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 < 0x17 )
> >> -    {
> >> -        uint64_t msr_val;
> >> -
> >> -        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
> >> -            return false;
> >> -        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
> >> -            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
> >> -    }
> >> +    start |= CF8_REG(currd->arch.pci_cf8);
> >>  
> >>      return !write ?
> >>             xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
> >> @@ -1104,6 +1092,11 @@ static int cf_check write_msr(
> >>          if ( !is_hwdom_pinned_vcpu(curr) )
> >>              return X86EMUL_OKAY;
> >>          if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
> >> +             /*
> >> +              * TODO: this is broken.  What happens when dom0 is pinned 
> >> but
> >> +              * can't see the full system?  CF8_EXT probably ought to be a
> >> +              * Xen-owned setting, and made symmetric across the system.
> >> +              */
> > I would assume CF8_EXT would be symmetric across the system, specially
> > given that it seems to be phased out and only used in older AMD
> > families that where all symmetric?
> 
> The CF8_EXT bit has been phased out.  The IO ECS functionality still exists.
> 
> But yes, the more I think about letting dom0 play with this, the more I
> think it is a fundamentally broken idea...  I bet it was from the very
> early AMD Fam10h days where dom0 knew how to turn it on, and Xen was
> trying to pretend it didn't have to touch any PCI devices.

It seems to me Xen should set CF8_EXT on all threads (when available)
and expose it to dom0, so that accesses using pci_{conf,write}_read()
work as expected?

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.