[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/pci: Correct ECS handling with CF8/CFC emulation


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 3 Apr 2023 11:16:52 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ybvD7N7hjF7U5elqg+fiOU6qIf03CIqjL/+AdVJOIWg=; b=FHmsqiINj7FQ0uXpccFF+uvDRGkotWDy7t/fdePZohheXwfB36Hd9EcLpKpnk5xY7Tg+Ayhw+pBlHwhyDr6YpG63NndpT4AKxBnmmCpn7GcVBnQPpKEVl1PaC3l7UpQIMm+8BGZgx8BSpWY/Q4WuZyaS26EMiqcOBCzqPXip3phO+ZabtbakotFkRMlLaCDuILA5givxfElzSsTdlkiglhnAVbU8u48UdV3p+oA9OM652Jqx6lUu7fSO8Z2jehVY9KK6AaqSfIbeYIJb0ozRtaTosuFvn2QxB7izQ0PF+MTSt1oldMhVXuzKLEP91egSBk819q9uftcvoYWSDpFQHg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=U5Lur6ZyE0aACszxuCNOIXsT6RRjPrhVIVvHgKHoclm829RtHI3+Z08iFsyahZyOj94qB8coR7J2EpOinJ1lH5oQjU7fjhSCXSQFSxDc2ocJQ9oyYxe4dd2pfu4TNzIukXujds3StmkSRdesZLz+OgVi135RhxNVbTFmy7RD+oHrWw8mLXEzlYE0ZmRin/lGB18CZM0ZpMBzz/St4s7LHkGa/LwQuiziBYUNjdXnHkZLpJ+6T7VaRO/lFd3Wu+zqRWlE2tQomO67p3cNj1Zme+pezHp84ndoFnpJ8xr9F7tXS5aF3m66DgfkcwS5RuaMgyPnLfJBD/nxKA1ZVJJOPg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 03 Apr 2023 10:17:24 +0000
  • Ironport-data: A9a23:cUGwyqif1xAN3cxzictcwVL4X161TBEKZh0ujC45NGQN5FlHY01je htvUWuHO6yCYTHyLt0iOY+yph8CvcTUydJkQAo6/i4yQSkb9cadCdqndUqhZCn6wu8v7q5Ex 55HNoSfdpBcolv0/ErF3m3J9CEkvU2wbuOgTrWCYmYpHlUMpB4J0XpLg/Q+jpNjne+3CgaMv cKai8DEMRqu1iUc3lg8sspvkzsy+qWi0N8klgZmP6sT4AeFzyB94K83fsldEVOpGuG4IcbiL wrz5OnR1n/U+R4rFuSknt7TGqHdauePVeQmoiM+t5mK2nCulARrukoIHKN0hXNsoyeIh7hMJ OBl7vRcf+uL0prkw4zxWzEAe8130DYvFLXveRBTuuTLp6HKnueFL1yDwyjaMKVBktubD12i+ tQXDzIHSxaA3Nuuwb2pYdQ22McJDs/0adZ3VnFIlVk1DN4AaLWaG+Dgw4Ad2z09wMdTAfzZe swVLyJ1awjNaAFOPVFRD48imOCvhT/0dDgwRFC9/PJrpTSMilMpluG1YLI5efTTLSlRtm+eq njL4CLSBRYCOcbE4TGE7mitlqnEmiaTtIc6TeXjqqEy2QXCroAVIDA2aH+DrKiXsECzcMJ+B nAZpwg1v4FnoSRHSfG4BXVUukWsrhMaHtZdDeA+wAWM0bbPpRaUAHAeSTxMY8Bgs9U5LRQI/ FKUm9LiBRR0raaYD3ma89+8sjeaKSUTa2gYakc5oRAt5tDipMQ5iE3JR9M6SKqt1IStSXf33 iyAqzU4i/MLl8kX2q6n/FfBxTWxupzOSQ1z7QLSNo640j5EiEeeT9TAwTDmATxodt7xooWp1 JTcp/Wj0Q==
  • Ironport-hdrordr: A9a23:rh5QuqFqAHsk1/C1pLqF9ZLXdLJyesId70hD6qkvc3Fom52j/f xGws5x6faVslkssb8b6Km90dq7MBThHPlOkPQs1NaZLXPbUQ6TQL2KgrGSoAEIdxeOk9K1kJ 0QCJSWa+eAc2SS7/yb3ODQKb9Jrri6GeKT9J/jJh9WPH5XgspbnmNE42igYytLrUV9dPgE/M 323Ls6m9PsQwVeUiz9bUN1LdTrlpnurtbLcBQGDxko5E2nii6p0qfzF1y1zwoTSDRGxJYl6C zgnxbi7quunvmnwluEvlWjo6h+qZ/E8J9uFcaMgs8aJnHFjRupXp1oX/mvrS04u+am7XctiZ 3prw07N8p+xnvNdiWeoAfr2SPnzDEygkWShGOwsD/Gm4jUVTg6A81OicZwdQbY0VMpuJVZ3L hQ12yUmpJLBVeY9R6NreTgZlVPrA6ZsHAimekcgzh2VpYfUqZYqcg68FlOGJkNMSrm4MQMEf VoDuvb+PFKGGnqJEzxjy1K+piBT34zFhCJTgwrvdGU6SFfmDRDw04R1KUk7wA93aN4b6MBy/ XPM6xumr0LZNQRd7hBCOAIRtbyInDRQDrXWVjiYWjPJeUiATbgupT36LI66KWBY5oT1qY/n5 zHTRdxqXMyQUTzEseDtac7sywleF/NHwgF9/suoqSQ4tbHNf7W2Gy4OR4TevKb0rYi6paxYY f1BHpUa8WTWVcGV7w5mTEWYKMiWkX2YPdly+rTZGj+0v4jCreawNAzI8yjbYbFIHIDZl7VJE clcXzaGPhgh3rbL0MQxiKhFE/QRg==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03/04/2023 9:57 am, Roger Pau Monné wrote:
> On Fri, Mar 31, 2023 at 06:57:19PM +0100, Andrew Cooper wrote:
>> This started by noticing the dubious Fam17h check in
>> arch_ioreq_server_get_type_addr(), and then realising that checking the host
>> CF8_EXT setting is utterly bogus for the guest <-> qemu emulation path.
>>
>> What should be consulted here is the guest's MSR_AMD64_NB_CFG setting, but a)
>> there isn't one, and b) the vestigial remnants of the cross-vendor migration
>> logic cause MSR_AMD64_NB_CFG to be unconditionally read-as-zero, making the
>> CF8_EXT path unused by any suitably-written OS in the first place.
>>
>> MSR_AMD64_NB_CFG really has been removed on Fam17h (it's now just a 
>> read-zero,
>> write-discard stub), and the ECS extension is unconditionally active, meaning
>> it is not correct for Xen to ignore the ExtRegNo field on newer AMD CPUs.
>>
>> It turns out that Xen did even have this behaviour in 4.5 and earlier, with
>> this problematic CF8_EXT checking being added in 4.6.  Therefore, revert back
>> to Xen's older behaviour - it is objectively less wrong than the current
>> logic.
>>
>> While fixing this, get rid of hvm_pci_decode_addr() - it is more complicated
>> to follow (and to call) than using the CF8* macros in the calling context.
>> Rename CF8_ADDR() to CF8_REG() to better describe what it does, and write a
>> comment explaining all about CF8/CFC accesses.
>>
>> There's one rare case when CF8_EXT is visible to guests, and that is for a
>> pinned hwdom.  Right now, we permit such a dom0 to modify the CF8_EXT bit, 
>> but
>> this seems like a very unwise idea.  Leave a TODO for people to consider.
> One weirdness I've noticed is that for vPCI we decode the accesses
> taking the extended CF8 bit after this change, but then if the access
> is relayed to the hardware using vpci_{read,write}_hw it will be
> forwarded to the hardware using pci_conf_{read,write}<size> which
> doesn't have support for CF8_EXT.  So if the underlying hardware
> doesn't have MMCFG support and the reg is > 255 it will be dropped.

It is important to stress that this change does not influence whether
the guest issues ECS accesses or not.  All it does is change Xen's
handling of such accesses.

Previously vPCI blindly ignored ECS accesses, so the vPCI layer
effectively truncated them to BCS accesses.

Now, from your analysis, when MMCFG isn't active, Xen's PCI layer will
effectively terminate ECS accesses with default behaviour, even on
systems where IO ECS is available.

So we've changed one valid behaviour for a different valid behaviour.


(Quick tangent...  Our PCI handling is currently very dumb. 
pci_mmcfg_read() returns its value by pointer but the callers never
check.  Swapping it to return by value would improve code gen quite a
lot.  Also, when MMCFG is active we still pass BCS accesses to IO ports.)

So I think we do want to improve Xen's general behaviour too, but this
difference here doesn't concern me.

>
>> Fixes: e0fbf3bf9871 ("x86/AMD: correct certain Fam17 checks")
>> Fixes: 2d67a7a4d37a ("x86: synchronize PCI config space access decoding")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> CC: Wei Liu <wl@xxxxxxx>
>>
>> Whoever reviewed those two patches originally was clearly a fool...
>> ---
>>  xen/arch/x86/hvm/io.c             | 24 ++++++------------------
>>  xen/arch/x86/hvm/ioreq.c          | 19 ++-----------------
>>  xen/arch/x86/include/asm/hvm/io.h |  4 ----
>>  xen/arch/x86/include/asm/pci.h    | 26 ++++++++++++++++++++++++--
>>  xen/arch/x86/pv/emul-priv-op.c    | 19 ++++++-------------
>>  5 files changed, 38 insertions(+), 54 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
>> index 5ae209d3b6b3..b0d3c236e985 100644
>> --- a/xen/arch/x86/hvm/io.c
>> +++ b/xen/arch/x86/hvm/io.c
>> @@ -248,20 +248,6 @@ void register_g2m_portio_handler(struct domain *d)
>>      handler->ops = &g2m_portio_ops;
>>  }
>>  
>> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
>> -                                 pci_sbdf_t *sbdf)
>> -{
>> -    ASSERT(CF8_ENABLED(cf8));
>> -
>> -    sbdf->bdf = CF8_BDF(cf8);
>> -    sbdf->seg = 0;
>> -    /*
>> -     * NB: the lower 2 bits of the register address are fetched from the
>> -     * offset into the 0xcfc register when reading/writing to it.
>> -     */
>> -    return CF8_ADDR_LO(cf8) | (addr & 3);
>> -}
>> -
>>  /* vPCI config space IO ports handlers (0xcf8/0xcfc). */
>>  static bool cf_check vpci_portio_accept(
>>      const struct hvm_io_handler *handler, const ioreq_t *p)
>> @@ -275,7 +261,7 @@ static int cf_check vpci_portio_read(
>>  {
>>      const struct domain *d = current->domain;
>>      unsigned int reg;
>> -    pci_sbdf_t sbdf;
>> +    pci_sbdf_t sbdf = {};
>>      uint32_t cf8;
>>  
>>      *data = ~(uint64_t)0;
>> @@ -292,7 +278,8 @@ static int cf_check vpci_portio_read(
>>      if ( !CF8_ENABLED(cf8) )
>>          return X86EMUL_UNHANDLEABLE;
>>  
>> -    reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
>> +    sbdf.bdf = CF8_BDF(cf8);
>> +    reg = CF8_REG(cf8) | (addr & 3);
>>  
>>      if ( !vpci_access_allowed(reg, size) )
>>          return X86EMUL_OKAY;
>> @@ -308,7 +295,7 @@ static int cf_check vpci_portio_write(
>>  {
>>      struct domain *d = current->domain;
>>      unsigned int reg;
>> -    pci_sbdf_t sbdf;
>> +    pci_sbdf_t sbdf = {};
>>      uint32_t cf8;
>>  
>>      if ( addr == 0xcf8 )
>> @@ -323,7 +310,8 @@ static int cf_check vpci_portio_write(
>>      if ( !CF8_ENABLED(cf8) )
>>          return X86EMUL_UNHANDLEABLE;
>>  
>> -    reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
>> +    sbdf.bdf = CF8_BDF(cf8);
>> +    reg = CF8_REG(cf8) | (addr & 3);
>>  
>>      if ( !vpci_access_allowed(reg, size) )
>>          return X86EMUL_OKAY;
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>> index 0bdcca1e1a5f..325a9d118e52 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -285,27 +285,12 @@ bool arch_ioreq_server_get_type_addr(const struct 
>> domain *d,
>>           (p->addr & ~3) == 0xcfc &&
>>           CF8_ENABLED(cf8) )
>>      {
>> -        unsigned int x86_fam, reg;
>> -        pci_sbdf_t sbdf;
>> -
>> -        reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
>> +        pci_sbdf_t sbdf = { .bdf = CF8_BDF(cf8) };
>> +        unsigned int reg = CF8_REG(cf8) | (p->addr & 3);
>>  
>>          /* PCI config data cycle */
>>          *type = XEN_DMOP_IO_RANGE_PCI;
>>          *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
>> -        /* AMD extended configuration space access? */
>> -        if ( CF8_ADDR_HI(cf8) &&
>> -             d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
>> -             (x86_fam = get_cpu_family(
>> -                 d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
>> -             x86_fam < 0x17 )
>> -        {
>> -            uint64_t msr_val;
>> -
>> -            if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
>> -                 (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
>> -                *addr |= CF8_ADDR_HI(cf8);
>> -        }
>>      }
>>      else
>>      {
>> diff --git a/xen/arch/x86/include/asm/hvm/io.h 
>> b/xen/arch/x86/include/asm/hvm/io.h
>> index 54e0161b492c..3f3fb6403ccb 100644
>> --- a/xen/arch/x86/include/asm/hvm/io.h
>> +++ b/xen/arch/x86/include/asm/hvm/io.h
>> @@ -144,10 +144,6 @@ void stdvga_deinit(struct domain *d);
>>  
>>  extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
>>  
>> -/* Decode a PCI port IO access into a bus/slot/func/reg. */
>> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
>> -                                 pci_sbdf_t *sbdf);
>> -
>>  /*
>>   * HVM port IO handler that performs forwarding of guest IO ports into 
>> machine
>>   * IO ports.
>> diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h
>> index f4a58c8acf13..3b814f4ebacf 100644
>> --- a/xen/arch/x86/include/asm/pci.h
>> +++ b/xen/arch/x86/include/asm/pci.h
>> @@ -3,10 +3,32 @@
>>  
>>  #include <xen/mm.h>
>>  
>> +/*
>> + * PCI config space accesses with CF8/CFC:
>> + *
>> + * 1) Write {Enable | BDF | Reg} to CF8 to set an address
>> + * 2) Read or write CF{C..F} to access the register
>> + *
>> + * For sub-dword register accesses, the bottom two register address bits 
>> come
>> + * from the CF{C..F} address, not from CF8.
>> + *
>> + * AMD have extention to this protocol to access PCIe Extend Config Space by
>> + * storing the 4 extra register address bits in the penultimate nibble of 
>> CF8.
>> + * This extention:
>> + *  - Is unconditionally active on Fam17h and later
>> + *  - Has model specific enablement on Fam10h thru Fam16h
>> + *  - Has reserved behaviour in all other cases, including other vendors
>> + *
>> + * For simplicity and because we are permitted to, given "reserved", Xen
>> + * always treats ECS as active when emulating guest PCI config space 
>> accesses.
>> + */
>>  #define CF8_BDF(cf8)     (  ((cf8) & 0x00ffff00) >> 8)
>> -#define CF8_ADDR_LO(cf8) (   (cf8) & 0x000000fc)
>> -#define CF8_ADDR_HI(cf8) (  ((cf8) & 0x0f000000) >> 16)
>>  #define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
>> +#define CF8_REG(cf8)                                    \
>> +    ({                                                  \
>> +        unsigned int _c = cf8;                          \
>> +        ((_c & 0x0f000000) >> 16) | (_c & 0xfc);        \
>> +    })
> What happens on Intel when the bit is set, is it just ignored?

"the bit" => the ECS nibble, or the CF8_EXT bit?

The ECS nibble is ignored on Intel AFAICT, while the CF8_EXT bit is in a
very AMD-only MSR, so won't exist on Intel.

> We only allow such accesses for dom0 anyway.

And guests running on AMD hardware where CF8_EXT is active on the
northbridge of the core we are instantaneously scheduled on.

>>  
>>  #define IS_SNB_GFX(id) (id == 0x01068086 || id == 0x01168086 \
>>                          || id == 0x01268086 || id == 0x01028086 \
>> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
>> index 5da00e24e4ff..008367195c78 100644
>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -245,19 +245,7 @@ static bool pci_cfg_ok(struct domain *currd, unsigned 
>> int start,
>>          if ( ro_map && test_bit(machine_bdf, ro_map) )
>>              return false;
>>      }
>> -    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
>> -    /* AMD extended configuration space access? */
>> -    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
>> -         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>> -         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 < 0x17 )
>> -    {
>> -        uint64_t msr_val;
>> -
>> -        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
>> -            return false;
>> -        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
>> -            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
>> -    }
>> +    start |= CF8_REG(currd->arch.pci_cf8);
>>  
>>      return !write ?
>>             xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
>> @@ -1104,6 +1092,11 @@ static int cf_check write_msr(
>>          if ( !is_hwdom_pinned_vcpu(curr) )
>>              return X86EMUL_OKAY;
>>          if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
>> +             /*
>> +              * TODO: this is broken.  What happens when dom0 is pinned but
>> +              * can't see the full system?  CF8_EXT probably ought to be a
>> +              * Xen-owned setting, and made symmetric across the system.
>> +              */
> I would assume CF8_EXT would be symmetric across the system, specially
> given that it seems to be phased out and only used in older AMD
> families that where all symmetric?

The CF8_EXT bit has been phased out.  The IO ECS functionality still exists.

But yes, the more I think about letting dom0 play with this, the more I
think it is a fundamentally broken idea...  I bet it was from the very
early AMD Fam10h days where dom0 knew how to turn it on, and Xen was
trying to pretend it didn't have to touch any PCI devices.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.