[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor



On Oct 3, 2022, at 2:06 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:

> ⚠ External Email
> 
> On October 3, 2022 10:34:15 AM PDT, Nadav Amit <namit@xxxxxxxxxx> wrote:
>> On Oct 3, 2022, at 8:03 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>> 
>>> Not my but rather PCI maintainer's call but IMHO dropping 'const' is
>>> better, introducing a new global var is our 'last resort' and should be
>>> avoided whenever possible. Alternatively, you can add a
>>> raw_pci_ext_ops_preferred() function checking somethin within 'struct
>>> hypervisor_x86' but I'm unsure if it's better.
>>> 
>>> Also, please check Alex' question/suggestion.
>> 
>> Here is my take (and Ajay knows probably more than me):
>> 
>> Looking briefly on MCFG, I do not see a clean way of using the ACPI table.
>> The two options are either to use a reserved field (which who knows, might
>> be used one day) or some OEM ID. I am also not familiar with
>> PCI_COMMAND.MEMORY=0, so Ajay can hopefully give some answer about that.
>> 
>> Anyhow, I understand (although not relate) to the objection for a new global
>> variable. How about explicitly calling this hardware bug a “bug” and using
>> the proper infrastructure? Calling it explicitly a bug may even push whoever
>> can to resolve it.
>> 
>> IOW, how about doing something along the lines of (not tested):
>> 
>> 
>> -- >8 --
>> 
>> Subject: [PATCH] x86/PCI: Prefer MMIO over PIO on VMware hypervisor
>> 
>> ---
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/kernel/cpu/common.c       | 2 ++
>> arch/x86/kernel/cpu/vmware.c       | 2 ++
>> arch/x86/pci/common.c              | 6 ++++--
>> 4 files changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/cpufeatures.h 
>> b/arch/x86/include/asm/cpufeatures.h
>> index ef4775c6db01..216b6f357b6d 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -460,5 +460,6 @@
>> #define X86_BUG_MMIO_UNKNOWN          X86_BUG(26) /* CPU is too old and its 
>> MMIO Stale Data status is unknown */
>> #define X86_BUG_RETBLEED              X86_BUG(27) /* CPU is affected by 
>> RETBleed */
>> #define X86_BUG_EIBRS_PBRSB           X86_BUG(28) /* EIBRS is vulnerable to 
>> Post Barrier RSB Predictions */
>> +#define X86_BUG_ECAM_MMIO             X86_BUG(29) /* ECAM MMIO is buggy and 
>> PIO is preferable */
>> 
>> #endif /* _ASM_X86_CPUFEATURES_H */
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index 3e508f239098..c94175fa304b 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -1299,6 +1299,8 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 
>> *c)
>> {
>>      u64 ia32_cap = x86_read_arch_cap_msr();
>> 
>> +      setup_force_cpu_bug(X86_BUG_ECAM_MMIO);
>> +
>>      /* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not 
>> mitigated */
>>      if (!cpu_matches(cpu_vuln_whitelist, NO_ITLB_MULTIHIT) &&
>>          !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
>> diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
>> index 02039ec3597d..8903776284a6 100644
>> --- a/arch/x86/kernel/cpu/vmware.c
>> +++ b/arch/x86/kernel/cpu/vmware.c
>> @@ -385,6 +385,8 @@ static void __init vmware_set_capabilities(void)
>>              setup_force_cpu_cap(X86_FEATURE_VMCALL);
>>      else if (vmware_hypercall_mode == CPUID_VMWARE_FEATURES_ECX_VMMCALL)
>>              setup_force_cpu_cap(X86_FEATURE_VMW_VMMCALL);
>> +
>> +      setup_clear_cpu_cap(X86_BUG_ECAM_MMIO);
>> }
>> 
>> static void __init vmware_platform_setup(void)
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index ddb798603201..bc81cf4c1014 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
>> int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                              int reg, int len, u32 *val)
>> {
>> -      if (domain == 0 && reg < 256 && raw_pci_ops)
>> +      if (domain == 0 && reg < 256 && raw_pci_ops &&
>> +          (boot_cpu_has_bug(X86_BUG_ECAM_MMIO) || !raw_pci_ext_ops))
>>              return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
>>      if (raw_pci_ext_ops)
>>              return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
>> @@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, 
>> unsigned int devfn,
>> int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                              int reg, int len, u32 val)
>> {
>> -      if (domain == 0 && reg < 256 && raw_pci_ops)
>> +      if (domain == 0 && reg < 256 && raw_pci_ops &&
>> +          (boot_cpu_has_bug(X86_BUG_ECAM_MMIO) || !raw_pci_ext_ops))
>>              return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
>>      if (raw_pci_ext_ops)
>>              return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, 
>> val);
> 
> Also... any reason we can't just set raw_pci_ops == raw_ext_pci_ops for the 
> case when the latter is preferred, and dispense with the conditionals in the 
> use path? Similarly, raw_ext_pci_ops could be pointed to error routines 
> instead of left at NULL.

I understood from Ajay that the initialization of raw_ext_pci_ops can be
done after the hypervisor initialization takes place, so doing what exactly
what you proposed by is not possible. It can probably be resolved, but I do
not think the end result would be simpler or cleaner. I think that the “bug”
solution really conveys the behavior.

IIUC performance would not be noticeable affected by 2 more conditional
branches.


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.