[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor




On 29.09.22 07:36, Ajay Kaher wrote:
On 13/09/22, 7:05 PM, "Vitaly Kuznetsov" <vkuznets@xxxxxxxxxx> wrote:
Thanks Vitaly for your response.

1. we have multiple objects of struct pci_raw_ops, 2. adding 'priority' field 
to struct pci_raw_ops
doesn't seems to be appropriate as need to take decision which object of struct 
pci_raw_ops has
to be used, not something with-in struct pci_raw_ops.
I'm not sure I follow, you have two instances of 'struct pci_raw_ops'
which are called 'raw_pci_ops' and 'raw_pci_ext_ops'. What if you do
something like (completely untested):

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 70533fdcbf02..fb8270fa6c78 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -116,6 +116,7 @@ extern void (*pcibios_disable_irq)(struct pci_dev *dev);
extern bool mp_should_keep_irq(struct device *dev);

struct pci_raw_ops {
+       int rating;
          int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
                                                int reg, int len, u32 *val);
          int (*write)(unsigned int domain, unsigned int bus, unsigned int 
devfn,
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index ddb798603201..e9965fd11576 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
  int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
                                                 int reg, int len, u32 *val)
{
-       if (domain == 0 && reg < 256 && raw_pci_ops)
+       if (domain == 0 && reg < 256 && raw_pci_ops &&
+           (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= 
raw_pci_ops->rating))
                 return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
         if (raw_pci_ext_ops)
                 return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, 
val);
@@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, 
unsigned int devfn,
  int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
                                                 int reg, int len, u32 val)
{
-       if (domain == 0 && reg < 256 && raw_pci_ops)
+       if (domain == 0 && reg < 256 && raw_pci_ops &&
+           (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= 
raw_pci_ops->rating))
                 return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
          if (raw_pci_ext_ops)
                 return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, 
val);

and then somewhere in Vmware hypervisor initialization code
(arch/x86/kernel/cpu/vmware.c) you do

  raw_pci_ext_ops->rating = 100;
Thanks Vitaly, for your review and helping us to improve the code.

I was working to make changes as you suggested, but before sending v3 would 
like to
discuss on following:

If we add rating with-in struct pci_raw_ops then we can't have pci_mmcfg as 
const,
and following change is must in arch/x86/pci/mmconfig_64.c:

-const struct pci_raw_ops pci_mmcfg = {
+struct pci_raw_ops pci_mmcfg = {
         .read =         pci_mmcfg_read,
         .write =        pci_mmcfg_write,
};

So to avoid this change, is it fine to have global bool prefer_raw_pci_ext_ops?

And raw_pci_read() will have following change:

-       if (domain == 0 && reg < 256 && raw_pci_ops)
+       if (domain == 0 && reg < 256 && raw_pci_ops &&
+            (!prefer_raw_pci_ext_ops ||  !raw_pci_ext_ops)

why wouldn't it work?

(diclaimer: completely untested, raw_pci_ops/raw_pci_ext_ops
initialization has to be checked so 'rating' is not garbage).

It's a generic solution for all hypervisor (sorry for earlier wrong
Subject), not specific to VMware. Further looking for feedback if it's
impacting to any hypervisor.
That's the tricky part. We can check modern hypervisor versions, but
what about all other versions in existence? How can we know that there's
no QEMU/Hyper-V/... version out there where MMIO path is broken? I'd
suggest we limit the change to Vmware hypervisor, other hypervisors may
use the same mechanism (like the one above) later (but the person
suggesting the patch is always responsible for the research why it is
safe to do so).
Ok, as of now we will make this change specific to VMware hypervisor.


Is there a way we can make it an ACPI property in MCFG to have the environment self-describe the fact that it's safe to do ECAM access for config space access over legacy PIO? That way we don't need to patch guests every time a hypervisor decides that it's safe to prefer ECAM.

Also, Michael (CC'ed) mentioned that according to spec, your PCIe host bridge with PCI_COMMAND.MEMORY=0 would stop responding to its ECAM window. Given that most ARM systems have no PIO fallback path, we want to make sure we never hit that condition.


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.