[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
Hi, this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the situation captured in RHBZ#688673, in a nutshell: - let's say we have an Intel 82576 card (igb), - the card exports virtual functions (igbvf), - one virtual function is passed through to a PV guest, - the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, sets up three MSI-X vectors for the virtual function PCI device, - when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because nobody ever reaches dom0's pci_disable_msix() / msi_remove_pci_irq_vectors() during shutdown, - therefore configuration of the VF during the next boot of such a guest doesn't succeed (dom0 thinks, based on its stale list, that MSI-X vectors are already allocated/mapped for the device) dom0 (msix_capability_init(), output beautified a bit): msix entry 0 for dev 01:10:0 are not freed before acquire again. msix entry 1 for dev 01:10:0 are not freed before acquire again. msix entry 2 for dev 01:10:0 are not freed before acquire again. guest: Determining IP information for eth1... Failed to obtain physical IRQ 255 Failed to obtain physical IRQ 254 Failed to obtain physical IRQ 253 - if the device or the full igbvf module is removed before shutdown in the guest (in case the boot was successful to begin with!), then pci_disable_msix() is called and everything works fine; the next boot / ifup succeeds. I backported c/s 1070 to the RHEL-5 kernel, but it was not enough -- this is not an abrupt termination (destroy), but a contolled shutdown. Here's what I suspect happens (in RHEL-5) when device_shutdown() walks the list of devices in the PV domU and calls the appropriate shutdown hook for igbvf: domU: igbvf_shutdown() igbvf_suspend() pci_disable_device() pcibios_disable_device() pcibios_disable_resources() pci_write_config_word( dev, PCI_COMMAND, orig & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) ) ... pcifront_bus_write() do_pci_op(XEN_PCI_OP_conf_write) dom0: pciback_do_op() pciback_config_write() conf_space_write() command_write() [via PCI_COMMAND funcptr] pci_disable_device() disable_msi_mode() dev->msix_enabled = 0; The final assignment above precludes c/s 1070 from doing the job. Of course igbvf_shutdown() could invoke igbvf_remove() instead (or another lower-level function that ends up in dom0's pci_disable_msix()), but it should not depend on the guest playing nice. Below's my proposed patch for RHEL-5, ported to upstream. ----v---- introduce msi_prune_pci_irq_vectors() extend pciback_reset_device() with the following functionality (msi_prune_pci_irq_vectors()): - remove all (MSI-X vector, entry number) pairs that might still belong, in dom0's mind, to the PCI device being reset, - unmap some space "used for saving/restoring MSI-X tables" that might otherwise go leaked The function, modeled after msi_remove_pci_irq_vectors(), doesn't touch the hypervisor, nor the PCI device; it only trims the list used by dom0 for filtering. Justification being: when this is called, either the owner domain is gone already (or very close to being gone), or the device was already correctly detached. ----^---- Now I think one comment in the patch below does not hold for upstream: "msi_dev_head is only maintained in dom0". According to c/s 680, this doesn't seem to be the case for upstream. Is the case described above possible at all in upstream? Do you think the fix I propose is correct? It seemed to do the job with RHEL-5 in my testing. dom0: PCI: 0000:01:10.0: cleaning up MSI-X entry 0 PCI: 0000:01:10.0: cleaning up MSI-X entry 1 PCI: 0000:01:10.0: cleaning up MSI-X entry 2 PCI: 0000:01:10.0: cleaning up mask_base Just in case: drivers/pci/msi-xen.c | 46 ++++++++++++++++++++++++++++++++++++++ drivers/xen/pciback/pciback_ops.c | 5 ++++ include/linux/pci.h | 1 3 files changed, 52 insertions(+) Signed-off-by: Laszlo Ersek<lersek@xxxxxxxxxx> Thank you very much! lacos diff -r 876a5aaac026 drivers/pci/msi-xen.c --- a/drivers/pci/msi-xen.c Thu May 26 12:33:41 2011 +0100 +++ b/drivers/pci/msi-xen.c Tue May 31 19:17:26 2011 +0200 @@ -894,6 +894,51 @@ void msi_remove_pci_irq_vectors(struct p dev->irq = msi_dev_entry->default_irq; } +void msi_prune_pci_irq_vectors(struct pci_dev *dev) +{ + unsigned long flags; + struct msi_dev_list *msi_dev_scan, *msi_dev_entry = NULL; + struct msi_pirq_entry *pirq_entry, *tmp; + + if (!pci_msi_enable || !dev) + return; + + /* msi_dev_head is only maintained in dom0 */ + BUG_ON(!is_initial_xendomain()); + + /* search for the set of MSI-X vectors, don't extend list */ + spin_lock_irqsave(&msi_dev_lock, flags); + list_for_each_entry(msi_dev_scan, &msi_dev_head, list) + if (msi_dev_scan->dev == dev) { + msi_dev_entry = msi_dev_scan; + break; + } + spin_unlock_irqrestore(&msi_dev_lock, flags); + if (msi_dev_entry == NULL) + return; + + /* Empty the set. No PHYSDEVOP_unmap_pirq hypercalls: the domU is + * already gone, or the device is already unplugged. + */ + spin_lock_irqsave(&msi_dev_entry->pirq_list_lock, flags); + if (!list_empty(&msi_dev_entry->pirq_list_head)) + list_for_each_entry_safe(pirq_entry, tmp, + &msi_dev_entry->pirq_list_head, list) { + printk(KERN_INFO "PCI: %s: cleaning up MSI-X entry " + "%d\n", pci_name(dev), pirq_entry->entry_nr); + list_del(&pirq_entry->list); + kfree(pirq_entry); + } + spin_unlock_irqrestore(&msi_dev_entry->pirq_list_lock, flags); + + if (msi_dev_entry->mask_base != NULL) { + printk(KERN_INFO "PCI: %s: cleaning up mask_base\n", + pci_name(dev)); + iounmap(msi_dev_entry->mask_base); + msi_dev_entry->mask_base = NULL; + } +} + void pci_no_msi(void) { pci_msi_enable = 0; @@ -906,5 +951,6 @@ EXPORT_SYMBOL(pci_disable_msix); #ifdef CONFIG_XEN EXPORT_SYMBOL(register_msi_get_owner); EXPORT_SYMBOL(unregister_msi_get_owner); +EXPORT_SYMBOL(msi_prune_pci_irq_vectors); #endif diff -r 876a5aaac026 drivers/xen/pciback/pciback_ops.c --- a/drivers/xen/pciback/pciback_ops.c Thu May 26 12:33:41 2011 +0100 +++ b/drivers/xen/pciback/pciback_ops.c Tue May 31 19:17:26 2011 +0200 @@ -29,6 +29,11 @@ void pciback_reset_device(struct pci_dev pci_disable_msix(dev); if (dev->msi_enabled) pci_disable_msi(dev); + + /* After a controlled shutdown or the crash fixup above, prune + * dom0's MSI-X vector list for the device. + */ + msi_prune_pci_irq_vectors(dev); #endif pci_disable_device(dev); diff -r 876a5aaac026 include/linux/pci.h --- a/include/linux/pci.h Thu May 26 12:33:41 2011 +0100 +++ b/include/linux/pci.h Tue May 31 19:17:26 2011 +0200 @@ -640,6 +640,7 @@ extern void msi_remove_pci_irq_vectors(s #ifdef CONFIG_XEN extern int register_msi_get_owner(int (*func)(struct pci_dev *dev)); extern int unregister_msi_get_owner(int (*func)(struct pci_dev *dev)); +extern void msi_prune_pci_irq_vectors(struct pci_dev *dev); #endif #endif _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |