[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: vpci: Need for vpci_cancel_pending
On 28.10.21 13:17, Roger Pau Monné wrote: > On Thu, Oct 28, 2021 at 10:04:20AM +0000, Oleksandr Andrushchenko wrote: >> Hi, all! >> >> While working on PCI passthrough on Arm I stepped onto a crash >> with the following call chain: >> >> pci_physdev_op >> pci_add_device >> init_bars -> modify_bars -> defer_map -> >> raise_softirq(SCHEDULE_SOFTIRQ) >> iommu_add_device <- FAILS >> vpci_remove_device -> xfree(pdev->vpci) >> >> Then: >> leave_hypervisor_to_guest >> vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL >> >> Which results in the crash below: >> >> (XEN) Data Abort Trap. Syndrome=0x6 >> (XEN) Walking Hypervisor VA 0x10 on CPU0 via TTBR 0x00000000481dd000 >> (XEN) 0TH[0x0] = 0x00000000481dcf7f >> (XEN) 1ST[0x0] = 0x00000000481d9f7f >> (XEN) 2ND[0x0] = 0x0000000000000000 >> (XEN) CPU0: Unexpected Trap: Data Abort >> ... >> (XEN) Xen call trace: >> (XEN) [<00000000002246d8>] _spin_lock+0x40/0xa4 (PC) >> (XEN) [<00000000002246c0>] _spin_lock+0x28/0xa4 (LR) >> (XEN) [<000000000024f6d0>] vpci_process_pending+0x78/0x128 >> (XEN) [<000000000027f7e8>] leave_hypervisor_to_guest+0x50/0xcc >> (XEN) [<0000000000269c5c>] entry.o#guest_sync_slowpath+0xa8/0xd4 >> >> So, it seems that if pci_add_device fails and calls vpci_remove_device >> the later needs to cancel any pending work. > Indeed, you will need to check that v->vpci.pdev == pdev before > canceling the pending work though, or else you could be canceling > pending work from a different device. How about: void vpci_cancel_pending(struct pci_dev *pdev) { struct vcpu *v = current; if ( v->vpci.mem && v->vpci.pdev == pdev) { rangeset_destroy(v->vpci.mem); v->vpci.mem = NULL; } } This will effectively prevent the pending work from running > >> If this is a map operation it seems to be straightforward: destroy >> the range set and do not map anything. >> >> If vpci_remove_device is called and unmap operation was scheduled >> then it can be that: >> - guest is being destroyed for any reason and skipping unmap is ok >> as all the mappings for the whole domain will be destroyed anyways >> - guest is still going to stay alive and then unmapping must be done >> >> I would like to hear your thought what would be the right approach >> to take in order to solve the issue. > For the hardware domain it's likely better to do nothing, and just try > to continue execution. The worse that could happen is that MMIO mappings > are left in place when the device has been deassigned. > > For unprivileged domains that get a failure in the middle of a vPCI > {un}map operation we need to destroy them, as we don't know in which > state the p2m is. This can only happen in vpci_process_pending for > domUs I think, as they won't be allowed to call pci_add_device. Please > see the FIXME in vpci_process_pending related to this topic. Agree > > Regards, Roger. > Thank you, Oleksandr
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |