Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
Hi Jan,

On 3/10/20 3:19 PM, Jan Beulich wrote:
> On 09.03.2020 18:47, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>> Sent: 09 March 2020 16:29
>>> To: paul@xxxxxxx
>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Varad Gautam <vrd@xxxxxxxxx>; Julien Grall <julien@xxxxxxx>; Roger Pau Monné <roger.pau@xxxxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
>>>
>>> On 06.03.2020 17:02, paul@xxxxxxx wrote:
>>>> From: Varad Gautam <vrd@xxxxxxxxx>
>>>>
>>>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
>>>> In that scenario, it is possible to receive multiple __pirq_guest_unbind
>>>> calls for the same pirq from domain_kill, if the pirq has not yet been
>>>> removed from the domain's pirq_tree, as:
>>>>   domain_kill()
>>>>     -> domain_relinquish_resources()
>>>>       -> pci_release_devices()
>>>>         -> pci_clean_dpci_irq()
>>>>           -> pirq_guest_unbind()
>>>>             -> __pirq_guest_unbind()
>>>>
>>>> For a shared pirq (nr_guests > 1), the first call would zap the current
>>>> domain from the pirq's guests[] list, but the action handler is never freed
>>>> as there are other guests using this pirq. As a result, on the second call,
>>>> __pirq_guest_unbind searches for the current domain which has been removed
>>>> from the guests[] list, and hits a BUG_ON.
>>>>
>>>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
>>>> continue if a shared pirq has already been unbound from this guest. The
>>>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
>>>> in complete_domain_destroy anyway.
>>>>
>>>> Signed-off-by: Varad Gautam <vrd@xxxxxxxxx>
>>>> [taking over from Varad at v4]
>>>> Signed-off-by: Paul Durrant <paul@xxxxxxx>
>>>> ---
>>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Cc: Julien Grall <julien@xxxxxxx>
>>>> Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>
>>>> Roger suggested cleaning the entry from the domain pirq_tree so that
>>>> we need not make it safe to re-call __pirq_guest_unbind().
>>>> This seems like a reasonable suggestion but the semantics of the code are
>>>> almost impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is
>>>> also the name of a struct so you generally have little idea what it actually
>>>> means) so I prefer to stick with a small fix that I can actually reason about.
>>>>
>>>> v4:
>>>>  - Re-work the guest array search to make it clearer
>>>
>>> I.e. there are cosmetic differences to v3 (see below), but
>>> technically it's still the same. I can't believe the re-use
>>> of "pirq" for different entities is this big of a problem.
>>
>> Please suggest code if you think it ought to be done differently. I tried.
>
> How about this? It's admittedly more code, but imo less ad hoc.
> I've smoke tested it, but I depend on you or Varad to check
> that it actually addresses the reported issue.
>
> Jan
>
> x86/pass-through: avoid double IRQ unbind during domain cleanup

I have tested that this patch prevents __pirq_guest_unbind on an
already-unbound pirq during the continuation call for domain_kill
-ERESTART, by using a modified xen that forces an -ERESTART from
pirq_guest_unbind to create the continuation. It fixes the underlying
issue.

Tested-by: Varad Gautam <vrd@xxxxxxxxx>

> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple __pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>   domain_kill()
>     -> domain_relinquish_resources()
>       -> pci_release_devices()
>         -> pci_clean_dpci_irq()
>           -> pirq_guest_unbind()
>             -> __pirq_guest_unbind()
>
> Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
> from the tree being iterated after the first call there. In case such a
> removed entry still has a softirq outstanding, record it and re-check
> upon re-invocation.
>
> Reported-by: Varad Gautam <vrd@xxxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> --- unstable.orig/xen/arch/x86/irq.c
> +++ unstable/xen/arch/x86/irq.c
> @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
>      }
>
>      if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
> -        BUG();
> +        BUG_ON(!d->is_dying);
>  }
>
>  /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */
> --- unstable.orig/xen/drivers/passthrough/pci.c
> +++ unstable/xen/drivers/passthrough/pci.c
> @@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom
>          xfree(digl);
>      }
>
> -    return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
> +    radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq);
> +
> +    if ( !pt_pirq_softirq_active(pirq_dpci) )
> +        return 0;
> +
> +    domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci;
> +
> +    return -ERESTART;
>  }
>
>  static int pci_clean_dpci_irqs(struct domain *d)
> @@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do
>      hvm_irq_dpci = domain_get_irq_dpci(d);
>      if ( hvm_irq_dpci != NULL )
>      {
> -        int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
> +        int ret = 0;
> +
> +        if ( hvm_irq_dpci->pending_pirq_dpci )
> +        {
> +            if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) )
> +                 ret = -ERESTART;
> +            else
> +                 hvm_irq_dpci->pending_pirq_dpci = NULL;
> +        }
>
> +        if ( !ret )
> +            ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
>          if ( ret )
>          {
>              spin_unlock(&d->event_lock);
> --- unstable.orig/xen/include/asm-x86/hvm/irq.h
> +++ unstable/xen/include/asm-x86/hvm/irq.h
> @@ -158,6 +158,8 @@ struct hvm_irq_dpci {
>      DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
>      /* Record of mapped Links */
>      uint8_t link_cnt[NR_LINK];
> +    /* Clean up: Entry with a softirq invocation pending / in progress. */
> +    struct hvm_pirq_dpci *pending_pirq_dpci;
>  };
>
>  /* Machine IRQ to guest device/intx mapping. */

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Management: Christian Schlaeger, Jonathan Weiss
Registered at the Charlottenburg District Court under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879
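
For readers following the thread, the idea in Jan's patch (remove the entry from the iterated tree on the first visit, remember it if a softirq is still outstanding, and re-check only that one entry on re-invocation) can be illustrated with a small standalone model. This is only a sketch and not Xen code: every toy_* name, the fixed-size array standing in for the radix tree, and the TOY_ERESTART value are assumptions made for illustration. The assert() marks the spot where a second visit to the same entry would correspond to the double unbind discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ERESTART 1   /* hypothetical, arbitrary stand-in for ERESTART */
#define TOY_NR_PIRQS 3   /* hypothetical size of the toy "tree" */

/* Toy stand-in for a pass-through pIRQ entry; all fields are illustrative. */
struct toy_pirq {
    bool in_tree;          /* still reachable by the iterator? */
    bool softirq_active;   /* models a softirq pending / in progress */
    unsigned int unbinds;  /* how many times the unbind step ran */
};

struct toy_domain {
    struct toy_pirq pirqs[TOY_NR_PIRQS]; /* array in place of a radix tree */
    struct toy_pirq *pending;            /* entry awaiting softirq completion */
};

/* Per-entry cleanup: unbind once, drop from the tree, maybe ask for a retry. */
static int toy_clean_one(struct toy_domain *d, struct toy_pirq *p)
{
    assert(p->in_tree);    /* a second visit would be the double unbind */
    p->unbinds++;          /* models the unbind call */
    p->in_tree = false;    /* models removal from the iterated tree */

    if ( !p->softirq_active )
        return 0;

    d->pending = p;        /* remember what to re-check on re-invocation */
    return -TOY_ERESTART;
}

/* Restartable cleanup: re-check the recorded entry first, then iterate. */
static int toy_clean_all(struct toy_domain *d)
{
    size_t i;

    if ( d->pending )
    {
        if ( d->pending->softirq_active )
            return -TOY_ERESTART;
        d->pending = NULL;
    }

    for ( i = 0; i < TOY_NR_PIRQS; i++ )
    {
        struct toy_pirq *p = &d->pirqs[i];
        int ret;

        if ( !p->in_tree )
            continue;      /* already cleaned during an earlier pass */

        ret = toy_clean_one(d, p);
        if ( ret )
            return ret;
    }

    return 0;
}

/* Drive cleanup to completion; returns the number of passes it took. */
static unsigned int toy_teardown(struct toy_domain *d)
{
    unsigned int passes = 1;

    while ( toy_clean_all(d) == -TOY_ERESTART )
    {
        /* Pretend the outstanding softirq finishes before the retry. */
        if ( d->pending )
            d->pending->softirq_active = false;
        passes++;
    }

    return passes;
}
```

In this model, a domain whose second entry has a softirq outstanding needs two passes through toy_teardown(), yet each entry's unbind counter ends at 1, because the restarted pass skips entries that were already removed instead of visiting them again.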