
Re: [Xen-devel] [PATCH for-4.13 v4 2/3] x86/passthrough: fix migration of MSI when using posted interrupts



On 13.11.2019 16:59, Roger Pau Monne wrote:
> @@ -5266,6 +5267,36 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
>      alternative_vcall(hvm_funcs.set_segment_register, v, seg, reg);
>  }
>  
> +int hvm_intr_get_dests(struct domain *d, uint8_t dest, uint8_t dest_mode,

While for IO-APIC and MSI "dest" can't be wider than 8 bits without
virtual interrupt remapping, "intr" imo is generic enough to also
(potentially) include IPIs. I'd therefore recommend making this
uint32_t right away in new code, to cover all current and future
cases.

Also - const for d?
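
I.e., combining both remarks, something like:

    int hvm_intr_get_dests(const struct domain *d, uint32_t dest,
                           uint8_t dest_mode, uint8_t delivery_mode,
                           unsigned long *vcpus);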

> +                       uint8_t delivery_mode, unsigned long *vcpus)
> +{
> +    struct vcpu *v;
> +
> +    switch ( delivery_mode )
> +    {
> +    case dest_LowestPrio:
> +        /*
> +         * Get all the possible destinations, but note that lowest priority
> +         * mode is only going to inject the interrupt to the vCPU running at
> +         * the least privilege level.

"privilege level"? I think you mean "lowest priority" here?

> +         *
> +         * Fallthrough
> +         */
> +    case dest_Fixed:

There's not really any fall through here, and hence I think this part
of the comment would be better dropped.
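
Perhaps (also taking the wording remark further up into account)
simply:

    case dest_LowestPrio:
        /*
         * Get all the possible destinations, but note that lowest
         * priority mode is only going to inject the interrupt to the
         * vCPU running at the lowest priority.
         */
    case dest_Fixed: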

> +        for_each_vcpu ( d, v )
> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
> +                __set_bit(v->vcpu_id, vcpus);
> +        break;
> +
> +    default:
> +        gprintk(XENLOG_WARNING, "unsupported interrupt delivery mode %u\n",
> +                delivery_mode);

gdprintk() perhaps, to keep at least release builds' logs tidy
(and useful)?
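
I.e. simply:

    gdprintk(XENLOG_WARNING, "unsupported interrupt delivery mode %u\n",
             delivery_mode);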

> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -112,6 +112,25 @@ static void sync_pir_to_irr(struct vcpu *v)
>          alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
>  }
>  
> +void domain_sync_vlapic_pir(struct domain *d, unsigned long *vcpus)

const (twice)?

> +{
> +    unsigned int id;
> +
> +    if ( !bitmap_weight(vcpus, d->max_vcpus) )
> +        return;

Why go over the entire bitmap an extra time here? The find-first
will do exactly the same, and hence the loop body below won't be
entered in this case.

> +    for ( id = find_first_bit(vcpus, d->max_vcpus);
> +          id < d->max_vcpus;
> +          id = find_next_bit(vcpus, d->max_vcpus, id + 1) )
> +    {
> +        if ( d->vcpu[id] != current )
> +            vcpu_pause(d->vcpu[id]);

Isn't this setting us up for a deadlock if two parties come here
for the same domain, and both on a vCPU belonging to that domain
(and with the opposite one's bit set in the bitmap)? But it looks
like d would never be the current domain here - you will want to
assert and comment on this, though. At that point the comparisons
against current can then go away as well afaict.
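
Taking this together with the redundant weight check further up, the
function could then shrink to something like (untested, and with the
unquoted remainder left as is):

    void domain_sync_vlapic_pir(const struct domain *d,
                                const unsigned long *vcpus)
    {
        unsigned int id;

        /* No caller may act on the current domain (see above). */
        ASSERT(d != current->domain);

        for ( id = find_first_bit(vcpus, d->max_vcpus);
              id < d->max_vcpus;
              id = find_next_bit(vcpus, d->max_vcpus, id + 1) )
            vcpu_pause(d->vcpu[id]);

        /* ... rest as before, minus the "!= current" checks ... */
    }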

> @@ -345,6 +289,8 @@ int pt_irq_create_bind(
>          const struct vcpu *vcpu;
>          uint32_t gflags = pt_irq_bind->u.msi.gflags &
>                            ~XEN_DOMCTL_VMSI_X86_UNMASKED;
> +        DECLARE_BITMAP(dest_vcpus, MAX_VIRT_CPUS) = { };
> +        DECLARE_BITMAP(prev_vcpus, MAX_VIRT_CPUS) = { };

This is reachable for HVM domains only, isn't it? In which case
why the much larger MAX_VIRT_CPUS (creating two unreasonably big
local variables) instead of HVM_MAX_VCPUS? However, even once
switched I'd be opposed to this - there'd be a fair chance that
the need to deal with these variables might go unnoticed once
the maximum vCPU count for HVM gets increased (which has been a
pending todo item for many years now).

> @@ -420,20 +384,16 @@ int pt_irq_create_bind(
>          delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags,
>                                    XEN_DOMCTL_VMSI_X86_DELIV_MASK);
>  
> -        dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> +        hvm_intr_get_dests(d, dest, dest_mode, delivery_mode, dest_vcpus);
> +        dest_vcpu_id = bitmap_weight(dest_vcpus, d->max_vcpus) != 1 ?
> +            -1 : find_first_bit(dest_vcpus, d->max_vcpus);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
>  
>          pirq_dpci->gmsi.posted = false;
>          vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
> -        if ( iommu_intpost )
> -        {
> -            if ( delivery_mode == dest_LowestPrio )
> -                vcpu = vector_hashing_dest(d, dest, dest_mode,
> -                                           pirq_dpci->gmsi.gvec);
> -            if ( vcpu )
> -                pirq_dpci->gmsi.posted = true;
> -        }
> +        if ( vcpu && iommu_intpost )
> +            pirq_dpci->gmsi.posted = true;

One aspect that I'm curious about: How much posting opportunity do
we lose in practice by no longer posting when the guest uses lowest
priority mode with multiple destinations?

> @@ -442,6 +402,9 @@ int pt_irq_create_bind(
>              pi_update_irte(vcpu ? &vcpu->arch.hvm.vmx.pi_desc : NULL,
>                             info, pirq_dpci->gmsi.gvec);
>  
> +        if ( hvm_funcs.deliver_posted_intr )
> +            domain_sync_vlapic_pir(d, prev_vcpus);

Accessing hvm_funcs here looks like a layering violation. This
wants either moving into the function or (seeing the other use)
abstracting away. Seeing the conditional here (and below) I also
notice that you calculate prev_vcpus in vain when there's no
interrupt posting in use.

I guess together with the variable size issue mentioned above a
possible solution would be:
- have one bitmap hanging off of pirq_dpci->gmsi,
- have one bitmap per pCPU,
- populate the new destination bits into the per-pCPU one,
- issue the PIR->IRR sync,
- exchange the per-pCPU and per-DPCI pointers.
You could then leave the pointers at NULL when no posting is to
be used, addressing the apparent layering violation here at the
same time.
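
In (rough, untested) sketch form - the intr_dest_scratch name is made
up, as is the new gmsi.dest_vcpus field:

    /*
     * Allocation of the bitmaps and the very first bind (no previous
     * destination set yet) left aside here.
     */
    unsigned long *dests = this_cpu(intr_dest_scratch);

    if ( dests ) /* NULL when posting isn't in use */
    {
        bitmap_zero(dests, d->max_vcpus);
        hvm_intr_get_dests(d, dest, dest_mode, delivery_mode, dests);

        /* Sync PIR->IRR for the previous destination set ... */
        domain_sync_vlapic_pir(d, pirq_dpci->gmsi.dest_vcpus);

        /* ... then swap, so the old set becomes this pCPU's scratch. */
        this_cpu(intr_dest_scratch) = pirq_dpci->gmsi.dest_vcpus;
        pirq_dpci->gmsi.dest_vcpus = dests;
    }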

Jan
