
Re: [Xen-devel] [PATCH for-4.13 v4 2/3] x86/passthrough: fix migration of MSI when using posted interrupts



On 13.11.2019 16:59, Roger Pau Monne wrote:
> @@ -5266,6 +5267,36 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
>      alternative_vcall(hvm_funcs.set_segment_register, v, seg, reg);
>  }
>  
> +int hvm_intr_get_dests(struct domain *d, uint8_t dest, uint8_t dest_mode,

While for IO-APIC and MSI "dest" can't be wider than 8 bits without
virtual interrupt remapping, "intr" imo is generic enough to also
(potentially) include IPIs. I'd therefore recommend making this
uint32_t right away in new code, to cover all current and future
cases.

Also - const for d?
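
I.e., combining both remarks, something like:

    int hvm_intr_get_dests(const struct domain *d, uint32_t dest,
                           uint8_t dest_mode, uint8_t delivery_mode,
                           unsigned long *vcpus);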

> +                       uint8_t delivery_mode, unsigned long *vcpus)
> +{
> +    struct vcpu *v;
> +
> +    switch ( delivery_mode )
> +    {
> +    case dest_LowestPrio:
> +        /*
> +         * Get all the possible destinations, but note that lowest priority
> +         * mode is only going to inject the interrupt to the vCPU running at
> +         * the least privilege level.

"privilege level"? I think you mean "lowest priority" here?

> +         *
> +         * Fallthrough
> +         */
> +    case dest_Fixed:

There's not really any fall through here, and hence I think this part
of the comment would be better dropped.
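
Perhaps (also taking the wording remark further up into account)
simply:

    case dest_LowestPrio:
        /*
         * Get all the possible destinations, but note that lowest
         * priority mode is only going to inject the interrupt to the
         * vCPU running at the lowest priority.
         */
    case dest_Fixed: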

> +        for_each_vcpu ( d, v )
> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
> +                __set_bit(v->vcpu_id, vcpus);
> +        break;
> +
> +    default:
> +        gprintk(XENLOG_WARNING, "unsupported interrupt delivery mode %u\n",
> +                delivery_mode);

gdprintk() perhaps, to keep at least release builds' logs tidy
(and useful)?
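
I.e. simply:

    gdprintk(XENLOG_WARNING, "unsupported interrupt delivery mode %u\n",
             delivery_mode);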

> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -112,6 +112,25 @@ static void sync_pir_to_irr(struct vcpu *v)
>          alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
>  }
>  
> +void domain_sync_vlapic_pir(struct domain *d, unsigned long *vcpus)

const (twice)?

> +{
> +    unsigned int id;
> +
> +    if ( !bitmap_weight(vcpus, d->max_vcpus) )
> +        return;

Why go over the entire bitmap an extra time here? The find-first
will do exactly the same, and hence the loop body below won't be
entered in this case.

> +    for ( id = find_first_bit(vcpus, d->max_vcpus);
> +          id < d->max_vcpus;
> +          id = find_next_bit(vcpus, d->max_vcpus, id + 1) )
> +    {
> +        if ( d->vcpu[id] != current )
> +            vcpu_pause(d->vcpu[id]);

Isn't this setting us up for a deadlock if two parties come here
for the same domain, and both on a vCPU belonging to that domain
(and with the opposite one's bit set in the bitmap)? But it looks
like d would never be the current domain here - you will want to
assert and comment on this, though. At that point the comparisons
against current can then go away as well afaict.
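
Taking this together with the redundant weight check further up, the
function could then shrink to something like (untested, and with the
unquoted remainder left as is):

    void domain_sync_vlapic_pir(const struct domain *d,
                                const unsigned long *vcpus)
    {
        unsigned int id;

        /* No caller may act on the current domain (see above). */
        ASSERT(d != current->domain);

        for ( id = find_first_bit(vcpus, d->max_vcpus);
              id < d->max_vcpus;
              id = find_next_bit(vcpus, d->max_vcpus, id + 1) )
            vcpu_pause(d->vcpu[id]);

        /* ... rest as before, minus the "!= current" checks ... */
    }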

> @@ -345,6 +289,8 @@ int pt_irq_create_bind(
>          const struct vcpu *vcpu;
>          uint32_t gflags = pt_irq_bind->u.msi.gflags &
>                            ~XEN_DOMCTL_VMSI_X86_UNMASKED;
> +        DECLARE_BITMAP(dest_vcpus, MAX_VIRT_CPUS) = { };
> +        DECLARE_BITMAP(prev_vcpus, MAX_VIRT_CPUS) = { };

This is reachable for HVM domains only, isn't it? In which case
why the much larger MAX_VIRT_CPUS (creating two unreasonably big
local variables) instead of HVM_MAX_VCPUS? However, even once
switched I'd be opposed to this - there'd be a fair chance that
the need to deal with these variables might go unnoticed once
the maximum vCPU count for HVM gets increased (which has been a
pending todo item for many years now).

> @@ -420,20 +384,16 @@ int pt_irq_create_bind(
>          delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags,
>                                    XEN_DOMCTL_VMSI_X86_DELIV_MASK);
>  
> -        dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> +        hvm_intr_get_dests(d, dest, dest_mode, delivery_mode, dest_vcpus);
> +        dest_vcpu_id = bitmap_weight(dest_vcpus, d->max_vcpus) != 1 ?
> +            -1 : find_first_bit(dest_vcpus, d->max_vcpus);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
>  
>          pirq_dpci->gmsi.posted = false;
>          vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
> -        if ( iommu_intpost )
> -        {
> -            if ( delivery_mode == dest_LowestPrio )
> -                vcpu = vector_hashing_dest(d, dest, dest_mode,
> -                                           pirq_dpci->gmsi.gvec);
> -            if ( vcpu )
> -                pirq_dpci->gmsi.posted = true;
> -        }
> +        if ( vcpu && iommu_intpost )
> +            pirq_dpci->gmsi.posted = true;

One aspect that I'm curious about: How much posting opportunity do
we lose in practice by no longer posting when the guest uses lowest
priority mode with multiple destinations?

> @@ -442,6 +402,9 @@ int pt_irq_create_bind(
>              pi_update_irte(vcpu ? &vcpu->arch.hvm.vmx.pi_desc : NULL,
>                             info, pirq_dpci->gmsi.gvec);
>  
> +        if ( hvm_funcs.deliver_posted_intr )
> +            domain_sync_vlapic_pir(d, prev_vcpus);

Accessing hvm_funcs here looks like a layering violation. This
wants either moving into the function or (seeing the other use)
abstracting away. Seeing the conditional here (and below) I also
notice that you calculate prev_vcpus in vain when there's no
interrupt posting in use.

I guess together with the variable size issue mentioned above a
possible solution would be:
- have one bitmap hanging off of pirq_dpci->gmsi,
- have one bitmap per pCPU,
- populate the new destination bits into the per-pCPU one,
- issue the PIR->IRR sync,
- exchange the per-pCPU and per-DPCI pointers.
You could then leave the pointers at NULL when no posting is to
be used, addressing the apparent layering violation here at the
same time.
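
In (rough, untested) sketch form - the intr_dest_scratch name is made
up, as is the new gmsi.dest_vcpus field:

    /*
     * Allocation of the bitmaps and the very first bind (no previous
     * destination set yet) left aside here.
     */
    unsigned long *dests = this_cpu(intr_dest_scratch);

    if ( dests ) /* NULL when posting isn't in use */
    {
        bitmap_zero(dests, d->max_vcpus);
        hvm_intr_get_dests(d, dest, dest_mode, delivery_mode, dests);

        /* Sync PIR->IRR for the previous destination set ... */
        domain_sync_vlapic_pir(d, pirq_dpci->gmsi.dest_vcpus);

        /* ... then swap, so the old set becomes this pCPU's scratch. */
        this_cpu(intr_dest_scratch) = pirq_dpci->gmsi.dest_vcpus;
        pirq_dpci->gmsi.dest_vcpus = dests;
    }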

Jan
