
Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling



>>> On 03.12.15 at 09:35, <feng.wu@xxxxxxxxx> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
>      hvm_destroy_all_ioreq_servers(d);
>  }
>  
> +void arch_vcpu_block(struct vcpu *v)
> +{
> +    if ( v->arch.vcpu_block )
> +        v->arch.vcpu_block(v);
> +}

Perhaps better as an inline function? In no case would it (and its
declaration) belong in an HVM-specific file if the hook isn't
HVM-specific. Or, if the hook were HVM-specific, then it ought to be
put in struct hvm_vcpu.
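
Something along these lines is what I have in mind (a sketch only,
with the header it should live in still to be determined):

    static inline void arch_vcpu_block(struct vcpu *v)
    {
        if ( v->arch.vcpu_block )
            v->arch.vcpu_block(v);
    }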

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>  
> +/*
> + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;

I'm pretty sure I complained about this before - this is used only in
this file, and hence should be static.

> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;

Is there no locking needed for this check to be safe? Same goes
for vmx_pi_switch_{from,to}() then obviously...

> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);

Stray blank line between comment and corresponding code.

Also the ASSERT() is rather more disconnected from the write
than seems desirable: Wouldn't it be better to cmpxchg() the
whole 32-bit value, validating that SN is clear in the result?
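
E.g. something like this (a rough, untested sketch - going from the
struct definition, SN is bit 1 and NV bits 16-23 of the low 32 bits
of the control field, but please double check):

    uint32_t old, new;

    do {
        old = *(volatile uint32_t *)&pi_desc->control;
        ASSERT(!(old & (1u << 1)));          /* SN must be clear */
        new = (old & ~(0xffu << 16)) | ((uint32_t)pi_wakeup_vector << 16);
    } while ( cmpxchg((uint32_t *)&pi_desc->control, old, new) != old );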

> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));

Considering these are atomic writes, I'm not sure the compiler
will manage to fold the two. Could I therefore talk you into
morphing this into

    write_atomic(&pi_desc->ndst, x2apic_enabled ? ... : ... );

? Indentation will need to be fixed in any case.
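
I.e., spelled out in full (indentation merely illustrative):

    write_atomic(&pi_desc->ndst,
                 x2apic_enabled ? cpu_physical_id(v->processor)
                                : MASK_INSR(cpu_physical_id(v->processor),
                                            PI_xAPIC_NDST_MASK));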

> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */

Comment style.

> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;

Are this condition and the one in the immediately preceding if()
connected in any way? I.e. could the latter perhaps become an
ASSERT() by connecting the two blocks suitably? I'm simply
trying to limit the number of different cases to consider mentally
when looking at this code ...
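
(If, e.g., pi_desc->nv == posted_intr_vector implied the vCPU to not
be on any list, the above could become something like

    if ( pi_desc->nv == posted_intr_vector )
    {
        ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
        return;
    }

    write_atomic(&pi_desc->nv, posted_intr_vector);

but whether that invariant actually holds is exactly what I'm asking
about here.)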

> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> +     * removed from the blocking list while we are acquiring the lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )

With you wanting to deal with changes behind your back here,
isn't there some kind of barrier needed between reading and using
pi_block_cpu, such that the compiler won't convert the local
variable accesses into multiple reads of
v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?
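
I.e. I'd expect the local variable to want latching via something like

    pi_block_cpu = read_atomic(&v->arch.hvm_vmx.pi_block_cpu);

(or an ACCESS_ONCE() equivalent), so that there's provably just one
read of the field outside the locked region.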

> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;

Perhaps (again for readability's sake) better to invert the if()
condition and move these two lines into its body, ...

> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +}

... while this would then be common.
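
I.e. something like

    if ( v->arch.hvm_vmx.pi_block_cpu != NR_CPUS )
    {
        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
    }

    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);

with the early return (and its duplicate unlock) gone.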

> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>  
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +    if ( iommu_intpost )
> +        v->arch.vcpu_block = vmx_vcpu_block;

Considering the locking question above - couldn't this be done only
when a device gets added, and the pointer zapped upon removal
of the last device, at once saving the conditional inside the hook?
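
I.e. hypothetically (the precise call sites in the device assignment
and deassignment paths would need working out):

    /* when the first device gets assigned to the domain */
    for_each_vcpu ( d, v )
        v->arch.vcpu_block = vmx_vcpu_block;

    /* when the last device gets removed again */
    for_each_vcpu ( d, v )
        v->arch.vcpu_block = NULL;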

> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
>  
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
> +    LIST_HEAD(list);

Unused local variable?

> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is current
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +            vcpu_unblock(v);

With it not being used elsewhere, I don't see the need for the local
variable v. If you really want to keep it, please move its declaration
into this innermost scope.
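
I.e. simply

    vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));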

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -802,6 +802,8 @@ void vcpu_block(void)
>  
>      set_bit(_VPF_blocked, &v->pause_flags);
>  
> +    arch_vcpu_block(v);
> +
>      /* Check for events /after/ blocking: avoids wakeup waiting race. */
>      if ( local_events_need_delivery() )
>      {

ISTR some previous discussion on this placement - wasn't it
determined that putting it in the else part here would be
sufficient for your needs and better in terms of overhead?
Or is there some race possible if moved past the invocation
of local_events_need_delivery()?
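
I.e. (provided this is indeed race free)

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )
    {
        /* ... as before ... */
    }
    else
    {
        arch_vcpu_block(v);
        /* ... as before ... */
    }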

> @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
>      v->poll_evtchn = -1;
>      set_bit(v->vcpu_id, d->poll_mask);
>  
> +    arch_vcpu_block(v);
> +
>  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
>      /* Check for events /after/ setting flags: avoids wakeup waiting race. */
>      smp_mb();

Same question here then regarding moving this further down.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,15 @@ struct arch_vmx_struct {
>      struct page_info     *vmwrite_bitmap;
>  
>      struct page_info     *pml_pg;
> +
> +    struct list_head     pi_blocked_vcpu_list;
> +
> +    /*
> +     * Before vCPU is blocked, it is added to the global per-cpu list

global or per-cpu?

> +     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
> +     * event to 'pi_block_cpu' and wakeup the related vCPU.
> +     */

Perhaps, to aid understanding, the part of the comment after the
second comma could become a separate sentence?

> +    unsigned int         pi_block_cpu;

Looking at all the use sites I wonder whether it wouldn't make for
easier-to-read code if you instead stored the lock to acquire. This
would then perhaps also pave the way for balancing lists growing
too large (i.e. rather than having one list per CPU, you could
then have a variable number of lists, depending on current needs,
and [kind of] freely select one of them).
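
I.e. something like

    spinlock_t           *pi_blocking_list_lock;

(field name made up, of course), with vmx_vcpu_block() storing
&per_cpu(pi_blocked_vcpu_lock, v->processor) there, and with the
wakeup / cleanup paths NULL-ing it again once the vCPU got taken
off the list.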

Jan
