[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC v2 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling



>>> On 08.05.15 at 11:07, <feng.wu@xxxxxxxxx> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1711,6 +1711,131 @@ static void vmx_handle_eoi(u8 vector)
>      __vmwrite(GUEST_INTR_STATUS, status);
>  }
>  
> +static void vmx_pi_desc_update(struct vcpu *v, int old_state)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    struct pi_desc old, new;
> +    unsigned long flags;
> +
> +    if ( !iommu_intpost )
> +        return;

Considering how the adjustment to start_vmx(), this at best should
be an ASSERT() (if a check is needed at all).

> +    switch ( v->runstate.state )
> +    {
> +    case RUNSTATE_runnable:
> +    case RUNSTATE_offline:
> +        /*
> +         * We don't need to send notification event to a non-running
> +         * vcpu, the interrupt information will be delivered to it before
> +         * VM-ENTRY when the vcpu is scheduled to run next time.
> +         */
> +        pi_desc->sn = 1;
> +
> +        /*
> +         * If the state is transferred from RUNSTATE_blocked,
> +         * we should set 'NV' feild back to posted_intr_vector,
> +         * so the Posted-Interrupts can be delivered to the vCPU
> +         * by VT-d HW after it is scheduled to run.
> +         */
> +        if ( old_state == RUNSTATE_blocked )
> +        {
> +            do
> +            {
> +                old.control = new.control = pi_desc->control;
> +                new.nv = posted_intr_vector;
> +            }
> +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                    != old.control );

So why is it okay to do the SN update non-atomically when the
vector update is done atomically?

Also the curly braces do _not_ go on separate lines for do/while
constructs.

And then do you really need to atomically update the whole 64 bits
here, rather than just the nv field? By not making it a bit field
you could perhaps just write_atomic() it?

> +
> +           /*
> +            * Delete the vCPU from the related block list
> +            * if we are resuming from blocked state
> +            */
> +           spin_lock_irqsave(&per_cpu(blocked_vcpu_lock,
> +                             v->pre_pcpu), flags);
> +           list_del(&v->blocked_vcpu_list);
> +           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_lock,
> +                                  v->pre_pcpu), flags);
> +        }
> +        break;
> +
> +    case RUNSTATE_blocked:
> +        /*
> +         * The vCPU is blocked on the block list.
> +         * Add the blocked vCPU on the list of the
> +         * vcpu->pre_pcpu, which is the destination
> +         * of the wake-up notification event.
> +         */
> +        v->pre_pcpu = v->processor;

Is latching this upon runstate change really enough? I.e. what about
the v->processor changes that sched_move_domain() or individual
schedulers do? Or if it really just matters on which CPU's blocked list
the vCPU is (while its ->processor changing subsequently doesn't
matter) I'd like to see the field named more after its purpose (e.g.
pi_block_cpu; list and lock should btw also have a connection to PI
in their names).

In the end, if the placement on a list followed v->processor, you
would likely get away without the extra new field. Are there
synchronization constraints speaking against such a model?

> +        spin_lock_irqsave(&per_cpu(blocked_vcpu_lock,
> +                          v->pre_pcpu), flags);
> +        list_add_tail(&v->blocked_vcpu_list,
> +                      &per_cpu(blocked_vcpu, v->pre_pcpu));
> +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_lock,
> +                               v->pre_pcpu), flags);
> +
> +        do
> +        {
> +            old.control = new.control = pi_desc->control;
> +
> +            /*
> +             * We should not block the vCPU if
> +             * an interrupt was posted for it.
> +             */
> +
> +            if ( old.on == 1 )

I think I said so elsewhere, but just in case I didn't: Please don't
compare boolean fields with 1 - e.g. in the case here "if ( old.on )"
suffices.

> +            {
> +                /*
> +                 * The vCPU will be removed from the block list
> +                 * during its state transferring from RUNSTATE_blocked
> +                 * to RUNSTATE_runnable after the following tasklet
> +                 * is scheduled to run.
> +                 */

Precise comments please - I don't think the scheduling of the
tasklet makes any difference; I suppose it's the execution of it
that does.

> +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> +                return;
> +            }
> +
> +            /*
> +             * Change the 'NDST' field to v->pre_pcpu, so when
> +             * external interrupts from assigned deivces happen,
> +             * wakeup notifiction event will go to v->pre_pcpu,
> +             * then in pi_wakeup_interrupt() we can find the
> +             * vCPU in the right list to wake up.
> +             */
> +            if ( x2apic_enabled )
> +                new.ndst = cpu_physical_id(v->pre_pcpu);
> +            else
> +                new.ndst = MASK_INSR(cpu_physical_id(v->pre_pcpu),
> +                                     PI_xAPIC_NDST_MASK);
> +            new.sn = 0;
> +            new.nv = pi_wakeup_vector;
> +        }
> +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                != old.control );
> +        break;
> +
> +    case RUNSTATE_running:
> +        ASSERT( pi_desc->sn == 1 );
> +
> +        do
> +        {
> +            old.control = new.control = pi_desc->control;
> +            if ( x2apic_enabled )
> +                new.ndst = cpu_physical_id(v->processor);
> +            else
> +                new.ndst = (cpu_physical_id(v->processor) << 8) & 0xFF00;

Why are you not using MASK_INSR() here like you do above?

> +
> +            new.sn = 0;
> +        }
> +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                != old.control );

Same here - can't this be a write_atomic() of ndst and a clear_bit()
of sn instead of a loop over cmpxchg()?

> +        break;
> +
> +    default:
> +        break;

This seems dangerous: I think you want at least an
ASSERT_UNREACHABLE() here.

> @@ -1842,7 +1967,12 @@ const struct hvm_function_table * __init 
> start_vmx(void)
>          alloc_direct_apic_vector(&posted_intr_vector, 
> pi_notification_interrupt);
>  
>          if ( iommu_intpost )
> +        {
>              alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> +            vmx_function_table.pi_desc_update = vmx_pi_desc_update;
> +        }
> +        else
> +            vmx_function_table.pi_desc_update = NULL;

Isn't that field NULL anyway?

> @@ -157,7 +158,11 @@ static inline void vcpu_runstate_change(
>          v->runstate.state_entry_time = new_entry_time;
>      }
>  
> +    old_state = v->runstate.state;
>      v->runstate.state = new_state;
> +
> +    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
> +        hvm_funcs.pi_desc_update(v, old_state);

I don't see how this would build on ARM.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -142,6 +142,8 @@ struct vcpu
>  
>      int              processor;
>  
> +    int              pre_pcpu;

This again doesn't belong into the common structure (and should
be accompanied by a comment, and should be unsigned int).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.