[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v11 1/2] vmx: VT-d posted-interrupt core logic handling



>>> On 29.01.16 at 02:53, <feng.wu@xxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Friday, January 29, 2016 12:38 AM
>> >>> On 28.01.16 at 06:12, <feng.wu@xxxxxxxxx> wrote:
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -83,7 +83,140 @@ static int vmx_msr_write_intercept(unsigned int msr,
>> uint64_t msr_content);
>> >  static void vmx_invlpg_intercept(unsigned long vaddr);
>> >  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>> >
>> > +static void vmx_vcpu_block(struct vcpu *v)
>> > +{
>> > +    unsigned long flags;
>> > +    unsigned int dest;
>> > +
>> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> 
>> Stray blank line between declarations.
>> 
>> > +    ASSERT(v->arch.hvm_vmx.pi_block_list_lock == NULL);
>> 
>> I think this needs to be done after acquiring the lock.
> 
> I am not sure what you mean here. The ASSERT here means that
> the vCPU is not on any pCPU list when we get here.

My point is that until you've taken the lock, whether the vCPU is on
any such list may change. All state updates are done under lock, so
checking for valid incoming state should be done under lock too, to
be most useful.

>> > +static void vmx_pi_switch_from(struct vcpu *v)
>> > +{
>> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> > +
>> > +    if ( test_bit(_VPF_blocked, &v->pause_flags) )
>> > +        return;
>> > +
>> > +    /*
>> > +     * The vCPU has been preempted or went to sleep. We don't
>> > +     * need to send notification event to a runnable or sleeping
>> > +     * vcpu, the interrupt information will be delivered to it
>> > +     * before VM-ENTRY when the vcpu is scheduled to run next time.
>> > +     */
>> > +    pi_set_sn(pi_desc);
>> > +}
>> 
>> The comment confuses me: A runnable vCPU certainly doesn't
>> need waking, but a sleeping one? Why wouldn't an interrupt
>> coming in not need to wake that vCPU, if this interrupt is what
>> it's waiting for?
> 
> First, let's put VT-d PI away, just thinking the case without it:
> When a vCPU is sleeping, such as, vcpu_sleep_nosync() gets
> called, the vCPU is in RUNSTATE_offline state. Then an interrupt
> comes in and the slept vCPU is the destination, the vCPU is not
> woken up by the interrupts. The slept vCPU would only be woken
> up by calling vcpu_wakeup() in the code path responding to
> the interrupt, however, this is not in the code. So this is
> the current implementation and policy. I don't think VT-d PI needs
> to break it. On the other hand, even we clear SN for slept vCPU,
> it doesn't help, since it is still not in the run-queue and don't
> have chance to run.

Okay, maybe this is just a matter of mixed up terminology (and
not even you mixing it up): "sleeping" to me means waiting for
some kind of event (e.g. resulting from SCHEDOP_block),
whereas "offline" means waiting to be onlined again. But indeed
I agree that vcpu_pause*() calling vcpu_sleep_*sync() makes
the wording above in line with internal uses. Just that the use
of vcpu_sleep_*sync() along with setting various bits in
->pause_{flags,count} is, well, not ideal: Either the fields should
have "sleep" in their names or the operation should have "pause"
in its name. Just that there already is a vcpu_pause_nosync().

So maybe replacing "went to sleep" by "has been paused" would
already clarify things enough. Or alternatively refer to the two
functions, e.g. "has been put to sleep by vcpu_sleep_{,no}sync()".

>> > +static void vmx_pi_do_resume(struct vcpu *v)
>> > +{
>> > +    unsigned long flags;
>> > +    spinlock_t *pi_block_list_lock;
>> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> > +
>> > +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
>> > +
>> > +    /*
>> > +     * Set 'NV' field back to posted_intr_vector, so the
>> > +     * Posted-Interrupts can be delivered to the vCPU when
>> > +     * it is running in non-root mode.
>> > +     */
>> > +    if ( pi_desc->nv != posted_intr_vector )
>> > +        write_atomic(&pi_desc->nv, posted_intr_vector);
>> 
>> Perhaps this was discussed before, but I don't recall and now
>> wonder - why inside an if()? This is a simple memory write on
>> x86.
> 
> The initial purpose is that if NV is already equal to 'posted_intr_vector',
> we can save the following atomically write operation. There are some
> volatile stuff and barriers in write_atomic().

But what does the final generated code look like? As I said, I'm
sure it's just a single MOV. And putting a conditional around it will
likely make things slower rather than faster.

>> > --- a/xen/drivers/passthrough/vtd/iommu.c
>> > +++ b/xen/drivers/passthrough/vtd/iommu.c
>> > @@ -2293,6 +2293,8 @@ static int reassign_device_ownership(
>> >          pdev->domain = target;
>> >      }
>> >
>> > +    vmx_pi_hooks_reassign(source, target);
>> > +
>> >      return ret;
>> >  }
>> 
>> I think you need to clear source's hooks here, but target's need to
>> get set ahead of the actual assignment.
> 
> I think this is the place where the device is moved from
> &source->arch.pdev_list to &target->arch.pdev_list, if that is the
> case, we should clear source and set target here, right?

No - target needs to be ready to deal with events from the device
_before_ the device actually gets assigned to it.

> Please refer to the following code before 'vmx_pi_hooks_reassign'
> 
>     if ( devfn == pdev->devfn )
>     {
>         list_move(&pdev->domain_list, &target->arch.pdev_list);
>         pdev->domain = target;
>     }

Sure, but that's unrelated to the issue I'm trying to point out.

>> > --- a/xen/include/asm-x86/hvm/hvm.h
>> > +++ b/xen/include/asm-x86/hvm/hvm.h
>> > @@ -565,6 +565,11 @@ const char *hvm_efer_valid(const struct vcpu *v,
>> uint64_t value,
>> >                             signed int cr0_pg);
>> >  unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t
>> restore);
>> >
>> > +#define arch_vcpu_block(v) ({                                             
>> >   \
>> > +    if ( (v)->domain->arch.hvm_domain.vmx.vcpu_block )                    
>> >   \
>> > +        (v)->domain->arch.hvm_domain.vmx.vcpu_block((v));                 
>> >   \
>> 
>> Stray double parentheses. But - why is this a macro all of the
>> sudden anyway?
> 
> In the previous version, you suggest to make arch_vcpu_block()
> inline, I tried that, but I encountered building failures, which is
> related to using '(v)->domain->arch.hvm_domain.vmx.vcpu_block '
> here since it refers to some data structure, it is not so straightforward
> to address it, so I change it to a macro, just like other micros in this
> file, which refers to ' (v)->arch.hvm_vcpu.....'.

Well, okay then for the macro part. We should address this, but
this certainly is beyond the scope of you will want to do.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.