
Re: [Xen-devel] [PATCH v3 4/6] Pause/Unpause the domain before/after assigning PI hooks



>>> On 14.09.16 at 04:23, <feng.wu@xxxxxxxxx> wrote:
>> From: Wu, Feng
>> Sent: Monday, September 5, 2016 11:12 AM
>> I still have trouble to fully understand this, please see the following
>> scenario:
>> 
>> 1) Set 'NDST' to pCPU0 (which the vCPU is currently running on, but
>> it may change since vCPU scheduling happens behind us, so how do we
>> decide which pCPU to use for 'NDST'?)
>> 2) Install all four hooks, but the vCPU is re-scheduled before
>> 'vmx_pi_switch_to()' gets installed, so the pCPU of the vCPU might
>> be changed to pCPU1 without 'NDST' getting updated.
>> 3) vmx_vcpu_block() gets called and we hit the ASSERT()
>> 
>> Maybe I missed something; I would appreciate it if you could
>> correct me if my understanding is wrong.
> 
> My email system had some problems, so some emails didn't reach
> my inbox, but I found your reply on the Internet, quoted below:
> 
> " Well, the problem is that you keep trying to find issues with the
> (simplified) sequence of things I'm proposing to investigate as
> an alternative. The above scenario _again_ can be dealt with
> without having to pause the guest: Install the context switch
> hook(s) first, then set NDST, then set the block hook. So what
> I'd really like to ask you to do is first try to find a model without
> pausing, and only if you figure that's impossible, then explain
> why and we'll go with the pausing approach.
> 
> Jan"
> 
> Then I tried to implement the function like the following:
> 
> /* This function is called when pcidevs_lock is held */
> void vmx_pi_hooks_assign(struct domain *d)
> {
>     if ( !iommu_intpost || !has_hvm_container_domain(d) )
>         return;
> 
>     struct vcpu *v;
> 
>     ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
> 
>     /*
>      * We carefully handle the timing here:
>      * - Install the context switch hooks first
>      * - Then set the NDST field
>      * - Install the block and resume hooks last
>      *
>      * This ensures the PI descriptor (especially the NDST field) is
>      * in a proper state when vmx_vcpu_block() is called.
>      */
>     d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
>     d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> 
>     for_each_vcpu ( d, v )
>     {
>         unsigned int dest = cpu_physical_id(v->processor);
>         struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> 
>         /* spot 1 */
> 
>         write_atomic(&pi_desc->ndst,
>                      x2apic_enabled ? dest
>                                     : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
>     }
> 
>     d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
>     d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> }
> 
> However, I still think this is problematic. Consider _spot 1_ above: we
> read the pCPU of the vCPU, but the vCPU can happily be rescheduled to
> another pCPU before we update NDST, so the value we write may already
> be stale. The problem is that it is not safe to update NDST here, since
> the vCPU can be scheduled behind our backs at any time, and it is hard
> to do this safely without guaranteeing the vCPU is in a known state.
> Any further advice on this? Thanks a lot!

Well, yes, there still is a problem, but (sadly) also yes, you continue
to follow the pattern I mentioned in my earlier reply: Instead of trying
to find a suitable ordering of things, you simply complain about the
(necessarily simplified) suggestion I gave. For example, is
v->arch.hvm_vmx.pi_desc perhaps in a known, never-touched state
upon entering vmx_pi_hooks_assign() the very first time? In that case,
instead of the write_atomic() you use in the code above, you could
cmpxchg() against that known initial value, avoiding the write here if
one has already happened in vmx_pi_switch_to(). (And before you
ask, please also consider writing a known-invalid NDST before assigning
the context switch hooks, and cmpxchg() against that one, in case
the initial state isn't sufficiently distinguishable. Using an invalid value
here shouldn't matter as no notifications should arrive prior to any
device getting assigned to the guest.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

