
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling




> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> Sent: Friday, July 10, 2015 7:06 PM
> To: Jan Beulich
> Cc: Wu, Feng; andrew.cooper3@xxxxxxxxxx; George Dunlap; Tian, Kevin; Zhang,
> Yang Z; xen-devel; keir@xxxxxxx
> Subject: Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor
> during vCPU scheduling
> 
> On Fri, 2015-07-10 at 07:22 +0100, Jan Beulich wrote:
> > >>> On 10.07.15 at 07:59, <feng.wu@xxxxxxxxx> wrote:
> > > If you agree with doing all this in a central place, maybe we can create
> > > an arch hook for 'struct scheduler' to do this and call it in all the 
> > > places
> > > vcpu_runstate_change() gets called. What is your opinion about this?
> >
> > Doing this in a central place is certainly the right approach, but
> adding an arch hook that needs to be called everywhere
> vcpu_runstate_change() gets called wouldn't serve that purpose.
> >
> Indeed.
> 
> > Instead
> > we'd need to replace all current vcpu_runstate_change() calls
> with calls to a new function calling both this and the to-be-added
> arch hook.
> >
> Well, I also see the value of having this done in one place, but not to
> the point of adding something like this.
> 
> > But please wait for George's / Dario's feedback, because they
> > seem to be even less convinced than me about your model of
> > tying the updates to runstate changes.
> >
> Indeed. George stated very well the reason why vcpu_runstate_change()
> should not be used, and suggested arch hooks to be added in the relevant
> places. I particularly like this idea as not only would it leave
> vcpu_runstate_change() alone, but it would also help disentangle this
> from runstates, which, IMO, is also important.
> 
> So, can we identify the state (runstate? :-/) transitions that need
> intercepting, and find suitable places for the hooks? I mean,
> something like this:
> 
>  - running-->blocked: can be handled in the arch specific part of
>                       context switch (similarly to CMT/CAT, which
>                       already hooks into there). So, in this case, no
>                       need to add any hook, as arch specific code is
>                       called already;
> 
>  - running-->runnable: same as above;
> 
>  - running-->offline: not sure if you need to take action on this. If
>                       yes, context switch should be fine as well;
> 
>  - blocked-->runnable: I think we need this, don't we? If yes, we
>                        probably want an arch hook in vcpu_wake();
> 
>  - blocked-->offline: do you need it? Well, the hook in wake should work
>                       for this as well, if yes;
> 
>  - runnable/running-->offline: if necessary, we want a hook in
>                                vcpu_sleep_nosync().
> 
> Another way to look at this, less biased toward runstates (i.e., what
> I've been asking for a while now), would be:
> 
>  - do you need to perform an action upon context switch (on prev and/or
>    next vcpu)? If yes, there's an arch specific path in there already;
>  - do you need to perform an action when a vcpu wakes up? If yes, we
>    need an arch hook in vcpu_wake();
>  - do you need to perform an action when a vcpu goes to sleep? If yes,
>    we need an arch hook in vcpu_sleep_nosync();
> 
> I think this makes a more than fair solution. I happen to like it even
> better than the centralized approach, actually! That is partly personal
> taste, but also because I think it may be useful for others too, in
> future, to be able to execute arch specific code, e.g., upon wake-up,
> in which case we will be able to use the hook that we're introducing
> here for PI.
> 
> Thanks and Regards,
> Dario

Hi Dario,

Thanks for the suggestion! I made a draft patch for this idea. It may have
some issues since it is just a draft version, kind of a prototype; I post
it here just to find out whether it meets your expectations. If it does, I
can continue in this direction, which may speed up the upstreaming
process. Thanks a lot!
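
For reference while reading the diff: the code manipulates the VT-d
posted-interrupt descriptor through its 'control' word. Below is a rough
sketch of the layout, following the VT-d spec (the real definition lives
in the VMX headers, xen/include/asm-x86/hvm/vmx/vmcs.h, and may differ in
its reserved fields):

struct pi_desc {
    u32 pir[8];              /* Posted Interrupt Requests, 1 bit/vector */
    union {
        struct {
            u16 on : 1,      /* Outstanding Notification */
                sn : 1,      /* Suppress Notification */
                rsvd_1 : 14;
            u8  nv;          /* Notification Vector */
            u8  rsvd_2;
            u32 ndst;        /* Notification Destination (APIC ID) */
        };
        u64 control;
    };
    u32 rsvd[6];             /* descriptor is 64 bytes in total */
};

The union over 'control' is what lets the 'ON'/'SN'/'NV'/'NDST' fields be
updated together with a single cmpxchg, as done in vmx_ctxt_switch_from()
below.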

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 6eebc1a..7e678c8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -740,6 +740,81 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
     vmx_save_dr(v);
+
+    if ( iommu_intpost )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+        struct pi_desc old, new;
+        unsigned long flags;
+
+        if ( vcpu_runnable(v) || !test_bit(_VPF_blocked, &v->pause_flags) )
+        {
+            /*
+             * The vCPU is preempted or put to sleep. We don't need to
+             * send a notification event to a non-running vCPU; the
+             * interrupt information will be delivered to it before
+             * VM entry when the vCPU is scheduled to run next time.
+             */
+            pi_set_sn(pi_desc);
+        }
+        else if ( test_bit(_VPF_blocked, &v->pause_flags) )
+        {
+            /* The vCPU is blocked */
+            ASSERT(v->arch.hvm_vmx.pi_block_cpu == -1);
+
+            /*
+             * Add the blocked vCPU to the per-pCPU list of
+             * v->arch.hvm_vmx.pi_block_cpu, which is the destination
+             * of the wake-up notification event.
+             */
+            v->arch.hvm_vmx.pi_block_cpu = v->processor;
+            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
+                              v->arch.hvm_vmx.pi_block_cpu), flags);
+            list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                          &per_cpu(pi_blocked_vcpu,
+                                   v->arch.hvm_vmx.pi_block_cpu));
+            spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
+                               v->arch.hvm_vmx.pi_block_cpu), flags);
+
+            do {
+                old.control = new.control = pi_desc->control;
+
+                /*
+                 * We should not block the vCPU if an interrupt was
+                 * posted for it.
+                 */
+                if ( old.on )
+                {
+                    /*
+                     * The vCPU will be removed from the block list
+                     * when it transitions from RUNSTATE_blocked to
+                     * RUNSTATE_runnable, after the tasklet below has
+                     * run.
+                     */
+                    tasklet_schedule(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
+                    return;
+                }
+
+                /*
+                 * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
+                 * so that when an external interrupt from an assigned
+                 * device arrives, the wakeup notification event will go
+                 * to v->arch.hvm_vmx.pi_block_cpu; pi_wakeup_interrupt()
+                 * can then find the vCPU in the right list to wake up.
+                 */
+                if ( x2apic_enabled )
+                    new.ndst = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
+                else
+                    new.ndst = MASK_INSR(cpu_physical_id(
+                                     v->arch.hvm_vmx.pi_block_cpu),
+                                     PI_xAPIC_NDST_MASK);
+                new.sn = 0;
+                new.nv = pi_wakeup_vector;
+            } while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                      != old.control );
+        }
+    }
 }

 static void vmx_ctxt_switch_to(struct vcpu *v)
@@ -764,6 +839,22 @@ static void vmx_ctxt_switch_to(struct vcpu *v)

     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
+
+    if ( iommu_intpost )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    ASSERT(pi_desc->sn == 1);
+
+        if ( x2apic_enabled )
+            write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
+        else
+            write_atomic(&pi_desc->ndst,
+                         MASK_INSR(cpu_physical_id(v->processor),
+                         PI_xAPIC_NDST_MASK));
+
+        pi_clear_sn(pi_desc);
+    }
 }


@@ -1900,6 +1991,42 @@ static void vmx_pi_desc_update(struct vcpu *v, int old_state)
     }
 }
 
+void arch_vcpu_wake(struct vcpu *v)
+{
+    if ( !iommu_intpost || (v->runstate.state != RUNSTATE_blocked) )
+        return;
+
+    if ( likely(vcpu_runnable(v)) ||
+         !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+        unsigned long flags;
+
+        /*
+         * blocked -> runnable/offline:
+         * if the vCPU is leaving RUNSTATE_blocked, set the 'NV' field
+         * back to posted_intr_vector, so that posted interrupts can be
+         * delivered to the vCPU by VT-d hardware once it is scheduled
+         * to run again.
+         */
+        write_atomic((uint8_t *)&pi_desc->nv, posted_intr_vector);
+
+        /*
+         * Remove the vCPU from the per-pCPU blocked list if we are
+         * resuming from the blocked state.
+         */
+        if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
+        {
+            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
+                              v->arch.hvm_vmx.pi_block_cpu), flags);
+            list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+            spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
+                                   v->arch.hvm_vmx.pi_block_cpu), flags);
+            v->arch.hvm_vmx.pi_block_cpu = -1;
+        }
+    }
+}
+
 void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
                                uint32_t *eax, uint32_t *ebx,
                                uint32_t *ecx, uint32_t *edx)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 20727d6..7b08797 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -397,6 +397,8 @@ void vcpu_wake(struct vcpu *v)
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
     }

+    arch_vcpu_wake(v);
+
     vcpu_schedule_unlock_irqrestore(lock, flags, v);

     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 9603cf0..be5aebf 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -266,6 +266,7 @@ static inline unsigned int domain_max_vcpus(const struct domain *d)
 }

 static void arch_pi_desc_update(struct vcpu *v, int old_state) {}
+static inline void arch_vcpu_wake(struct vcpu *v) {}

 #endif /* __ASM_DOMAIN_H__ */

diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index e175417..38c796c 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -511,6 +511,7 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);

 void arch_pi_desc_update(struct vcpu *v, int old_state);
+void arch_vcpu_wake(struct vcpu *v);

 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */

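
For completeness, the wakeup path referenced in the comments above is
pi_wakeup_interrupt(), added by an earlier patch in this series. Here is a
sketch of what it conceptually does (illustrative only; helper names such
as pi_test_on() and the exact locking follow the rest of the series, so
treat the details as assumptions rather than the final code):

/*
 * Sketch: runs on v->arch.hvm_vmx.pi_block_cpu when VT-d posts an
 * interrupt with 'NV' == pi_wakeup_vector. It unblocks every vCPU on
 * this pCPU's blocked list that has an outstanding ('ON') posted
 * interrupt.
 */
static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
{
    struct arch_vmx_struct *vmx, *tmp;
    unsigned int cpu = smp_processor_id();

    ack_APIC_irq();

    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));

    list_for_each_entry_safe(vmx, tmp, &per_cpu(pi_blocked_vcpu, cpu),
                             pi_blocked_vcpu_list)
    {
        if ( pi_test_on(&vmx->pi_desc) )
        {
            list_del(&vmx->pi_blocked_vcpu_list);
            vmx->pi_block_cpu = -1;
            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
        }
    }

    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
}

In other words, an interrupt posted for a blocked vCPU arrives on
pi_block_cpu with the wakeup vector, and the handler turns it into a
vcpu_unblock() for exactly those vCPUs whose descriptors have 'ON' set.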


> 
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)