Xen project Mailing List

Re: [Xen-devel] [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling

To: Feng Wu <feng.wu@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Tue, 10 Nov 2015 19:14:51 +0100

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Tue, 10 Nov 2015 18:15:03 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Tue, 2015-11-03 at 16:43 +0800, Feng Wu wrote: > This patch includes the following aspects: > - Handling logic when vCPU is blocked: > * Add a global vector to wake up the blocked vCPU > when an interrupt is being posted to it (This part > was sugguested by Yang Zhang <yang.z.zhang@xxxxxxxxx>). > * Define two per-cpu variables: > 1. pi_blocked_vcpu: > A list storing the vCPUs which were blocked > on this pCPU. > > 2. pi_blocked_vcpu_lock: > The spinlock to protect pi_blocked_vcpu. > > - Add the following hooks, this part was suggested > by George Dunlap <george.dunlap@xxxxxxxxxxxxx> and > Dario Faggioli <dario.faggioli@xxxxxxxxxx>. > * arch_vcpu_block() > Called alled before vcpu is blocking and update the PID > (posted-interrupt descriptor). > > * vmx_pi_switch_from() > Called before context switch, we update the PID when the > vCPU is preempted or going to sleep. > > * vmx_pi_switch_to() > Called after context switch, we update the PID when the vCPU > is going to run. > > - Before VM-entry, check the state of PI descriptor, make sure the > 'NV' field is set to '&posted_intr_vector' when the guest is running > in non-root mode. Suggested by Jan Beulich <jbeulich@xxxxxxxx>. > So, TBH, I never felt in love with this changelog. I'm saying this now because I'm starting to loose track of things, and the changelog is not helpful as it could be, in explaining to me what this patch (and, in this case, this particular version of the patch) does. In fact, above, you're listing the functions and the data structure you are adding, and some of the places from where they're called/used. That, however, can all be figured out from the code. I think this would want to become a bit more high level, and try to describe the _reason_why_ the various bits and pieces have been put in the shape they are. For instance, although you added a paragraph about the VMENTRY check, it is not clear to me, by just reading this description, that it is meant at avoiding having to add much more hooks, as well as, as another example, what you put in the other email about it needing to be called on ever VMENTRY (the "we can't use vmx_do_resume() thing). I'm not saying you should rewrite another design document in this changelog, and I know it's hard to pick the best structure and content, but, really, this isn't ideal. Also, explicitly crediting people who suggested the various items is not helpful too. In 5 years someone (either of us or new), won't care much who actually came up with the idea of having hooks. What he'll care, is for the changelog to provide guidance throughout what the code does, and why it's been done that way. There is no trace of a similar crediting scheme (which, in principle, it would apply to any, o at least to major, comments you get at each review) in our history, and it makes the changelog heavier to read, so I'd cut it all off. > --- > v9: > - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare() > - Add vmx_pi_state_change() and call it before VM Entry > And this is why I said I'm losing track of what was your actual aim for this round. Since when the possibility of moving stuff to VMX helpers came out Jan said, during v7 review (about the need to undo what arch_vcpu_block() does in case local_evnts_need_delivery() is true): 1) "Couldn't this be taken care of by, if necessary, adjusting PI state in vmx_do_resume()?" And George said: 2) "we could just clear SN on every vmexit, and set SN and NDST on every vmentry, couldn't we? Then we wouldn't need hooks on the context switch path at all (which are only there to catch running -> runnable and runnable -> running) -- we could just have the block/unblock hooks (to change NV and add / remove things from the list)." Then, during v8, review comments focused mostly on 1), because we felt the need to try to get rid of the block-cancelling hook. What you seem to have done in this patch, is something in the middle. In fact, AFAIU: - the block-cancelling hook is no longer there, - the wakeup hooks are gone as well, - the context switch hooks are still there. Now, I'm not saying that this is not satisfactory as a result, I'm saying it's hard to keep track of what's going on and to figure out what your aim was in this round (and hence check the aim against the review comments, and the implementation against the aim). If describing that does not fit in the changelog (e.g., because it would be something about how the code is evolving, as it probably is), then put it in the cover letter (this very patch is almost the only thing being changed significantly in the series anyway), or in a few paragraph, here, after the "---", above the v9 bulleted list of changes. > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index e448b31..09c9c08 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -83,7 +83,132 @@ static int vmx_msr_write_intercept(unsigned int > msr, uint64_t msr_content); > +void vmx_vcpu_block(struct vcpu *v) > +{ > + unsigned long flags; > + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc; > + > + if ( !has_arch_pdevs(v->domain) ) > + return; > + > + ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS); > + > + /* > + * The vCPU is blocking, we need to add it to one of the per > pCPU lists. > + * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use > it for > + * the per-CPU list, we also save it to posted-interrupt > descriptor and > + * make it as the destination of the wake-up notification event. > + */ > + v->arch.hvm_vmx.pi_block_cpu = v->processor; > + > + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, > + v->arch.hvm_vmx.pi_block_cpu), flags); > + list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list, > + &per_cpu(pi_blocked_vcpu, v > ->arch.hvm_vmx.pi_block_cpu)); > + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, > + v->arch.hvm_vmx.pi_block_cpu), flags); > + > + ASSERT(!pi_test_sn(pi_desc)); > + > + /* > + * We don't need to set the 'NDST' field, since it should point > to > + * the same pCPU as v->processor. > + */ > + > + write_atomic(&pi_desc->nv, pi_wakeup_vector); > Glad to see the 'do { } while ( cpmxchg(xxx) );' not being necessary any longer! :-) > +} > + > +static void vmx_pi_switch_from(struct vcpu *v) > +{ > + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc; > + > + if ( !has_arch_pdevs(v->domain) || !iommu_intpost || > + test_bit(_VPF_blocked, &v->pause_flags) ) > + return; > + The '!iommu_intpost' is probably the quickest condition to check. If it is, it should be the first thing checked. The same applies in a few other places. > +static void vmx_pi_state_change(struct vcpu *v) > +{ FWIW, I don't like the name. _state_change to what? Looking at the code, it seems to me that what it does it to always sets the PI descriptor to posted_intr_vector, and removes the vCPU from a blocking list. The reason why it does so is because we don't want the vCPU to run while being in a blocked list, with posted_wakeup_vector in its PI descriptor. I'd go for a name that better reflect either the operations being done or the aim. > + unsigned long flags; > + unsigned int pi_block_cpu; > + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc; > + > + if ( !has_arch_pdevs(v->domain) || !iommu_intpost ) > + return; > + > + ASSERT(!test_bit(_VPF_blocked, &v->pause_flags)); > + > + /* > + * Set 'NV' field back to posted_intr_vector, so the > + * Posted-Interrupts can be delivered to the vCPU when > + * it is running in non-root mode. > + */ > + if ( pi_desc->nv != posted_intr_vector ) > + write_atomic(&pi_desc->nv, posted_intr_vector); > + > + /* the vCPU is not on any blocking list. */ > + pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu; > + if ( pi_block_cpu == NR_CPUS ) > + return; > + > + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), > flags); > + > + /* > + * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU > was > + * removed from the blocking list while we are acquiring the > lock. > + */ > + if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS ) > + { > + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, > pi_block_cpu), flags); > + return; > + } > + > + list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > + v->arch.hvm_vmx.pi_block_cpu = NR_CPUS; > + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, > pi_block_cpu), flags); > +} > + > @@ -106,10 +231,18 @@ static int vmx_vcpu_initialise(struct vcpu *v) > > spin_lock_init(&v->arch.hvm_vmx.vmcs_lock); > > + INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > + INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list); > + > + v->arch.hvm_vmx.pi_block_cpu = NR_CPUS; > + > v->arch.schedule_tail = vmx_do_resume; > v->arch.ctxt_switch_from = vmx_ctxt_switch_from; > v->arch.ctxt_switch_to = vmx_ctxt_switch_to; > > + if ( iommu_intpost && has_hvm_container_vcpu(v) ) > + v->arch.vcpu_block = vmx_vcpu_block; > + Genuine question: is it really possible to be here, in vmx_vcpu_initialise(), with has_hvm_container_vcpu()==false ? > +/* Handle VT-d posted-interrupt when VCPU is blocked. */ > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > +{ > + struct arch_vmx_struct *vmx, *tmp; > + struct vcpu *v; > + spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, > smp_processor_id()); > + struct list_head *blocked_vcpus = > + &per_cpu(pi_blocked_vcpu, > smp_processor_id()); > + LIST_HEAD(list); > + > + ack_APIC_irq(); > + this_cpu(irq_count)++; > + > + spin_lock(lock); > + > + /* > + * XXX: The length of the list depends on how many vCPU is > current > + * blocked on this specific pCPU. This may hurt the interrupt > latency > + * if the list grows to too many entries. > + */ > + list_for_each_entry_safe(vmx, tmp, blocked_vcpus, > pi_blocked_vcpu_list) > + { > + if ( pi_test_on(&vmx->pi_desc) ) > + { > + list_del(&vmx->pi_blocked_vcpu_list); > + vmx->pi_block_cpu = NR_CPUS; > + > + /* > + * We cannot call vcpu_unblock here, since it also needs > + * 'pi_blocked_vcpu_lock', we store the vCPUs with ON > + * set in another list and unblock them after we release > + * 'pi_blocked_vcpu_lock'. > + */ > Is this still true? If yes, is it evident from this patch? I'm asking because I don't see that here. I see that it was like that in v7, before of the wakeup hooks... Is that still the case? > + list_add_tail(&vmx->pi_vcpu_on_set_list, &list); > + } > + } > + > + spin_unlock(lock); > + > + list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list) > + { > + v = container_of(vmx, struct vcpu, arch.hvm_vmx); > + list_del(&vmx->pi_vcpu_on_set_list); > + vcpu_unblock(v); > + } > +} > + > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > index 3eefed7..6e5c2f9 100644 > --- a/xen/common/schedule.c > +++ b/xen/common/schedule.c > @@ -854,6 +856,7 @@ static long do_poll(struct sched_poll > *sched_poll) > #endif > > rc = 0; > + > if ( local_events_need_delivery() ) > goto out; > Just don't add this. :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.