Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
>>> On 09.09.15 at 10:56, <feng.wu@xxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Monday, September 07, 2015 8:55 PM
>>
>>> On 25.08.15 at 03:57, <feng.wu@xxxxxxxxx> wrote:
>> > @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>> >
>> >      set_current(next);
>> >
>> > +    pi_ctxt_switch_from(prev);
>> > +
>> >      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>> >           (is_idle_domain(nextd) && cpu_online(cpu)) )
>> >      {
>> > +        pi_ctxt_switch_to(next);
>> >          local_irq_enable();
>>
>> This placement, if really intended that way, needs explanation (in a
>> comment) and perhaps even renaming of the involved symbols, as
>
> I think maybe removing the static wrapper function can make things
> clearer. What is your opinion?

Considering that there's going to be a second use of the switch-to
variant, I think the helpers are okay to be kept (but I wouldn't insist
on that), just that they need to be named in a way making clear what
their purpose is.

>> > @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
>> >
>> > +    v->arch.hvm_vmx.pi_block_cpu = -1;
>> > +
>> > +    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
>> > +
>> >      v->arch.schedule_tail    = vmx_do_resume;
>> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>> >
>> > +    if ( iommu_intpost && is_hvm_vcpu(v) )
>> > +    {
>> > +        v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
>> > +        v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
>> > +    }
>>
>> Why conditional upon is_hvm_vcpu()?
>
> I only think we need to add these hooks for PV guests, right?

"... we don't need to ..."?

> Or you mean I should use has_hvm_container_vcpu() instead?

Exactly.

>> > +
>> > +    /*
>> > +     * Delete the vCPU from the related block list
>> > +     * if we are resuming from blocked state.
>> > +     */
>> > +    if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
>> > +    {
>> > +        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> > +                          v->arch.hvm_vmx.pi_block_cpu), flags);
>> > +        list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>>
>> Shouldn't you set .pi_block_cpu back to -1 along with removing
>> the vCPU from the list? Which then ought to allow using just
>> list_del() here?
>
> Here is the story about using list_del_init() instead of list_del() (the
> same reason for using list_del_init() in pi_wakeup_interrupt()).
> If we use list_del(), that means we need to set .pi_block_cpu to
> -1 after removing the vCPU from the block list, and there are two
> places where the vCPU is removed from the list:
> - arch_vcpu_wake_prepare()
> - pi_wakeup_interrupt()
>
> That means we also need to set .pi_block_cpu to -1 in
> pi_wakeup_interrupt(), which seems problematic to me, since
> .pi_block_cpu is used to get the per-CPU spin lock, and it is not safe
> to change it in pi_wakeup_interrupt(); maybe someone is using
> it at the same time.
>
> And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> only works because .pi_block_cpu is set to -1 at initialisation time.
>
> If you have any good suggestions about this, I will be all ears.

Latching pi_block_cpu into a local variable would take care of that
part of the problem. Of course you then may also need to check that,
while waiting to acquire the lock, pi_block_cpu didn't change. And if
that absolutely can't be made to work correctly, these apparently
strange uses of list_del_init() would each need a justifying comment.
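(For illustration only: a minimal sketch of the latching approach described
above, reusing the field names from the quoted hunks (pi_block_cpu,
pi_blocked_vcpu_lock, pi_blocked_vcpu_list). The re-check under the lock and
the switch to plain list_del() are assumptions about how the suggestion might
be realised, not part of the posted patch.)

    unsigned long flags;
    int pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;  /* latch locally */

    if ( pi_block_cpu != -1 )
    {
        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
                          flags);

        /*
         * Re-check under the lock: pi_wakeup_interrupt() may have removed
         * the vCPU from the list (and updated pi_block_cpu) while we were
         * waiting for the lock.
         */
        if ( v->arch.hvm_vmx.pi_block_cpu == pi_block_cpu )
        {
            list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
            v->arch.hvm_vmx.pi_block_cpu = -1;
        }

        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
                               flags);
    }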
>> > +    do {
>> > +        old.control = new.control = pi_desc->control;
>> > +
>> > +        /* Should not block the vCPU if an interrupt was posted for it. */
>> > +        if ( pi_test_on(&old) )
>> > +        {
>> > +            spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
>> > +            vcpu_unblock(v);
>>
>> Calling vcpu_unblock() in the middle of context_switch()? Why? And
>> is this safe?
>
> I cannot see anything unsafe so far; can some scheduler maintainer
> help to confirm it? Dario? George?

But first and foremost you ought to answer the "why".

>> And if really needed to cover something not dealt with
>> elsewhere, wouldn't this need to happen _after_ having switched
>> the notification mechanism below?
>
> If ON is set, we don't need to block the vCPU, hence no need to change the
> notification vector to the wakeup one, which is used when the vCPU is blocked.
>
>>
>> > +            return;

Oh, right, I somehow managed to ignore the "return" here.

>> > +static void vmx_post_ctx_switch_pi(struct vcpu *v)
>> > +{
>> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> > +
>> > +    if ( !has_arch_pdevs(v->domain) )
>> > +        return;
>> > +
>> > +    if ( x2apic_enabled )
>> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
>> > +    else
>> > +        write_atomic(&pi_desc->ndst,
>> > +                     MASK_INSR(cpu_physical_id(v->processor),
>> > +                               PI_xAPIC_NDST_MASK));
>> > +
>> > +    pi_clear_sn(pi_desc);
>> > +}
>>
>> So you alter where notifications go, but not via which vector? How
>> is the vCPU going to get removed from the blocked list then?
>
> vmx_post_ctx_switch_pi() is called when the vCPU is about to run;
> in that case, the vector has been set properly and it has been removed
> from the block list if it was blocked before.

So why do you do two updates then (first [elsewhere] to set the vector
and then here to set the destination)?

>> > @@ -756,6 +902,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>> >
>> >      vmx_restore_guest_msrs(v);
>> >      vmx_restore_dr(v);
>> > +    vmx_post_ctx_switch_pi(v);
>> >  }
>>
>> Ah, here is the other invocation! But no, this shouldn't be done this
>> way. This really belongs into vendor independent code, even if the
>> indirect call overhead is slightly higher.
>
> Why do you say this is vendor independent? What is your suggestion?

This needs to become a second invocation of pi_ctxt_switch_to() from
the context switch code.

Jan
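(As an illustration of the closing suggestion above: one way the
vendor-independent context_switch() path might invoke the PI hook on both
paths, rather than calling vmx_post_ctx_switch_pi() from
vmx_ctxt_switch_to(). The wrapper, the NULL checks, and the exact placement
on the full-switch path are assumptions, not the actual patch.)

    /* Hypothetical wrapper around the per-arch hook set in the quoted hunks. */
    static void pi_ctxt_switch_to(struct vcpu *n)
    {
        if ( n->arch.pi_ctxt_switch_to )
            n->arch.pi_ctxt_switch_to(n);
    }

    /* Inside context_switch(), sketching both invocations: */
        set_current(next);

        pi_ctxt_switch_from(prev);

        if ( (per_cpu(curr_vcpu, cpu) == next) ||
             (is_idle_domain(nextd) && cpu_online(cpu)) )
        {
            pi_ctxt_switch_to(next);   /* lazy-switch path (already in the patch) */
            local_irq_enable();
        }
        else
        {
            __context_switch();
            /* ... */
            /*
             * Second invocation on the full-switch path, replacing the call
             * currently buried in vmx_ctxt_switch_to().
             */
            pi_ctxt_switch_to(next);
        }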