
Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Monday, January 18, 2016 11:14 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Dario Faggioli
> <dario.faggioli@xxxxxxxxxx>; George Dunlap <george.dunlap@xxxxxxxxxxxxx>;
> Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx; Keir Fraser
> <keir@xxxxxxx>
> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> >>> On 03.12.15 at 09:35, <feng.wu@xxxxxxxxx> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content);
> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > +
> > +    /*
> > +     * The vCPU is blocking, we need to add it to one of the per pCPU 
> > lists.
> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it
> for
> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > +     * make it as the destination of the wake-up notification event.
> > +     */
> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> > +
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> > +
> > +    ASSERT(!pi_test_sn(pi_desc));
> > +
> > +    /*
> > +     * We don't need to set the 'NDST' field, since it should point to
> > +     * the same pCPU as v->processor.
> > +     */
> > +
> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> 
> Stray blank line between comment and corresponding code.
> 
> Also the ASSERT() is rather more disconnected from the write
> than seems desirable: Wouldn't it be better to cmpxchg() the
> whole 32-bit value, validating that SN is clear in the result?

I don't quite understand this. The control field is 64 bits; do you
mean cmpxchg() the whole 64-bit value, like the following:

+        do {
+            old.control = new.control = pi_desc->control;
+            new.nv = pi_wakeup_vector;
+        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                  != old.control );

This is somewhat like the implementation in the earlier versions;
why do you want to change it back?
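
Or, if your point is rather to tie the SN check to the update itself,
is the below roughly what you have in mind (just a sketch to make sure
I read you correctly, not tested)?

    struct pi_desc old, new;

    do {
        old.control = new.control = pi_desc->control;
        /* Validate SN is still clear at the point we switch vectors. */
        ASSERT(!pi_test_sn(&old));
        new.nv = pi_wakeup_vector;
    } while ( cmpxchg(&pi_desc->control, old.control, new.control)
              != old.control );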

> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> > +    if ( pi_block_cpu == NR_CPUS )
> > +        return;
> 
> Are this condition and the one in the immediately preceding if()
> connected in any way?

I am not sure I understand this correctly. Which one is
"the one in the immediately preceding if()"?

> I.e. could the latter perhaps become an
> ASSERT() by connecting the two blocks suitably? I'm simply
> trying to limit the number of different cases to consider mentally
> when looking at this code ...

If the check above finds NR_CPUS, it means the vCPU has already been
removed by pi_wakeup_interrupt(), so we just return. Otherwise we
acquire the spinlock as the following code does, and there are two
scenarios (see the sketch after this list):
- pi_wakeup_interrupt() acquires the spinlock before us. By the time
we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been set
to NR_CPUS by pi_wakeup_interrupt(), so we need to do nothing.
- We get the spinlock before pi_wakeup_interrupt(). In that case we
need to remove the vCPU from the list and set
'v->arch.hvm_vmx.pi_block_cpu' to NR_CPUS ourselves.
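
Put differently, the cleanup side looks roughly like this (a sketch of
the logic described above, not the literal patch code):

    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
    if ( pi_block_cpu == NR_CPUS )
        return;   /* Already removed by pi_wakeup_interrupt(). */

    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);

    /* Re-check under the lock: pi_wakeup_interrupt() may have won the race. */
    if ( v->arch.hvm_vmx.pi_block_cpu != NR_CPUS )
    {
        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
    }

    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);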

> 
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> > +
> > +    /*
> > +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
> was
> > +     * removed from the blocking list while we are acquiring the lock.
> > +     */
> > +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> 
> With you wanting to deal with changes behind your back here,
> isn't there come kind of barrier needed between reading and using
> pi_block_cpu, such that the compiler won't convert the local
> variable accesses into multiple reads of
> v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?

I will think about this.
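
One option might be to latch the field exactly once, e.g. (a sketch,
assuming ACCESS_ONCE, or an equivalent volatile cast, is acceptable in
this context):

    unsigned int pi_block_cpu;

    /*
     * Read the field exactly once: pi_wakeup_interrupt() may change it
     * behind our back, and the local copy must stay stable across all
     * of its subsequent uses.
     */
    pi_block_cpu = ACCESS_ONCE(v->arch.hvm_vmx.pi_block_cpu);
    if ( pi_block_cpu == NR_CPUS )
        return;

The later check under the lock would of course still read
v->arch.hvm_vmx.pi_block_cpu afresh.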

> 
> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >
> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> >
> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> > +
> >      v->arch.schedule_tail    = vmx_do_resume;
> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >
> > +    if ( iommu_intpost )
> > +        v->arch.vcpu_block = vmx_vcpu_block;
> 
> Considering the locking question above - couldn't this be done only
> when a device gets added, and the pointer zapped upon removal
> of the last device, at once saving the conditional inside the hook?

That is another solution, but I think it makes a simple thing
difficult; I feel the current logic is simple and clear. Besides, this
hook is not on a critical path, so another conditional in it should not
be a big deal.
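
For completeness, if I understand your suggestion correctly, it would
amount to something like the below (helper names are hypothetical, just
to illustrate), which to me adds plumbing for little gain:

    /* Hypothetical: called when the first posted-interrupt capable device is assigned. */
    void vmx_pi_hooks_assign(struct domain *d)
    {
        struct vcpu *v;

        if ( !iommu_intpost || !is_hvm_domain(d) )
            return;

        for_each_vcpu ( d, v )
            v->arch.vcpu_block = vmx_vcpu_block;
    }

    /* Hypothetical: called when the last such device is removed. */
    void vmx_pi_hooks_deassign(struct domain *d)
    {
        struct vcpu *v;

        for_each_vcpu ( d, v )
            v->arch.vcpu_block = NULL;
    }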

> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -802,6 +802,8 @@ void vcpu_block(void)
> >
> >      set_bit(_VPF_blocked, &v->pause_flags);
> >
> > +    arch_vcpu_block(v);
> > +
> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >      if ( local_events_need_delivery() )
> >      {
> 
> ISTR some previous discussion on this placement - wasn't it
> determined that putting it in the else part here would be
> sufficient for your needs and better in terms of overhead?
> Or is there some race possible if moved past the invocation
> of local_events_need_delivery()?

The discussion happened several months ago; it is really hard to recall
every detail, and I have to dig it out of my archive. It is unfortunate
that the review has dragged on for so long.

We need to call arch_vcpu_block() before local_events_need_delivery():
the VT-d hardware can issue a notification event while we are in
arch_vcpu_block(); if that happens, the 'ON' bit is set and we will
then see it in local_events_need_delivery(). If we did
arch_vcpu_block() in the else part, we could not handle this case. This
is one reason I can recall; I am not sure whether there were other
concerns, since it has been a long time. The current implementation was
fully discussed with the scheduler maintainers.
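
To spell out the ordering concern with a simplified skeleton of
vcpu_block() (a sketch, not the literal code):

    set_bit(_VPF_blocked, &v->pause_flags);

    /*
     * Switch NV to pi_wakeup_vector (and add the vCPU to the blocking
     * list) before the event check below.  If a notification arrives
     * right here, the hardware sets 'ON' in the posted-interrupt
     * descriptor, local_events_need_delivery() sees the pending event,
     * and we cancel the block instead of sleeping with an interrupt
     * outstanding.
     */
    arch_vcpu_block(v);

    if ( local_events_need_delivery() )
        clear_bit(_VPF_blocked, &v->pause_flags);   /* Don't block after all. */
    else
        raise_softirq(SCHEDULE_SOFTIRQ);            /* Go to sleep. */

If arch_vcpu_block() were moved into the else branch, a notification
arriving after the check but during the vector switch would set 'ON'
without anyone noticing, and the vCPU could go to sleep with an
interrupt pending.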

BTW, I don't think the current implementation introduces much overhead.

> 
> > @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
> >      v->poll_evtchn = -1;
> >      set_bit(v->vcpu_id, d->poll_mask);
> >
> > +    arch_vcpu_block(v);
> > +
> >  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
> >      /* Check for events /after/ setting flags: avoids wakeup waiting race. 
> > */
> >      smp_mb();
> 
> Same question here then regarding moving this further down.
> 
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> 
> > +    unsigned int         pi_block_cpu;
> 
> Looking at all the use sites I wonder whether it wouldn't make for
> easier to read code if you instead stored the lock to acquire. This
> would then perhaps also prepare a road for balancing lists growing
> too large (i.e. rather than having one list per CPU, you could
> then have a variable number of lists, depending on current needs,
> and [kind of] freely select one of them).

Since this feature is still experimental, can we defer the "length of
the list" issue to the next stage, when we turn it on by default?

Thanks,
Feng
