Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
>>> On 10.09.15 at 10:59, <feng.wu@xxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Thursday, September 10, 2015 4:28 PM
>> >>> On 10.09.15 at 04:07, <feng.wu@xxxxxxxxx> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> Sent: Wednesday, September 09, 2015 6:27 PM
>> >> >>> On 09.09.15 at 10:56, <feng.wu@xxxxxxxxx> wrote:
>> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> >> Sent: Monday, September 07, 2015 8:55 PM
>> >> >> >>> On 25.08.15 at 03:57, <feng.wu@xxxxxxxxx> wrote:
>> >> >> > +
>> >> >> > +    /*
>> >> >> > +     * Delete the vCPU from the related block list
>> >> >> > +     * if we are resuming from blocked state.
>> >> >> > +     */
>> >> >> > +    if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
>> >> >> > +    {
>> >> >> > +        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >> >> > +                          v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> >> > +        list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >> >>
>> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
>> >> >> the vCPU from the list? Which then ought to allow using just
>> >> >> list_del() here?
>> >> >
>> >> > Here is the story about using list_del_init() instead of list_del()
>> >> > (the same reason applies to using list_del_init() in
>> >> > pi_wakeup_interrupt()): If we used list_del(), we would need to set
>> >> > .pi_block_cpu to -1 after removing the vCPU from the block list.
>> >> > There are two places where the vCPU is removed from the list:
>> >> > - arch_vcpu_wake_prepare()
>> >> > - pi_wakeup_interrupt()
>> >> >
>> >> > That means we would also need to set .pi_block_cpu to -1 in
>> >> > pi_wakeup_interrupt(), which seems problematic to me: since
>> >> > .pi_block_cpu is used to pick the per-CPU spin lock, it is not safe
>> >> > to change it in pi_wakeup_interrupt() while someone else may be
>> >> > using it at the same time.
>> >> >
>> >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
>> >> > is only of use because .pi_block_cpu is set to -1 at
>> >> > initialization time.
>> >> >
>> >> > If you have any good suggestions about this, I will be all ears.
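[The property Feng relies on can be shown in isolation. Below is a minimal
standalone C sketch, modeled on the Linux/Xen list.h helpers rather than
taken from either tree: list_del_init() leaves the removed entry
self-linked, so a second removal attempt degenerates into a harmless
no-op, which a plain list_del() would not guarantee.]

    #include <stdio.h>

    struct list_head { struct list_head *next, *prev; };

    static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

    /* Insert e right after h. */
    static void list_add(struct list_head *e, struct list_head *h)
    {
        e->next = h->next;
        e->prev = h;
        h->next->prev = e;
        h->next = e;
    }

    /* Unlink the entry, then re-point it at itself. */
    static void list_del_init(struct list_head *e)
    {
        e->next->prev = e->prev;
        e->prev->next = e->next;
        INIT_LIST_HEAD(e);
    }

    int main(void)
    {
        struct list_head blocked_list, vcpu_entry;

        INIT_LIST_HEAD(&blocked_list);
        list_add(&vcpu_entry, &blocked_list);

        list_del_init(&vcpu_entry);  /* first removal, e.g. wakeup path */
        list_del_init(&vcpu_entry);  /* second removal: harmless no-op  */

        printf("list still consistent: %d\n",
               blocked_list.next == &blocked_list &&
               blocked_list.prev == &blocked_list);
        return 0;
    }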
>> >>
>> >> Latching pi_block_cpu into a local variable would take care of that
>> >> part of the problem. Of course you then may also need to check
>> >> that while waiting to acquire the lock pi_block_cpu didn't change.
>> >
>> > Thanks for the suggestion! But I think it is not easy to "check
>> > that while waiting to acquire the lock pi_block_cpu didn't change".
>> > Let's take the following pseudo code as an example:
>> >
>> >     int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> >
>> >     ......
>> >
>> >     spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >                       local_pi_block_cpu), flags);
>> >     list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >     spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>> >                            local_pi_block_cpu), flags);
>> >
>> > If .pi_block_cpu is changed to -1 by others while list_del() above
>> > is running, that means the vCPU has been removed by someone else, and
>> > calling list_del() here again would be problematic. I think there
>> > might be other corner cases as well. So I suggest adding some
>> > comments for list_del_init() as you mentioned below.
>>
>> That's why I said "check that while waiting to acquire the lock
>> pi_block_cpu didn't change." The code you present does not do
>> this.
>
> I didn't say we need to implement it as the above code; I just listed
> it here to show it is hard to do that.
> First, how do we check, while waiting to acquire the lock, that
> .pi_block_cpu didn't change?

Note the difference between "check while waiting" and "check that
while waiting": The former is indeed hard to implement, while the
latter is pretty straightforward (and we do so elsewhere).

> Secondly, even if we can check it, what should we do if .pi_block_cpu
> is changed after acquiring the lock, as I mentioned above?

Drop the lock and start over. I.e. (taking your pseudo code):

restart:
    local_pi_block_cpu = ...;
    bail-if-invalid (e.g. -1 in current model)
    spin_lock_irqsave(&per_cpu(, local_pi_block_cpu), flags);
    if ( local_pi_block_cpu != actual_pi_block_cpu )
    {
        spin_unlock_irqrestore(&per_cpu(, local_pi_block_cpu), flags);
        goto restart;
    }
    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
    spin_unlock_irqrestore(&per_cpu(, local_pi_block_cpu), flags);
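[Fleshed out into compilable form, the latch/lock/recheck/retry pattern
Jan sketches above might look as follows. This is only an illustration:
the function name is invented here, while the field and lock names are
taken from the patch under review.]

    /* Hypothetical helper illustrating Jan's suggestion. */
    static void pi_blocked_list_remove(struct vcpu *v)
    {
        unsigned long flags;
        int pi_block_cpu;

     restart:
        /* Latch the field into a local variable. */
        pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
        if ( pi_block_cpu == -1 )    /* not on any blocked list */
            return;

        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
                          flags);

        /* Did .pi_block_cpu change while we waited for the lock? */
        if ( pi_block_cpu != v->arch.hvm_vmx.pi_block_cpu )
        {
            spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
                                            pi_block_cpu), flags);
            goto restart;
        }

        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
        /* Safe to reset now: we hold the lock the field refers to. */
        v->arch.hvm_vmx.pi_block_cpu = -1;

        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
                                        pi_block_cpu), flags);
    }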
>> > Anyway, I went through the main path of the code as below, and I
>> > really don't find anything unsafe here:
>> >
>> > context_switch() -->
>> >     pi_ctxt_switch_from() -->
>> >         vmx_pre_ctx_switch_pi() -->
>> >             vcpu_unblock() -->
>> >                 vcpu_wake() -->
>> >                     SCHED_OP(VCPU2OP(v), wake, v) -->
>> >                         csched_vcpu_wake() -->
>> >                             __runq_insert()
>> >                             __runq_tickle()
>> >
>> > If you continue to think it is unsafe, pointing out the place would
>> > be welcome!
>>
>> Once again - I didn't say it's unsafe (and hence can't point at a
>> particular place). What bothers me is that vcpu_unblock() affects
>> scheduler state, and altering scheduler state from inside the
>> context switch path looks problematic at first glance. I'd be
>> happy to be told there is no such problem, but be aware that
>> this would have to include latent ones: Even if things are safe
>> right now, such a feedback path having the potential of becoming
>> an actual problem with a later scheduler change is already an
>> issue, as finding the problem is then likely going to be rather hard
>> (and I suspect it's not going to be you who has to debug it).
>
> What I can say is that after investigation I haven't found anything bad
> about this vcpu_unblock(), and I tested it in my experiments. That is
> why I'd like to ask the scheduler experts for their ideas.

Agreed - I'm also awaiting their input.

>> >> >> > +static void vmx_post_ctx_switch_pi(struct vcpu *v)
>> >> >> > +{
>> >> >> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> >> >> > +
>> >> >> > +    if ( !has_arch_pdevs(v->domain) )
>> >> >> > +        return;
>> >> >> > +
>> >> >> > +    if ( x2apic_enabled )
>> >> >> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
>> >> >> > +    else
>> >> >> > +        write_atomic(&pi_desc->ndst,
>> >> >> > +                     MASK_INSR(cpu_physical_id(v->processor),
>> >> >> > +                     PI_xAPIC_NDST_MASK));
>> >> >> > +
>> >> >> > +    pi_clear_sn(pi_desc);
>> >> >> > +}
>> >> >>
>> >> >> So you alter where notifications go, but not via which vector? How
>> >> >> is the vCPU going to get removed from the blocked list then?
>> >> >
>> >> > vmx_post_ctx_switch_pi() is called when the vCPU is about to run;
>> >> > at that point the vector has already been set properly, and the
>> >> > vCPU has been removed from the blocked list if it was blocked
>> >> > before.
>> >>
>> >> So why do you do two updates then (first [elsewhere] to set the
>> >> vector and then here to set the destination)?
>> >
>> > When the vCPU is unblocked and then removed from the blocking list,
>> > we need to change the vector back to the normal notification vector,
>> > since the wakeup vector is only used while the vCPU is blocked and
>> > hence on the blocked list. And here is the place where we learn which
>> > pCPU the vCPU will be scheduled on, so we change the destination
>> > field here.
>>
>> Right, that's what I understood. But isn't the state things are in
>> between these two updates inconsistent? I.e. - where does the
>> notification go if one arrives in this window?
>
> Before we arrive here, SN is set, so no notification event needs to be
> sent, and hence no notification arrives at all.

Ah, okay.

Jan
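[For reference, the window discussed at the end of the thread can be
summarized as the sequence below. This is a reconstruction from the
discussion, not code from the patch: pi_set_sn() as the counterpart of
the quoted pi_clear_sn(), the nv field, and posted_intr_vector are
assumptions about the rest of the series.]

    /* (1) Context switch out: suppress notifications (set SN) while
     *     the posted-interrupt descriptor fields are in flux.
     *     Assumed helper, counterpart of the quoted pi_clear_sn(). */
    pi_set_sn(pi_desc);

    /* (2) Unblock path: the vCPU leaves the blocked list and the
     *     notification vector is switched back from the wakeup vector
     *     to the normal one (field name 'nv' assumed). */
    write_atomic(&pi_desc->nv, posted_intr_vector);

    /* (3) vmx_post_ctx_switch_pi(): the target pCPU is now known, so
     *     point NDST at it, then clear SN. Because SN stays set from
     *     (1) until here, no notification can be generated in the
     *     window between the vector and destination updates, which is
     *     the point of Feng's final answer above. */
    write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
    pi_clear_sn(pi_desc);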