Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Thursday, September 10, 2015 4:28 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
>     xen-devel@xxxxxxxxxxxxx; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d
>     posted interrupts
>
> >>> On 10.09.15 at 04:07, <feng.wu@xxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Wednesday, September 09, 2015 6:27 PM
> >> >>> On 09.09.15 at 10:56, <feng.wu@xxxxxxxxx> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> Sent: Monday, September 07, 2015 8:55 PM
> >> >> >>> On 25.08.15 at 03:57, <feng.wu@xxxxxxxxx> wrote:
> >> >> > +
> >> >> > +    /*
> >> >> > +     * Delete the vCPU from the related block list
> >> >> > +     * if we are resuming from blocked state.
> >> >> > +     */
> >> >> > +    if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> >> >> > +    {
> >> >> > +        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> >> > +                          v->arch.hvm_vmx.pi_block_cpu), flags);
> >> >> > +        list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> >>
> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
> >> >> the vCPU from the list? Which then ought to allow using just
> >> >> list_del() here?
> >> >
> >> > Here is the story about using list_del_init() instead of list_del()
> >> > (the same reason applies to list_del_init() in pi_wakeup_interrupt()).
> >> > If we use list_del(), that means we need to set .pi_block_cpu to
> >> > -1 after removing the vCPU from the block list, and there are two
> >> > places where the vCPU is removed from the list:
> >> > - arch_vcpu_wake_prepare()
> >> > - pi_wakeup_interrupt()
> >> >
> >> > That means we also need to set .pi_block_cpu to -1 in
> >> > pi_wakeup_interrupt(), which seems problematic to me: since
> >> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> >> > to change it in pi_wakeup_interrupt(); maybe someone is using
> >> > it at the same time.
> >> >
> >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> >> > is only used when .pi_block_cpu is set to -1 at the initial time.
> >> >
> >> > If you have any good suggestions about this, I will be all ears.
> >>
> >> Latching pi_block_cpu into a local variable would take care of that
> >> part of the problem. Of course you then may also need to check
> >> that while waiting to acquire the lock pi_block_cpu didn't change.
>
> > Thanks for the suggestion! But I think it is not easy to "check
> > that while waiting to acquire the lock pi_block_cpu didn't change".
> > Let's take the following pseudo code as an example:
> >
> >     int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >
> >     ......
> >
> >     spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >                       local_pi_block_cpu), flags);
> >     list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >     spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >                            local_pi_block_cpu), flags);
> >
> > If .pi_block_cpu is changed to -1 by others while list_del() above is
> > running, that means the vCPU was removed by someone else, so calling
> > list_del() here again would be problematic. I think there might be
> > other corner cases for this. So I suggest adding some comments
> > for list_del_init() as you mentioned below.
>
> That's why I said "check that while waiting to acquire the lock
> pi_block_cpu didn't change." The code you present does not do
> this.
I didn't mean we need to implement it as the above code; I listed it only
to show how hard this is. First, how do we check, while waiting to acquire
the lock, that .pi_block_cpu didn't change? Secondly, even if we can check
it, what should we do if .pi_block_cpu has changed by the time we acquire
the lock, as I mentioned above?

> >> >> > +    do {
> >> >> > +        old.control = new.control = pi_desc->control;
> >> >> > +
> >> >> > +        /* Should not block the vCPU if an interrupt was posted for it. */
> >> >> > +        if ( pi_test_on(&old) )
> >> >> > +        {
> >> >> > +            spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
> >> >> > +            vcpu_unblock(v);
> >> >>
> >> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
> >> >> is this safe?
> >> >
> >> > I cannot see anything unsafe so far, can some scheduler maintainer
> >> > help to confirm it? Dario? George?
> >>
> >> But first and foremost you ought to answer the "why".
>
> > I think if you believe this solution is unsafe, you need to point out
> > where the problem is, not just ask "is this safe?"; I don't think that
> > is an effective comment.
>
> It is my general understanding that people wanting code to be
> included have to prove their code is okay, not reviewers to prove
> the code is broken.
>
> > Anyway, I went through the main path of the code as below, and I
> > really don't find anything unsafe here:
> >
> >     context_switch() -->
> >         pi_ctxt_switch_from() -->
> >             vmx_pre_ctx_switch_pi() -->
> >                 vcpu_unblock() -->
> >                     vcpu_wake() -->
> >                         SCHED_OP(VCPU2OP(v), wake, v) -->
> >                             csched_vcpu_wake() -->
> >                                 __runq_insert()
> >                                 __runq_tickle()
> >
> > If you continue to think it is unsafe, pointing out the place will be
> > welcome!
>
> Once again - I didn't say it's unsafe (and hence can't point at a
> particular place). What bothers me is that vcpu_unblock() affects
> scheduler state, and altering scheduler state from inside the
> context switch path looks problematic at the first glance.
> I'd be happy if I was told there is no such problem, but be aware that
> this would have to include latent ones: Even if right now things
> are safe, such feedback into the scheduler has the potential of
> becoming an actual problem with a later scheduler change, which is
> already an issue, as finding the problem is then likely going to be
> rather hard (and I suspect it's not going to be you to debug this).

What I can say is that after investigation I don't find anything bad
about this vcpu_unblock(), and I tested it in my experiments. That is
why I'd like to ask for ideas from the scheduler experts.

> >> >> > +static void vmx_post_ctx_switch_pi(struct vcpu *v)
> >> >> > +{
> >> >> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> >> >> > +
> >> >> > +    if ( !has_arch_pdevs(v->domain) )
> >> >> > +        return;
> >> >> > +
> >> >> > +    if ( x2apic_enabled )
> >> >> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> >> >> > +    else
> >> >> > +        write_atomic(&pi_desc->ndst,
> >> >> > +                     MASK_INSR(cpu_physical_id(v->processor),
> >> >> > +                               PI_xAPIC_NDST_MASK));
> >> >> > +
> >> >> > +    pi_clear_sn(pi_desc);
> >> >> > +}
> >> >>
> >> >> So you alter where notifications go, but not via which vector? How
> >> >> is the vCPU going to get removed from the blocked list then?
> >> >
> >> > vmx_post_ctx_switch_pi() is called when the vCPU is about to run;
> >> > in that case, the vector has been set properly and it has been removed
> >> > from the block list if it was blocked before.
> >>
> >> So why do you do two updates then (first [elsewhere] to set the vector
> >> and then here to set the destination)?
>
> > When the vCPU is unblocked and then removed from the blocking list, we
> > need to change the vector to the normal notification vector, since the
> > wakeup vector is only used when the vCPU is blocked and hence on the
> > blocked list. And here is the place where we can decide which pCPU the
> > vCPU will be scheduled on, so we change the destination field here.
> Right, that's what I understood. But isn't the state things are in
> between these two updates inconsistent? I.e. - where does the
> notification go if one arrives in this window?

Before arriving here, SN is set, so no notification event is sent and
hence there is no notification at all in this window.

> Particularly - if it went to the right place, why would this second
> update be needed at all?

So we would need to read ndst from the PI descriptor, compare it with
cpu_physical_id(v->processor), and update it only if they differ? I
don't think this check is worthwhile, since it still needs at least a
read operation and a compare, and then possibly a write operation
anyway.

Thanks,
Feng

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel