Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling
> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> Sent: Monday, September 21, 2015 5:19 PM
> To: Wu, Feng; Dario Faggioli
> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap;
> Andrew Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt
> core logic handling
>
> On 09/21/2015 06:08 AM, Wu, Feng wrote:
> >
> >> -----Original Message-----
> >> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> >> Sent: Thursday, September 17, 2015 5:38 PM
> >> To: Dario Faggioli; Wu, Feng
> >> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap;
> >> Andrew Cooper; Jan Beulich
> >> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt
> >> core logic handling
> >>
> >> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> >>> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> >>>
> >>>>> So, I guess, first of all, can you confirm whether or not it's
> >>>>> exploding in debug builds?
> >>>>
> >>>> Does the following information in Config.mk mean it is a debug build?
> >>>>
> >>>> # A debug build of Xen and tools?
> >>>> debug ?= y
> >>>> debug_symbols ?= $(debug)
> >>>>
> >>> I think so. But as I said in my other email, I was wrong, and this is
> >>> probably not an issue.
> >>>
> >>>>> And in either case (just tossing out ideas) would it be possible to
> >>>>> deal with the "interrupt already raised when blocking" case:
> >>>>
> >>>> Thanks for the suggestions below!
> >>>>
> >>> :-)
> >>>
> >>>>>  - later in the context switching function ?
> >>>>
> >>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi()
> >>>> instead of calling vcpu_unblock() directly; then, when it returns to
> >>>> context_switch(), we can check the flag and not really block the vCPU.
> >>>>
> >>> Yeah, and that would still be rather hard to understand and maintain,
> >>> IMO.
> >>>
> >>>> But I don't have a clear picture of how to achieve this; here are
> >>>> some questions from me:
> >>>> - When we are in context_switch(), we have already made the following
> >>>>   changes to the vCPU's state:
> >>>>   * sd->curr is set to next
> >>>>   * the vCPU's runstate (for both prev and next) is changed by
> >>>>     vcpu_runstate_change()
> >>>>   * next->is_running is set to 1
> >>>>   * the periodic timer for prev is stopped
> >>>>   * the periodic timer for next is set up
> >>>>   ......
> >>>>
> >>>> So at what point should we perform the action to _unblock_ the vCPU?
> >>>> We would need to roll back the changes above to the vCPU's state,
> >>>> right?
> >>>>
> >>> Mmm... not really. Not blocking prev does not mean that prev would be
> >>> kept running on the pCPU, and that's true for your current solution as
> >>> well! As you say yourself, you're already in the process of switching
> >>> between prev and next, at a point where it's already decided that next
> >>> will be the vCPU that runs. Not blocking means that prev is reinserted
> >>> into the runqueue, and a new invocation of the scheduler is
> >>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> >>> __runq_tickle()), but it's only when such a new scheduling pass happens
> >>> that prev will (potentially) be selected to run again.
> >>>
> >>> So, no, unless I'm completely missing your point, there would be no
> >>> rollback required. However, I'd still like the other solution (doing
> >>> the work in vcpu_block()) better (see below).
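For concreteness, the flag-based idea Feng describes above (the variant
Dario argues against) might look roughly like the sketch below. This is
illustrative only: the pi_blocking_cancelled field is made up, and
pi_test_on() is assumed to be a helper reading the descriptor's ON bit.

    /* Sketch: in vmx_pre_ctx_switch_pi(), instead of vcpu_unblock(v),
     * just note that the block was overtaken by an interrupt. */
    static void pre_ctx_switch_pi_sketch(struct vcpu *v)
    {
        if ( pi_test_on(&v->arch.hvm_vmx.pi_desc) ) /* already posted */
            v->arch.hvm_vmx.pi_blocking_cancelled = 1; /* hypothetical */
    }

    /* Sketch: back in context_switch(), honour the flag so prev is not
     * really blocked; prev simply goes back on the runqueue, and, as
     * Dario notes, no rollback of the switch to next is needed. */
    static void check_cancelled_block_sketch(struct vcpu *prev)
    {
        if ( prev->arch.hvm_vmx.pi_blocking_cancelled )
        {
            prev->arch.hvm_vmx.pi_blocking_cancelled = 0;
            clear_bit(_VPF_blocked, &prev->pause_flags);
        }
    }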
> >>>
> >>>>>  - with another hook, perhaps in vcpu_block() ?
> >>>>
> >>>> We could check this in vcpu_block(); however, the logic here is that
> >>>> before the vCPU is blocked, we need to change the posted-interrupt
> >>>> descriptor, and while changing it, if the 'ON' bit is set -- which
> >>>> means VT-d hardware has issued a notification event because an
> >>>> interrupt from the assigned devices has arrived -- we don't need to
> >>>> block the vCPU, and hence there is no need to update the PI
> >>>> descriptor in that case.
> >>>>
> >>> Yep, I saw that. But could it be possible to do *everything* related
> >>> to blocking, including the update of the descriptor, in vcpu_block(),
> >>> if no interrupt has been raised yet at that time? I mean, if you
> >>> updated the descriptor in there, would you still get the event that
> >>> allows you to call vcpu_wake(), and hence vmx_vcpu_wake_prepare(),
> >>> which would undo the blocking, no matter whether that has resulted in
> >>> an actual context switch yet or not?
> >>>
> >>> I appreciate that this narrows the window for such an event to happen
> >>> by quite a bit, making the logic itself a little less useful (it makes
> >>> things more similar to "regular" blocking vs. event delivery, though,
> >>> AFAICT), but if it's correct, and if it allows us to avoid the ugly
> >>> invocation of vcpu_unblock() from context switch context, I'd give it
> >>> a try.
> >>>
> >>> After all, this PI thing requires actions to be taken when a vCPU is
> >>> scheduled or descheduled because of blocking, unblocking and
> >>> preemptions, and it would seem natural to me to:
> >>>  - deal with blockings in vcpu_block()
> >>>  - deal with unblockings in vcpu_wake()
> >>>  - deal with preemptions in context_switch()
> >>>
> >>> This does not mean that being able to consolidate some of the cases
> >>> (like blockings and preemptions, in the current version of the code)
> >>> wouldn't be a nice thing... but we don't want it at all costs. :-)
> >>
> >> So just to clarify the situation...
> >>
> >> If a vcpu is configured for the "running" state (i.e., NV set to
> >> posted_intr_vector, notifications enabled), and an interrupt happens
> >> while in the hypervisor -- what happens?
> >>
> >> Is it the case that the interrupt is not actually delivered to the
> >> processor, but that the pending bit will be set in the PI descriptor,
> >> so that the interrupt will be delivered the next time the hypervisor
> >> returns into the guest?
> >>
> >> (I am assuming that is the case, because if the hypervisor *does* get
> >> an interrupt, then it can just unblock the vcpu there.)
> >>
> >> This sort of race condition -- where we get an interrupt to wake up a
> >> vcpu as we're blocking -- is already handled for "old-style" interrupts
> >> in vcpu_block():
> >>
> >> void vcpu_block(void)
> >> {
> >>     struct vcpu *v = current;
> >>
> >>     set_bit(_VPF_blocked, &v->pause_flags);
> >>
> >>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >>     if ( local_events_need_delivery() )
> >>     {
> >>         clear_bit(_VPF_blocked, &v->pause_flags);
> >>     }
> >>     else
> >>     {
> >>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
> >>         raise_softirq(SCHEDULE_SOFTIRQ);
> >>     }
> >> }
> >>
> >> That is, we set _VPF_blocked, so that any interrupt which would wake
> >> the vcpu up actually wakes it up, and then we check
> >> local_events_need_delivery() to see if any events came in after we
> >> decided to block but before we made sure that an interrupt would wake
> >> us up.
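The descriptor update Feng describes above is essentially a
compare-and-swap loop that gives up as soon as it sees ON set. A
minimal, self-contained sketch in plain C -- the bit layout, field
positions and function name below are simplified assumptions, not the
actual VT-d definitions or the actual patch:

    #include <stdbool.h>
    #include <stdint.h>

    #define PI_ON        (1ull << 0)   /* notification outstanding      */
    #define PI_NV_SHIFT  16            /* assumed position of NV field  */
    #define PI_NV_MASK   (0xffull << PI_NV_SHIFT)

    /*
     * Try to retarget the descriptor's notification vector before
     * blocking. Returns false if the hardware set ON in the meantime,
     * i.e. an interrupt is already pending and the vCPU must not block.
     */
    static bool pi_prepare_block(uint64_t *ctrl, uint8_t pi_wakeup_vector)
    {
        uint64_t old = __atomic_load_n(ctrl, __ATOMIC_ACQUIRE), new_val;

        do {
            if ( old & PI_ON )
                return false;            /* don't block: wake instead   */
            new_val = (old & ~PI_NV_MASK) |
                      ((uint64_t)pi_wakeup_vector << PI_NV_SHIFT);
        } while ( !__atomic_compare_exchange_n(ctrl, &old, new_val, false,
                                               __ATOMIC_ACQ_REL,
                                               __ATOMIC_ACQUIRE) );
        return true;
    }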
> >> I think it would be best if we could keep all the logic that does the
> >> same thing in the same place. That would mean, in vcpu_block(), after
> >> calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
> >> then extending local_events_need_delivery() to also look for pending
> >> PI events.
> >>
> >> Looking a bit more at your states, I think the actions that need to be
> >> taken on all the transitions are these (collapsing 'runnable' and
> >> 'offline' into the same state):
> >>
> >> blocked -> runnable (vcpu_wake)
> >>  - NV = posted_intr_vector
> >>  - Take vcpu off blocked list
> >>  - SN = 1
> >> runnable -> running (context_switch)
> >>  - SN = 0
> >
> > We also need to set the 'NDST' field to the pCPU the vCPU is going to
> > run on.
> >
> >> running -> runnable (context_switch)
> >>  - SN = 1
> >> running -> blocked (vcpu_block)
> >>  - NV = pi_wakeup_vector
> >>  - Add vcpu to blocked list
> >
> > We also need to set the 'NDST' field to the pCPU which owns the
> > blocking list, so we can wake up the vCPU from the right blocking list
> > in the wakeup event handler.
> >
> >>
> >> This actually has a few pretty nice properties:
> >> 1. You have a nice pair of complementary actions -- block / wake,
> >>    run / preempt
> >> 2. The potentially long actions with lists happen in vcpu_wake and
> >>    vcpu_block, not on the context switch path
> >>
> >> And at this point, we don't have the "lazy context switch" issue
> >> anymore, do we? Because we've handled the "blocking" case in
> >> vcpu_block(), we don't need to do anything in the main
> >> context_switch() (which may do the lazy context switch into idle). We
> >> only need to do something in the actual __context_switch().
> >
> > I think the handling for the lazy context switch is not only for the
> > blocking case; we still need to do something for the lazy context
> > switch even if we handle the blocking case in vcpu_block(). For
> > example:
> > 1. For non-idle -> idle:
> >  - set 'SN'
>
> If we set SN in vcpu_block(), then we don't need to set it on context
> switch -- 是不是? ("right?")

For the preemption case (not the blocking case) we still need to
clear/set SN, and that has nothing to do with vcpu_block()/vcpu_wake(),
right? Do I miss something here? BTW, your Chinese is good! :)

> > 2. For idle -> non-idle:
> >  - clear 'SN'
> >  - set the 'NDST' field to the right pCPU the vCPU is going to run on.
> >    (Maybe this one doesn't belong to the lazy context switch: if the
> >    pCPU of the non-idle vCPU was changed, then per_cpu(curr_vcpu, cpu)
> >    != next in context_switch(), hence it will go to __context_switch()
> >    directly, right?)
>
> If we clear SN in vcpu_wake(), then we don't need to clear it on a
> context switch.

The same as above.

> And the only way we can transition from "idle lazy context switch" to
> "runnable" is if the vcpu was the last vcpu running on this pcpu -- in
> which case, NDST should already be set to this pcpu, right?

Yes, like I mentioned above, in this case the NDST field is not changed,
so we don't need to update it for the lazy "idle vcpu to runnable"
transition.

Thanks,
Feng

> -George
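Pulling George's transition table and Feng's NDST additions together,
the four transitions map onto four small hooks. A self-contained sketch
with a simplified descriptor layout and made-up names (pi_wake() etc.),
not the actual series; the real updates would of course need the atomic
cmpxchg discipline from the earlier sketch:

    #include <stdint.h>

    /* Simplified PI descriptor (illustrative layout, not VT-d's). */
    #define PI_ON  (1u << 0)   /* notification outstanding              */
    #define PI_SN  (1u << 1)   /* suppress notifications                */

    struct pi_desc {
        uint32_t ctrl;         /* ON, SN, ...                           */
        uint8_t  nv;           /* notification vector                   */
        uint32_t ndst;         /* notification destination (APIC ID)    */
    };

    /* blocked -> runnable (vcpu_wake): notifications go back to the
     * running vector, leave the blocked list, suppress until we run. */
    static void pi_wake(struct pi_desc *pi, uint8_t posted_intr_vector)
    {
        pi->nv = posted_intr_vector;
        /* take the vCPU off the per-pCPU blocked list here */
        pi->ctrl |= PI_SN;
    }

    /* runnable -> running (context_switch): unsuppress, and (per Feng)
     * retarget NDST at the pCPU the vCPU will actually run on. */
    static void pi_run(struct pi_desc *pi, uint32_t this_pcpu_apic_id)
    {
        pi->ndst = this_pcpu_apic_id;
        pi->ctrl &= ~PI_SN;
    }

    /* running -> runnable (context_switch): suppress notifications. */
    static void pi_preempt(struct pi_desc *pi)
    {
        pi->ctrl |= PI_SN;
    }

    /* running -> blocked (vcpu_block): notifications must now arrive as
     * wakeup events on the pCPU owning the blocking list (per Feng). */
    static void pi_block(struct pi_desc *pi, uint8_t pi_wakeup_vector,
                         uint32_t list_owner_apic_id)
    {
        pi->ndst = list_owner_apic_id;
        pi->nv = pi_wakeup_vector;
        /* add the vCPU to that pCPU's blocked list here */
    }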
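And the piece that closes the race in vcpu_block(): the pending-events
check would additionally consult the descriptor's ON bit. Again only a
sketch, reusing the definitions above; local_events_need_delivery() is
the real check quoted earlier, while pi_pending() is a stand-in:

    /* Stand-in for reading the ON bit of the vCPU's PI descriptor. */
    static int pi_pending(const struct pi_desc *pi)
    {
        return pi->ctrl & PI_ON;
    }

    /* vcpu_block() would then do roughly:
     *
     *   set_bit(_VPF_blocked, &v->pause_flags);
     *   pi_block(...);                 // NV = pi_wakeup_vector, NDST set
     *   if ( local_events_need_delivery() || pi_pending(&v->pi_desc) )
     *       clear_bit(_VPF_blocked, &v->pause_flags);  // don't block
     */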