[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling




> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> Sent: Thursday, September 17, 2015 5:38 PM
> To: Dario Faggioli; Wu, Feng
> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> > On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
> >
> >>> -----Original Message-----
> >>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> >
> >>> So, I guess, first of all, can you confirm whether or not it's exploding
> >>> in debug builds?
> >>
> >> Does the following information in Config.mk mean it is a debug build?
> >>
> >> # A debug build of Xen and tools?
> >> debug ?= y
> >> debug_symbols ?= $(debug)
> >>
> > I think so. But as I said in my other email, I was wrong, and this is
> > probably not an issue.
> >
> >>> And in either case (just tossing out ideas) would it be
> >>> possible to deal with the "interrupt already raised when blocking" case:
> >>
> >> Thanks for the suggestions below!
> >>
> > :-)
> >
> >>>  - later in the context switching function ?
> >> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() 
> >> instead
> >> of calling vcpu_unblock() directly, then when it returns to 
> >> context_switch(),
> >> we can check the flag and don't really block the vCPU.
> >>
> > Yeah, and that would still be rather hard to understand and maintain,
> > IMO.
> >
> >> But I don't have a clear
> >> picture about how to archive this, here are some questions from me:
> >> - When we are in context_switch(), we have done the following changes to
> >> vcpu's state:
> >>    * sd->curr is set to next
> >>    * vCPU's running state (both prev and next ) is changed by
> >>      vcpu_runstate_change()
> >>    * next->is_running is set to 1
> >>    * periodic timer for prev is stopped
> >>    * periodic timer for next is setup
> >>    ......
> >>
> >> So what point should we perform the action to _unblock_ the vCPU? We
> >> Need to roll back the formal changes to the vCPU's state, right?
> >>
> > Mmm... not really. Not blocking prev does not mean that prev would be
> > kept running on the pCPU, and that's true for your current solution as
> > well! As you say yourself, you're already in the process of switching
> > between prev and next, at a point where it's already a thing that next
> > will be the vCPU that will run. Not blocking means that prev is
> > reinserted to the runqueue, and a new invocation to the scheduler is
> > (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> > __runq_tickle()), but it's only when such new scheduling happens that
> > prev will (potentially) be selected to run again.
> >
> > So, no, unless I'm fully missing your point, there wouldn't be no
> > rollback required. However, I still would like the other solution (doing
> > stuff in vcpu_block()) better (see below).
> >
> >>>  - with another hook, perhaps in vcpu_block() ?
> >>
> >> We could check this in vcpu_block(), however, the logic here is that before
> >> vCPU is blocked, we need to change the posted-interrupt descriptor,
> >> and during changing it, if 'ON' bit is set, which means VT-d hardware
> >> issues a notification event because interrupts from the assigned devices
> >> is coming, we don't need to block the vCPU and hence no need to update
> >> the PI descriptor in this case.
> >>
> > Yep, I saw that. But could it be possible to do *everything* related to
> > blocking, including the update of the descriptor, in vcpu_block(), if no
> > interrupt have been raised yet at that time? I mean, would you, if
> > updating the descriptor in there, still get the event that allows you to
> > call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
> > the blocking, no matter whether that resulted in an actual context
> > switch already or not?
> >
> > I appreciate that this narrows the window for such an event to happen by
> > quite a bit, making the logic itself a little less useful (it makes
> > things more similar to "regular" blocking vs. event delivery, though,
> > AFAICT), but if it's correct, ad if it allows us to save the ugly
> > invocation of vcpu_unblock from context switch context, I'd give it a
> > try.
> >
> > After all, this PI thing requires actions to be taken when a vCPU is
> > scheduled or descheduled because of blocking, unblocking and
> > preemptions, and it would seem natural to me to:
> >  - deal with blockings in vcpu_block()
> >  - deal with unblockings in vcpu_wake()
> >  - deal with preemptions in context_switch()
> >
> > This does not mean being able to consolidate some of the cases (like
> > blockings and preemptions, in the current version of the code) were not
> > a nice thing... But we don't want it at all costs . :-)
> 
> So just to clarify the situation...
> 
> If a vcpu configured for the "running" state (i.e., NV set to
> "posted_intr_vector", notifications enabled), and an interrupt happens
> in the hypervisor -- what happens?
> 
> Is it the case that the interrupt is not actually delivered to the
> processor, but that the pending bit will be set in the pi field, so that
> the interrupt will be delivered the next time the hypervisor returns
> into the guest?
> 
> (I am assuming that is the case, because if the hypervisor *does* get an
> interrupt, then it can just unblock it there.)
> 
> This sort of race condition -- where we get an interrupt to wake up a
> vcpu as we're blocking -- is already handled for "old-style" interrupts
> in vcpu_block:
> 
> void vcpu_block(void)
> {
>     struct vcpu *v = current;
> 
>     set_bit(_VPF_blocked, &v->pause_flags);
> 
>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>     if ( local_events_need_delivery() )
>     {
>         clear_bit(_VPF_blocked, &v->pause_flags);
>     }
>     else
>     {
>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>         raise_softirq(SCHEDULE_SOFTIRQ);
>     }
> }
> 
> That is, we set _VPF_blocked, so that any interrupt which would wake it
> up actually wakes it up, and then we check local_events_need_delivery()
> to see if there were any that came in after we decided to block but
> before we made sure that an interrupt would wake us up.
> 
> I think it would be best if we could keep all the logic that does the
> same thing in the same place.  Which would mean in vcpu_block(), after
> calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
> then extending local_events_need_delivery() to also look for pending PI
> events.
> 
> Looking a bit more at your states, I think the actions that need to be
> taken on all the transitions are these (collapsing 'runnable' and
> 'offline' into the same state):
> 
> blocked -> runnable (vcpu_wake)
>  - NV = posted_intr_vector
>  - Take vcpu off blocked list
>  - SN = 1
> runnable -> running (context_switch)
>  - SN = 0

Need set the 'NDST' field to the right dest vCPU as well.

> running -> runnable (context_switch)
>  - SN = 1
> running -> blocked (vcpu_block)
>  - NV = pi_wakeup_vector
>  - Add vcpu to blocked list

Need set the 'NDST' field to the pCPU which owns the blocking list,
So we can wake up the vCPU from the right blocking list in the wakeup
event handler.

> 
> This actually has a few pretty nice properties:
> 1. You have a nice pair of complementary actions -- block / wake, run /
> preempt
> 2. The potentially long actions with lists happen in vcpu_wake and
> vcpu_block, not on the context switch path
> 
> And at this point, you don't have the "lazy context switch" issue
> anymore, do we?  Because we've handled the "blocking" case in
> vcpu_block(), we don't need to do anything in the main context_switch()
> (which may do the lazy context switching into idle).  We only need to do
> something in the actual __context_switch().

I think the handling for lazy context switch is not only for the blocking case,
we still need to do something for lazy context switch even we handled the
blocking case in vcpu_block(), such as,
1. For non-idle -> idle
- set 'SN'

2. For idle -> non-idle
- clear 'SN'
- set 'NDST' filed to the right cpu the vCPU is going to running on. (Maybe
this one doesn't belong to lazy context switch, if the cpu of the non-idle
vCPU was changed, then per_cpu(curr_vcpu, cpu) != next in context_switch(),
hence it will go to __context_switch() directly, right?)

Thanks,
Feng

> 
> And at that point, could we actually get rid of the PI-specific context
> switch hooks altogether, and just put the SN state changes required for
> running->runnable and runnable->running in vmx_ctxt_switch_from() and
> vmx_ctxt_switch_to()?
> 
> If so, then the only hooks we need to add are vcpu_block and vcpu_wake.
>  To keep these consistent with other scheduling-related functions, I
> would put these in arch_vcpu, next to ctxt_switch_from() and
> ctxt_switch_to().
> 
> Thoughts?
> 
>  -George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.