Xen project Mailing List

Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling

To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Date: Tue, 27 Oct 2015 05:19:16 +0000

Accept-language: en-US

Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "Wu, Feng" <feng.wu@xxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Keir Fraser <keir@xxxxxxx>

Delivery-date: Tue, 27 Oct 2015 05:19:32 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHRD/wuSTi/T9/Q9kmwew5iGD9eBJ5+jvhg

Thread-topic: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling

> -----Original Message----- > From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx] > Sent: Monday, October 26, 2015 10:40 PM > To: Wu, Feng <feng.wu@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx > Cc: Tian, Kevin <kevin.tian@xxxxxxxxx>; Keir Fraser <keir@xxxxxxx>; George > Dunlap <george.dunlap@xxxxxxxxxxxxx>; Andrew Cooper > <andrew.cooper3@xxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx> > Subject: Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core > logic > handling > > Hey, > > Here I am reviewing this patch, sorry for the delay. Thanks a lot for the review! > > Ok, we have discussed a lot about all this, and in fact I had to go > back in my mail archive and re-read the rather long sub-thread for this > patch in v7. :-) Thanks a lot, and that's why I suggested to review this patch when the memory is fresh, since there was a long discussion about this patch in the series. > > Also, in that thread, I found (as I was recalling there being) a couple of > open > questions, one even pointing to the possibility of adopting a different > design for > this part of the code, which I am not sure could have been considered a closed > matter. > > In any case, it would have been nice, given the situation, if you'd have put > a few > words about, e.g., which solution you ended up implementing and why, either in > the cover or here (e.g., in the '---' area). Thanks for the suggestion. As I mentioned before, updating the PI descriptor needs to be atomic, I think it should be a little expensive. So doing it every VM-exit seems not a good idea to me. > > From the design point of view, I said during v7 that I don't dislike having > some of > the things that this feature requires dpme in (VMX specific part of) the > context > switch path, and that is still valid. > > What I really don't like much is this blocking cancellation hook you have > introduced. > > I mean... > > On Mon, 2015-10-12 at 16:55 +0800, Feng Wu wrote: > > - Add the following hooks, this part was suggested > > by George Dunlap <george.dunlap@xxxxxxxxxxxxx> and > > Dario Faggioli <dario.faggioli@xxxxxxxxxx>. > > * arch_vcpu_block() > > Called alled before vcpu is blocking and update the PID > > (posted-interrupt descriptor). > > > > * arch_vcpu_block_cancel() > > Called when interrupts come in during blocking. > > > ... This one. > > Reason is, hooks are not, IMO, among the nicest things. You have to > remember to call them, you have to put the call to them in the proper > place, etc., when writing the code. OTOH, when reading the code, they > break the flow and force one to go and figure out what happens in > potentially not so related areas. In summary, they're hard to get > right. :-/ > > That being said, I can live with this, but I wonder whether we really > can't do without. For instance, Jan said in the v7 thread: > > "Couldn't this be taken care of by, if necessary, adjusting PI state > in vmx_do_resume()?" > > This is actually what started the sub-sub-thread about the alternative > design of doing everything during VMENTERs/VMEXITs. If you are > unconvinced about going that path all the way, would at least do the > fixup in there (i.e., taking care of the case where we called > arch_vcpu_block() but then we did not block) work and make sense? > > Actually, I think even another possible implementation variant that was > suggested at some point (by George, in this case, for other reasons and > purposes) could make this adding this hook unnecessary, i.e.: > > "vcpu_block() > set(_VPF_blocked) > local_events_need_delivery() > hvm_vcpu_has_pending_irq() > ... > context_switch > v->arch.block() > - Add v to pcpu.pi_blocked_vcpu > - NV => pi_wakeup_vector > > If we do it [this] way, and an interrupt comes in before the context > switch is finished, it will call posted_intr_vector. We can, at that > point, check to see if the current vcpu is marked as blocked. If it > is, we can call vcpu_unblock() without having to modify NV or worry > about adding / removing the vcpu from the pi_blocked_vcpu list." This is something similar with patch v7 and before, doing vcpu block during context switch, and seems during the discussion, you guys prefer doing the vcpu blocking things outside context switch. > > At the time, I "voted against" this design, because it seemed we could > manage to handle interrupt ('regular' and posted) happening during > blocking in one and unified way, and with _only_ arch_vcpu_block(). If > that is no longer the case (and it's not, as we're adding more hooks, > and the need to call the second is a special case being introduced by > PI), it may be worth reconsidering things... > > So, all in all, I don't know. As said, I don't like this cancellation > hook because it's one more hook and because --while I see why it's > useful in this specific case-- I don't like having it in generic code > (in schedule.c), and even less having it called in two places > (vcpu_block() and do_pool()). However, if others (Jan and George, I > guess) are not equally concerned about it, I can live with it. > > Thoughts? If I understand it correctly, this block cancel method was suggested by George, please refer to the attached email. George, what is your opinion about it? It is better to discuss a clear solution before I continue to post another version. Thanks a lot! > > > * vmx_pi_switch_from() > > Called before context switch, we update the PID when the > > vCPU is preempted or going to sleep. > > > > * vmx_pi_switch_to() > > Called after context switch, we update the PID when the vCPU > > is going to run. > > > > * arch_vcpu_wake_prepare() > > It will be called when waking up the vCPU, we update > > the posted interrupt descriptor when the vCPU is > > unblocked. > > > The rest of the patch seems fine to me (at least the scheduling related > implications). > > Just a few (pretty minor) comments. > > > --- a/xen/arch/x86/domain.c > > +++ b/xen/arch/x86/domain.c > > @@ -1608,6 +1608,18 @@ void context_switch(struct vcpu *prev, struct > > vcpu *next) > > if ( (per_cpu(curr_vcpu, cpu) == next) || > > (is_idle_domain(nextd) && cpu_online(cpu)) ) > > { > > + /* > > + * When we handle the lazy context switch for the following > > + * two scenarios: > > + * - Preempted by a tasklet, which uses in an idle context > > + * - the prev vcpu is in offline and no new available vcpus > > in run queue > > + * We don't change the 'SN' bit in posted-interrupt > > descriptor, this > > + * may incur spurious PI notification events, but since PI > > notification > > + * event is only sent when 'ON' is clear, and once the PI > > notificatoin > > + * is sent, ON is set by hardware, so not so many spurious > > events and > > + * it is not a big deal. > > + */ > > + > > local_irq_enable(); > > } > > > This comment: can't it leave somewhere else, more VMX and/or PI > related? > > I know this is arch code already but, still, if I'm here, I am reading > and trying to understand how context switch works, potentially, not > being interested in PI at all... And yet I find this doc comment, > talking about some SN and ON bits, without even defining what they are > and what they mean. :-/ > > Really, I'm not saying we shouldn't have it. On the contrary, it has > some valuable content in it. Can we just find another place where to > put it? > > Also, about the content. The last part, when it talks about spurious > interrupts, it says they're not a problem because we won't get that > many. I think that someone not very familiar with this things could use > being also told that it is ok/safe to get them (i.e., they don't get > lost, etc.). There's an email from George that explain this quite well. > I'd also be ok with this particular thing going in the patch changelog, > rather than in a comment, as far as it is somewhere. Okay, I will put the comments in the patch changelog. > > > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > > index 3eefed7..383fd62 100644 > > --- a/xen/common/schedule.c > > +++ b/xen/common/schedule.c > > > @@ -800,10 +802,13 @@ void vcpu_block(void) > > > > set_bit(_VPF_blocked, &v->pause_flags); > > > > + arch_vcpu_block(v); > > + > > > This is maybe not so big of a deal but, since we call this pretty early > in the blocking path, and _especially_ if we are to keep the > cancellation hook, we may want to consider arch_vcpu_block_prepare() > (as we did for wake). Sure! Thanks, Feng > > Regards, > Dario > -- > <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--- Begin Message ---

To: "Wu, Feng" <feng.wu@xxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
From: George Dunlap <george.dunlap@xxxxxxxxxx>
Date: Tue, 22 Sep 2015 10:26:41 +0000
Cc: Jan Beulich <jbeulich@xxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
Thread-index: AQHQ9SFNo6hohlTVFEKo9a5l6ybSkA==
Thread-topic: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

On 09/22/2015 08:19 AM, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
>> Sent: Monday, September 21, 2015 10:25 PM
>> To: Wu, Feng; George Dunlap; George Dunlap
>> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
>> xen-devel@xxxxxxxxxxxxx
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On Mon, 2015-09-21 at 12:22 +0000, Wu, Feng wrote:
>>>
>>>> -----Original Message-----
>>>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
>>
>>>> You also need to check that local_events_need_delivery() will
>>>> return
>>>> "true" if you get an interrupt between that time and entering the
>>>> hypervisor.  Will that happen automatically from
>>>> hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
>>>> vlapic_has_pending_irq()?  Or will you need to add a hook in
>>>> hvm_vcpu_has_pending_irq()?
>>>
>>> I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what
>>> I need
>>> to do in vcpu_block() and do_poll() is as below:
>>>
>>> 1. set_bit(_VPF_blocked, &v->pause_flags);
>>>
>>> 2. ret = v->arch.arch_block(), in this hook, we can re-use the same
>>> logic in
>>> vmx_pre_ctx_switch_pi(), and check whether ON bit is set during
>>> updating
>>> posted-interrupt descriptor, can return 1 when ON is set
>>>
>> It think it would be ok for an hook to return a value (maybe, if doing
>> that, let's pick variable names and use comments to explain what goes
>> on as good as we can).
>>
>> I think I also see why you seem to prefer doing it that way, rather
>> than hacking local_events_need_delivery(), but can you please elaborate
>> on that? (My feeling is that you want to avoid having to update the
>> data structures in between _VPF_blocked and the check
>> local_events_need_delivery(), and then having to roll such update back
>> if local_events_need_delivery() ends up being false, is that the
>> case?).
>
> In the arch_block() hook, we actually need to
>       - Put vCPU to the blocking list
>       - Set the NV to wakeup vector
>       - Set NDST to the right pCPU
>       - Clear SN

Nit: We shouldn't need to actually clear SN here; SN should already be
clear because the vcpu should be currently running.  (I don't think
there's a way for a vcpu to go from runnable->blocked, is there?)  And
if it's just been running, then NDST should also already be the correct
pcpu.

> During we are updating the posted-interrupt descriptor, the VT-d
> hardware can issue notification event and hence ON is set. If that is the
> case, it is straightforward to return directly, and it doesn't make sense
> we continue to do the above operations since we don't need to actually.

But checking to see if any interrupts have come in in the middle of your
code just adds unnecessary complication.  We need to have the code in
place to un-do all the blocking steps in the case that
local_events_need_delivery() returns true anyway.

Additionally, while local_events_need_delivery() is only called from
do_block and do_poll, hvm_local_events_need_delivery() is called from a
number of other places, as is hvm_vcpu_has_pending_irq().  All those
places presumably also need to know whether there's a PI pending to work
properly.

>> Code would look better, IMO, if we manage to fold that somehow inside
>> local_events_need_delivery(), but that:
>
> As mentioned above, during updating the PI descriptor for blocking, we
> need to check whether ON is set, and return if it is set. This logic cannot
> be included in local_events_need_delivery(), since the update itself
> has not much relationship with local_events_need_delivery().
>
> However, I do find some issues with my proposal above, see below:
>
> 1. Set _VPF_blocked
> 2. ret = arch_block()
> 3. if ( ret || local_events_need_delivery() )
>       clear_bit(_VPF_blocked, &v->pause_flags);
>
> After step #2, if ret == false, that means, we successfully changed the PI
> descriptor in arch_block(), if local_events_need_delivery() returns true,
> _VPF_blocked is cleared. After that, external interrupt come in, hence
> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
>
> One possible solution for it is:
> - Disable the interrupts before the check in step #3 above
> - if local_events_need_delivery() returns true, undo all the operations
>  done in arch_block()
> - Enable interrupts after _VPF_blocked gets cleared.

Well, yes, if as a result of checking to see that there are interrupts
you unblock a processor, then you need to do *all* the things related to
unblocking, including switching the interrupt vector and removing it
from the blocked list.

If we're going to put hooks here in do_block(), then the best thing to
do is as much as possible to make PI interrupts act exactly the same as
other interrupts; i.e.,

* Do a full transition to blocked (set _VPF_blocked, add vcpu to PI
list, switch vector to wake)
* Check to see if there are any pending interrupts (either event
channels, virtual interrupts, or PIs)
* If so, do a full transition to unblocked (clear _VPF_blocked, switch
vector to PI, remove vcpu from list).

We should be able to order the operations such that if interrupts come
in the middle nothing bad happens, without needing to actually disable
interrupts.

OTOH -- I think if we grab a lock during an interrupt, we're not allowed
to grab it with interrupts disabled, correct?  So we may end up having
to disable interrupts anyway.

 -George

--- End Message ---

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.