Xen project Mailing List

Re: [Xen-devel] [PATCH v4] xen: sched: convert RTDS from time to event driven model

To: Tianyang Chen <tiche@xxxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Fri, 5 Feb 2016 15:39:42 +0100

Cc: george.dunlap@xxxxxxxxxx, Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>, Meng Xu <mengxu@xxxxxxxxxxxxx>

Delivery-date: Fri, 05 Feb 2016 14:39:58 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Wed, 2016-02-03 at 21:23 -0500, Tianyang Chen wrote: > > On 2/3/2016 7:39 AM, Dario Faggioli wrote: > > On Tue, 2016-02-02 at 13:09 -0500, Tianyang Chen wrote: > > >Â > > So what do we do, we raise the *_delayed_runq_add flag and continue > > and > > leave the vcpu alone. This happens, for instance, when there is a > > race > > between scheduling out a vcpu that is going to sleep, and the vcpu > > in > > question being woken back up already! It's not frequent, but we > > need to > > take care of that. At some point (sooner rather tan later, most > > likely) > > the context switch will actually happen, and the vcpu that is > > waking up > > is put in the runqueue. > > > > Yeah I agree with this situation. But I meant to say when a vcpu isÂ > waking up while it's still on a queue. See below. > So, if a vcpu wakes up while it is running on a pcpu already, or while it is already woken and has already been put in the runqueue, these are SPURIOUS WAKEUP, which should be dealt with by just doing nothing, as every scheduler is doing.... This is what I'm tring to say since a couple of email, let's see if I've said it clear enough this time. :-D > > static void > > rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc) > > { > > ÂÂÂÂÂstruct rt_vcpu * const svc = rt_vcpu(vc); > > ÂÂÂÂÂs_time_t now = NOW(); > > ÂÂÂÂÂstruct rt_private *prv = rt_priv(ops); > > > > ÂÂÂÂÂBUG_ON( is_idle_vcpu(vc) ); > > > > ÂÂÂÂÂif ( unlikely(curr_on_cpu(vc->processor) == vc) ) > > ÂÂÂÂÂ{ > > ÂÂÂÂÂÂÂÂÂSCHED_STAT_CRANK(vcpu_wake_running); > > ÂÂÂÂÂÂÂÂÂreturn; > > ÂÂÂÂÂ} > > > > For this situation, a vcpu is waking up on a pcpu. It could be takenÂ > off from the replq inside the timer handler at the end. So, if we > decideÂ > to not replenish any sleeping/un-runnable vcpus any further, we > shouldÂ > probably add it back to replq here right? > I'm sorry, I don't understand what you're saying here. The vcpu woke up a few time ago. We went through here. At the time it was not on any pcpu and it was not in the runqueue, so we did plan a replenishment event for it in the replenishment queue, at it's next deadline. Now we're here again, for some reason. We figure out that the vcpu is running already and, most likely (feel free to add an ASSERT() inside the if to that effect), it will also still have its replenishment event queued. So we realized that this is just a spurious wakeup, and get back to what we were doing. What's wrong with this I just said? > > ÂÂÂÂÂ/* on RunQ/DepletedQ, just update info is ok */ > > ÂÂÂÂÂif ( unlikely(__vcpu_on_q(svc)) ) > > ÂÂÂÂÂ{ > > ÂÂÂÂÂÂÂÂÂSCHED_STAT_CRANK(vcpu_wake_onrunq); > > ÂÂÂÂÂÂÂÂÂreturn; > > ÂÂÂÂÂ} > > > > For this situation, a vcpu could be going through the following > phases: > > on runq --> sleep --> (for a long time) --> wake up > Well, but then we're not here, because when it went to sleep, it's been taken off the runqueue, hasn't it? Actually, I do think that you need to be running to go to sleep, but anyway, even if a vcpu could go to sleep or block from the runqueue, it's not going to stay in the runqueue, and the first wakeup that occurs does not find it on the runqueue, and hence is not covered by this case... Is it? > When it wakes up, it could stay in the front of the runq becauseÂ > cur_deadline hasn't been updated. It will, inherently have some highÂ > priority-ness (because of EDF) and be picked over other vcpus right? > DoÂ > we want to update it's deadline and budget here and re-insert itÂ > accordingly? > When it wakes up and it is found not to be either running or in the runqueue already, yes, we certainly want to do that! But if some other (again, spurious) wakeup occurs, and find it already in the runqueue (i.e., this case) we actually *don't* want to update its deadline again. > > ÂÂÂÂÂif ( likely(vcpu_runnable(vc)) ) > > ÂÂÂÂÂÂÂÂÂSCHED_STAT_CRANK(vcpu_wake_runnable); > > ÂÂÂÂÂelse > > ÂÂÂÂÂÂÂÂÂSCHED_STAT_CRANK(vcpu_wake_not_runnable); > > > > > > ÂÂÂÂÂif ( now >= svc->cur_deadline) > > ÂÂÂÂÂ{ > > ÂÂÂÂÂÂÂÂÂrt_update_deadline(now, svc); > > ÂÂÂÂÂÂÂÂÂ__replq_remove(svc); > > ÂÂÂÂÂ} > > > > ÂÂÂÂÂ__replq_insert(svc); > > > > For here, consider this scenario: > > on pcpu (originally on replq) --> sleep --> context_saved() but > stillÂ > asleep --> wake() > > Timer handler isn't triggered before it awakes. Then this vcpu will > beÂ > added to replq again while it's already there. I'm not 100% positive > ifÂ > this situation is possible though. But judging from > rt_context_saved(),Â > it checks if a vcpu is runnable before adding back to the runq. > I'm also having a bit of an hard time understanding this... especially what context_saved() has to do with it. Are you saying that it is possible for a vcpu to start running with deadline td, and hence a replenishment is programmed to happen at td, and then to go to sleep at ta<td and wakeup at tb that is also <td? Well, in that case, yes, whether to update the deadline or not, and whether to reuse the already existing replenishment or not, it's up to you guys, as it's part of the algorithm. What does the algorithm say in this case? In this implementation, you are removing replenishment when a vcpu sleep, so it looks to me that you don't have much choice than re- instating it at wakeup (we've gone through this already, remember?). OTOH, if you wouldn't stop the timer on sleep, then yes, we ma need some logic here, to understand whether the replenishment happened during sleeping/blocking time, or if it's still pending. If that was not what you were saying... then, sorry, but I just did not get it... > > Thoughts? > > > > > So just to wrap up my thoughts for budget replenishment inside ofÂ > wake(), a vcpu can be in different states(on pcpu, on queue etc.) > whenÂ > it wakes up. > Yes, and only one is usually interesting: when it's not in any runqueue nor on any pcpu. In out case, the case where it's in _delayed_runq add needs some care as well (unlike, e.g., in Credit2). > 1)wake up on a pcpu: > Do nothing as it may have some remaining budget and we just ignore > it'sÂ > cur_deadline when it wakes up. I can't find a corresponding pattern > inÂ > DS but this can be treated like a server being pushed back because > ofÂ > the nap. > Well, I'd say: do nothing because it's a spurious wakeup. > 2)wake up on a queue: > Update the deadline(refill budget) of it and re-insert it into runq > forÂ > the reason stated above. > Do nothing because it's a spurious wakeup. > 3)wake up in the middle of context switching: > Update the deadline(refill budget) of it so it can be re-inserted > intoÂ > runq in rt_context_saved(). > Ok. > 4)wake up on nowhere ( sleep on pcpu--> context_saved() --> wake()): > Replenish it and re-insert it back to runq. > Here it is the one we care about. And yes, I agree on what we should do in this case. > As for replenishment queue manipulation, we should always try and add > itÂ > back to replq because we don't know if the timer handler has taken > itÂ > off or not. > Mmm... no, I think we should know/figure out whether a replenishment is pending already and we are ok with it, or if we need a new one. > > > Oh I might be misunderstanding what should be put in > > > rt_schedule(). > > > Was > > > it specifically for handling a vcpu that shouldn't be taken off a > > > core? > > > Why are you referring this piece of code as budget enforcement? > > > > > > Isn't it necessary to modify the runq here after budget > > > replenishment > > > has happened? If runq goes out of order here, will rt_schedule() > > > be > > > sorting it? > > > > > Ops, yes, you are right, this is not budget enforcement. So, budget > > enforcement looks still to be missing from this patch (and that is > > why > > I asked whether it is actually working). > > > > And this chunk of code (and perhaps this whole function) is > > probably > > ok, it's just a bit hard to understand what it does... > > > > So, the burn_budget() is still there right? I'm not sure what you > areÂ > referring to as budget enforcement. Correct me if I'm wrong, budgetÂ > enforcement means that each vcpu uses its assigned/remaining budget. > IsÂ > there any specific situation I'm missing? > That's correct, and I saw that burn_budget() is still there. What I failed to see was logic, within rt_schedule() for stopping a vcpu that has exhausted its budget until its next replenishment. Before, this was done in __repl_update(), which is (and that's a good thing) no longer there. But maybe it's me that missed something... > > So, do you mind if we take a step back again, and analyze the > > situation. When the timer fires, and this handler is executed and a > > replenishment event taken off the replenishment queue, the > > following > > situations are possible: > > > > Â 1) the replenishment is for a running vcpu > > Â 2) the replenishment is for a runnable vcpu in the runq > > Â 3) the replenishment is for a runnable vcpu in the depletedq > > Â 4) the replenishment is for a sleeping vcpu > > > > So, 4) is never going to happen, right?, as we're stopping the > > timer > > when a vcpu blocks. If we go for this, let's put an ASSERT() and a > > comment as a guard for that. If we do the optimization in this very > > patch. If we leave it for another patch, we need to handle this > > case, I > > guess. > > > > In all 1), 2) and 3) cases, we need to remove the replenishment > > event > > from the replenishment queue, so just (in the code) do it upfront. > > > > I don't think we need to take them of the replenishment queue? See > theÂ > last couple paragraphs. > I meant removing it for reinserting it again with a different replenishment time, due to the fact that, as a consequence of the replenishment that is just happening, the vcpu is being given a new deadline. > > What other actions need to happen in each case, 1), 2) and 3)? Can > > you > > summarize then in here, so we can decide how to handle each > > situation? > > > > Here we go. > 1) the replenishment is for a running vcpu > Replenish the vcpu. Keep it on the replenishment queue. Tickle it as > itÂ > may have a later deadline. > What do you mean keep it in the repl queue? This particular replenishment just happened, which means the time at which it was scheduler passed, there is no reason for keeping it in the queue like this. Do you mean inserting a new replenishment event in the queue, for this same vcpu, with a new replenishment time corresponding to its deadline? If yes, then yes, I agree with that. > 2) the replenishment is for a runnable vcpu in the runq > Replenish the vcpu. Keep it on the replenishment queue. > Same as above, I guess? > (re-)insert itÂ > into runq. Tickle it. > Ok. > 3) the replenishment is for a runnable vcpu in the depletedq > Same as 2) > Same as 2), but you insert it in the runq (not again in the depletedq), right? > 4) the replenishment is for a sleeping vcpu > Take it off from the replenishment queue and don't do replenishment. > AsÂ > a matter of fact, this patch does the replenishment first and then > takeÂ > it off. > Ah, so can this happen? Aren't you removing the queued replenishment event when a vcpu goes to sleep? > > I'd say that, in both case 2) and case 3), the vcpu needs to be > > taken > > out of the queue where they are, and (re-)inserted in the runqueue, > > and > > then we need to tickle... Is that correct? > > > > correct^. > Ok. :-) > > I'd also say that, in case 1), we also need to tickle because the > > deadline may now be have become bigger than some other vcpu that is > > in > > the runqueue? > > > > correct. I missed this case^. > Ok. :-) :-) > > Also, under what conditions we need, as soon as replenishment for > > vcpu > > X happened, to queue another replenishment for the same vcpu, at > > its > > next deadline? Always, I would say (unless the vcpu is sleeping, > > because of the optimization you introduced)? > > So this is kinda confusing. If in case 1) 2) and 3) they are never > takenÂ > off from the replenishment queue, a series of replenishment events > areÂ > automatically updated after the replenishment. > This is mostly about the both of us saying the same thing differently. I'm saying that you should remove the vcpu from the replenishment runqueue and then insert it in there again with an updated replenishment time, computed out of the vcpu's new deadline. You are saying that the replenishment events are updated. Are we (talking about the same thing)? :-) > There is no need, asÂ > least one less reason, to keep track of when to queue the next vcpus > forÂ > replenishment. > I don't understand what you mean here. > The side effect of this is that all vcpus could be going to sleep > rightÂ > after the timer was programmed at the end of the handler. If they > don'tÂ > wake up before the timer fires next time, the handler is called toÂ > replenish nobody. If that's the case, timer will not be programmed > againÂ > in the handler. Wake() kicks in and programs it instead. And theÂ > downside of this is that we always need to check if an awaking vcpu > isÂ > on replq or not. It not add it back(hence the notion of starting toÂ > track it's replenishment events again). When adding back, check if > thereÂ > is a need to reprogram the timer. > This all sounds fine to me. > I really like the idea of replenishment queue compared to running > queueÂ > in that the places that we need to manipulate the queue are only inÂ > wake() and repl_handler(). > Indeed. :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.