[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4] xen: sched: convert RTDS from time to event driven model
On 2/5/2016 9:39 AM, Dario Faggioli wrote: On Wed, 2016-02-03 at 21:23 -0500, Tianyang Chen wrote:On 2/3/2016 7:39 AM, Dario Faggioli wrote:On Tue, 2016-02-02 at 13:09 -0500, Tianyang Chen wrote:So what do we do, we raise the *_delayed_runq_add flag and continue and leave the vcpu alone. This happens, for instance, when there is a race between scheduling out a vcpu that is going to sleep, and the vcpu in question being woken back up already! It's not frequent, but we need to take care of that. At some point (sooner rather tan later, most likely) the context switch will actually happen, and the vcpu that is waking up is put in the runqueue.Yeah I agree with this situation. But I meant to say when a vcpu is waking up while it's still on a queue. See below.So, if a vcpu wakes up while it is running on a pcpu already, or while it is already woken and has already been put in the runqueue, these are SPURIOUS WAKEUP, which should be dealt with by just doing nothing, as every scheduler is doing.... This is what I'm tring to say since a couple of email, let's see if I've said it clear enough this time. :-D I see. So I'm just curious what can cause spurious wakeup? Does it only happen to a running vcpu (currently on pcpu), and a vcpu that is on either runq or depletedq? static void rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc) { struct rt_vcpu * const svc = rt_vcpu(vc); s_time_t now = NOW(); struct rt_private *prv = rt_priv(ops); BUG_ON( is_idle_vcpu(vc) ); if ( unlikely(curr_on_cpu(vc->processor) == vc) ) { SCHED_STAT_CRANK(vcpu_wake_running); return; }For this situation, a vcpu is waking up on a pcpu. It could be taken off from the replq inside the timer handler at the end. So, if we decide to not replenish any sleeping/un-runnable vcpus any further, we should probably add it back to replq here right?I'm sorry, I don't understand what you're saying here. The vcpu woke up a few time ago. We went through here. At the time it was not on any pcpu and it was not in the runqueue, so we did plan a replenishment event for it in the replenishment queue, at it's next deadline. Now we're here again, for some reason. We figure out that the vcpu is running already and, most likely (feel free to add an ASSERT() inside the if to that effect), it will also still have its replenishment event queued. So we realized that this is just a spurious wakeup, and get back to what we were doing. What's wrong with this I just said? The reason why I wanted to add a vcpu back to replq is because it could be taken off from the timer handler. Now, because of the spurious wakeup, I think the timer shouldn't take any vcpus off of replq, in which case wake() should be doing nothing just like other schedulers when it's a spurious wakeup. Also, only sleep() should remove events from replq. /* on RunQ/DepletedQ, just update info is ok */ if ( unlikely(__vcpu_on_q(svc)) ) { SCHED_STAT_CRANK(vcpu_wake_onrunq); return; }For this situation, a vcpu could be going through the following phases: on runq --> sleep --> (for a long time) --> wake upWell, but then we're not here, because when it went to sleep, it's been taken off the runqueue, hasn't it? Actually, I do think that you need to be running to go to sleep, but anyway, even if a vcpu could go to sleep or block from the runqueue, it's not going to stay in the runqueue, and the first wakeup that occurs does not find it on the runqueue, and hence is not covered by this case... Is it? uh, I missed the __q_remove() from sleep. It will not end up here in the case I mentioned. When it wakes up, it could stay in the front of the runq because cur_deadline hasn't been updated. It will, inherently have some high priority-ness (because of EDF) and be picked over other vcpus right? Do we want to update it's deadline and budget here and re-insert it accordingly?When it wakes up and it is found not to be either running or in the runqueue already, yes, we certainly want to do that! But if some other (again, spurious) wakeup occurs, and find it already in the runqueue (i.e., this case) we actually *don't* want to update its deadline again. Agree. if ( likely(vcpu_runnable(vc)) ) SCHED_STAT_CRANK(vcpu_wake_runnable); else SCHED_STAT_CRANK(vcpu_wake_not_runnable); if ( now >= svc->cur_deadline) { rt_update_deadline(now, svc); __replq_remove(svc); } __replq_insert(svc);For here, consider this scenario: on pcpu (originally on replq) --> sleep --> context_saved() but still asleep --> wake() Timer handler isn't triggered before it awakes. Then this vcpu will be added to replq again while it's already there. I'm not 100% positive if this situation is possible though. But judging from rt_context_saved(), it checks if a vcpu is runnable before adding back to the runq.I'm also having a bit of an hard time understanding this... especially what context_saved() has to do with it. I guess I shouldn't have put /* */ around the removal of a vcpu on replq in sleep(). So the original patch won't actually take a vcpu off if it goes to sleep. Like Meng said, it should. And I see what you mean by spurious wakeup now. I will put a summary for wake() somewhere below. Are you saying that it is possible for a vcpu to start running with deadline td, and hence a replenishment is programmed to happen at td, and then to go to sleep at ta<td and wakeup at tb that is also <td? Well, in that case, yes, whether to update the deadline or not, and whether to reuse the already existing replenishment or not, it's up to you guys, as it's part of the algorithm. What does the algorithm say in this case? In this implementation, you are removing replenishment when a vcpu sleep, so it looks to me that you don't have much choice than re- instating it at wakeup (we've gone through this already, remember?). OTOH, if you wouldn't stop the timer on sleep, then yes, we ma need some logic here, to understand whether the replenishment happened during sleeping/blocking time, or if it's still pending. If that was not what you were saying... then, sorry, but I just did not get it...Thoughts?So just to wrap up my thoughts for budget replenishment inside of wake(), a vcpu can be in different states(on pcpu, on queue etc.) when it wakes up.Yes, and only one is usually interesting: when it's not in any runqueue nor on any pcpu. In out case, the case where it's in _delayed_runq add needs some care as well (unlike, e.g., in Credit2). See below. 1)wake up on a pcpu: Do nothing as it may have some remaining budget and we just ignore it's cur_deadline when it wakes up. I can't find a corresponding pattern in DS but this can be treated like a server being pushed back because of the nap.Well, I'd say: do nothing because it's a spurious wakeup. Yes. 2)wake up on a queue: Update the deadline(refill budget) of it and re-insert it into runq for the reason stated above.Do nothing because it's a spurious wakeup. Yes. 3)wake up in the middle of context switching: Update the deadline(refill budget) of it so it can be re-inserted into runq in rt_context_saved().Ok.4)wake up on nowhere ( sleep on pcpu--> context_saved() --> wake()): Replenish it and re-insert it back to runq.Here it is the one we care about. And yes, I agree on what we should do in this case.As for replenishment queue manipulation, we should always try and add it back to replq because we don't know if the timer handler has taken it off or not.Mmm... no, I think we should know/figure out whether a replenishment is pending already and we are ok with it, or if we need a new one. Just to make sure, spurious sleep/wakeup are for a vcpu that is on pcpu or any queue right? If a real sleep happens, it should be taken off from replq. However, in spurious wakeup (which I assume corresponds to a spurious sleep?), it shouldn't be taken off from replq. Adding back to replq should happen for those vcpus which were taken off because of "real sleep". So here is a summary to make sure everyone's on the same page :)This is basically your code with a slight modification before replq_insert(). static void rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc) { struct rt_vcpu * const svc = rt_vcpu(vc); s_time_t now = NOW(); BUG_ON( is_idle_vcpu(vc) ); if ( unlikely(curr_on_cpu(vc->processor) == vc) ) { /* * spurious wakeup so do nothing as it is * not removed from replq */ SCHED_STAT_CRANK(vcpu_wake_running); return; } /* on RunQ/DepletedQ, just update info is ok */ if ( unlikely(__vcpu_on_q(svc)) ) { /* * spurious wakeup so do nothing as it is * not removed from replq */ SCHED_STAT_CRANK(vcpu_wake_onrunq); return; } if ( likely(vcpu_runnable(vc)) ) SCHED_STAT_CRANK(vcpu_wake_runnable); else SCHED_STAT_CRANK(vcpu_wake_not_runnable); /* * sleep() might have taken it off from replq before * conext_saved(). Three cases: * 1) sleep on pcpu--> context_saved() --> wake() * 2) sleep on pcpu--> wake() --> context_saved() * 3) sleep before context_saved() -->wake(). * The first two cases sleep() doesn't remove this vcpu * so add a check before inserting. */ if ( now >= svc->cur_deadline) { rt_update_deadline(now, svc); __replq_remove(svc); } if( !__vcpu_on_replq(svc) ) __replq_insert(svc); /* If context hasn't been saved for this vcpu yet, we can't put it on * the Runqueue/DepletedQ. Instead, we set a flag so that it will be * put on the Runqueue/DepletedQ after the context has been saved. */ if ( unlikely(svc->flags & RTDS_scheduled) ) { set_bit(__RTDS_delayed_runq_add, &svc->flags); return; } /* insert svc to runq/depletedq because svc is not in queue now */ __runq_insert(ops, svc); runq_tickle(ops, snext); return; } static void rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc) { struct rt_vcpu * const svc = rt_vcpu(vc); BUG_ON( is_idle_vcpu(vc) ); SCHED_STAT_CRANK(vcpu_sleep); /* * If a vcpu is not sleeping on a pcpu or a queue, * it should be taken off * from the replenishment queue * Case 1: sleep on a pcpu * Case 2: sleep on a queue * Case 3: sleep before context switch is finished */ if ( curr_on_cpu(vc->processor) == vc ) cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ); else if ( __vcpu_on_q(svc) ) __q_remove(svc); else if ( svc->flags & RTDS_delayed_runq_add ) { clear_bit(__RTDS_delayed_runq_add, &svc->flags); if( __vcpu_on_replq(svc) ) __replq_remove(svc); } } Oh I might be misunderstanding what should be put in rt_schedule(). Was it specifically for handling a vcpu that shouldn't be taken off a core? Why are you referring this piece of code as budget enforcement? Isn't it necessary to modify the runq here after budget replenishment has happened? If runq goes out of order here, will rt_schedule() be sorting it?Ops, yes, you are right, this is not budget enforcement. So, budget enforcement looks still to be missing from this patch (and that is why I asked whether it is actually working). And this chunk of code (and perhaps this whole function) is probably ok, it's just a bit hard to understand what it does...So, the burn_budget() is still there right? I'm not sure what you are referring to as budget enforcement. Correct me if I'm wrong, budget enforcement means that each vcpu uses its assigned/remaining budget. Is there any specific situation I'm missing?That's correct, and I saw that burn_budget() is still there. What I failed to see was logic, within rt_schedule() for stopping a vcpu that has exhausted its budget until its next replenishment. Before, this was done in __repl_update(), which is (and that's a good thing) no longer there. But maybe it's me that missed something... __repl_update() only scans runq and depletedq for replenishment right? And right after burn_budget(), if the current vcpu runs out of budget, snext will be picked here because of: /* if scurr has higher priority and budget, still pick scurr */ if ( !is_idle_vcpu(current) && vcpu_runnable(current) && scurr->cur_budget > 0 && ( is_idle_vcpu(snext->vcpu) || scurr->cur_deadline <= snext->cur_deadline ) ) snext = scurr; if scurr->cur_budget == 0, snext will be picked immediately. Is that what you looking for? So, do you mind if we take a step back again, and analyze the situation. When the timer fires, and this handler is executed and a replenishment event taken off the replenishment queue, the following situations are possible: 1) the replenishment is for a running vcpu 2) the replenishment is for a runnable vcpu in the runq 3) the replenishment is for a runnable vcpu in the depletedq 4) the replenishment is for a sleeping vcpu So, 4) is never going to happen, right?, as we're stopping the timer when a vcpu blocks. If we go for this, let's put an ASSERT() and a comment as a guard for that. If we do the optimization in this very patch. If we leave it for another patch, we need to handle this case, I guess. In all 1), 2) and 3) cases, we need to remove the replenishment event from the replenishment queue, so just (in the code) do it upfront.I don't think we need to take them of the replenishment queue? See the last couple paragraphs.I meant removing it for reinserting it again with a different replenishment time, due to the fact that, as a consequence of the replenishment that is just happening, the vcpu is being given a new deadline. Ok so we are talking about the same thing. What other actions need to happen in each case, 1), 2) and 3)? Can you summarize then in here, so we can decide how to handle each situation?Here we go. 1) the replenishment is for a running vcpu Replenish the vcpu. Keep it on the replenishment queue. Tickle it as it may have a later deadline.What do you mean keep it in the repl queue? This particular replenishment just happened, which means the time at which it was scheduler passed, there is no reason for keeping it in the queue like this. Do you mean inserting a new replenishment event in the queue, for this same vcpu, with a new replenishment time corresponding to its deadline? If yes, then yes, I agree with that. Exactly. It's the way I expressed it. When I said keep it on, it includes remove() + insert(). lol 2) the replenishment is for a runnable vcpu in the runq Replenish the vcpu. Keep it on the replenishment queue.Same as above, I guess? Yes. (re-)insert it into runq. Tickle it.Ok.3) the replenishment is for a runnable vcpu in the depletedq Same as 2)Same as 2), but you insert it in the runq (not again in the depletedq), right? Right. _q_insert() checks if a vcpu has budget. If so it's put on the runq. Otherwise on the depletedq. 4) the replenishment is for a sleeping vcpu Take it off from the replenishment queue and don't do replenishment. As a matter of fact, this patch does the replenishment first and then take it off.Ah, so can this happen? Aren't you removing the queued replenishment event when a vcpu goes to sleep? Now, if we all agree on removing a vcpu in sleep(), this is not going to happen. So, there are three functions manipulating the replq now. I'd say that, in both case 2) and case 3), the vcpu needs to be taken out of the queue where they are, and (re-)inserted in the runqueue, and then we need to tickle... Is that correct?correct^.Ok. :-)I'd also say that, in case 1), we also need to tickle because the deadline may now be have become bigger than some other vcpu that is in the runqueue?correct. I missed this case^.Ok. :-) :-)Also, under what conditions we need, as soon as replenishment for vcpu X happened, to queue another replenishment for the same vcpu, at its next deadline? Always, I would say (unless the vcpu is sleeping, because of the optimization you introduced)?So this is kinda confusing. If in case 1) 2) and 3) they are never taken off from the replenishment queue, a series of replenishment events are automatically updated after the replenishment.This is mostly about the both of us saying the same thing differently. I'm saying that you should remove the vcpu from the replenishment runqueue and then insert it in there again with an updated replenishment time, computed out of the vcpu's new deadline. You are saying that the replenishment events are updated. Are we (talking about the same thing)? :-) Yeah. Great. I will put together v5 soon. If we all agree on the rt_wake() and rt_sleep().There is no need, as least one less reason, to keep track of when to queue the next vcpus for replenishment.I don't understand what you mean here.The side effect of this is that all vcpus could be going to sleep right after the timer was programmed at the end of the handler. If they don't wake up before the timer fires next time, the handler is called to replenish nobody. If that's the case, timer will not be programmed again in the handler. Wake() kicks in and programs it instead. And the downside of this is that we always need to check if an awaking vcpu is on replq or not. It not add it back(hence the notion of starting to track it's replenishment events again). When adding back, check if there is a need to reprogram the timer.This all sounds fine to me. Thanks, Tianyang _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |