
Re: [Xen-devel] [PATCH v2] xen:rtds:Fix bug in budget accounting



On Wed, 2016-10-26 at 12:08 -0400, Meng Xu wrote:
> On Wed, Oct 26, 2016 at 10:43 AM, Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
> > 
> > On Fri, 2016-10-21 at 22:11 -0400, Meng Xu wrote:
> > > 
> > > Bug scenario:
> > > While a VCPU is running on a core, it may miss its deadline.
> > > 
> > May be useful to mention why (at least the most common reasons, like
> > overhead and/or the system being overloaded... are there others?).
> 
> How about adding this:
> For example, when the sum of the utilizations of all VCPUs that are
> pinned onto a core is larger than 1.
> 
Yep, that's what I mean by "overload" (or "oversubscription", if you
prefer, or whatever).
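
(Quick worked example: two VCPUs, each with budget 6 and period 10,
pinned to the same core, ask for 6/10 + 6/10 = 1.2 worth of CPU time.
That's 12 time units of work every 10, so no schedule can serve both,
and deadline misses are unavoidable.)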

> As you know, the VCPU utilization is not the only factor that
> determines the schedulability of gEDF scheduling. It will be hard to
> give the exact situation in which a deadline miss may happen.
> 
Well, but that's still overload or oversubscription. It just happens
that, with gEDF, overload can occur below 100% utilization! :-)

Basically, I'm not asking you to write a summary of gEDF schedulability
theory in this changelog... I'm just suggesting you mention that the
reason why this budget overrun/deadline miss situation can happen is
overload, and not, e.g., a bug in the code, or anything like that.


> > Which is too much. Is this what we are talking about?
> 
> The high-level idea you got is correct. It is what we are talking
> about. :-)
> But the parameters used in your example will not happen. :-)
> 
Right, sorry. Let's forget the numbers, and discuss how to fix this.

> > So (1), one thing that I would do is to set svc->last_start=now; in
> > burn_budget().
> > 
> > However, just doing that won't solve the problem, because
> > repl_timer_handler() does not call burn_budget(). I've wondered a bit
> > whether that is correct or not... I mean, especially in the situation
> > we are discussing --when repl_timer_handler() runs "before"
> > rt_schedule()-- how do we account for the budget spent between the
> > last invocation of burn_budget() and where in time we are now? Well,
> > I'm not really sure whether or not this is a problem.
> 
> I see. You want to account for the budget spent in the previous
> period when a deadline miss happens.
> 
> How about, in repl_timer_handler(), we check if the VCPU is currently
> running. If yes, we call burn_budget() for the VCPU and update
> svc->last_start = now.
> 
No, sorry, why would we need to know if it's currently running? IMO,
there are already more checks like that in sched_rt.c than I'd like,
so I'm hesitant to add more of them.

> I don't really like updating last_start in burn_budget().
> 
> last_start is the most recent start time of the VCPU. It records the
> time at which the currently running VCPU was scheduled, so it should
> be updated only when the VCPU is scheduled.
>
Yes, the name suggests that's what it is. But how is it used? It's
used to make sure that we burn the proper amount of budget.

And it looks to me that, to serve this purpose, it needs to be updated
to NOW() (or something equivalent) whenever we touch cur_budget, or we
risk inconsistencies like, well, the one we're discussing here! :-)

OTOH, is there a reason why we need to track the time at which a vcpu
started executing? If there is, I don't see it.

So, I agree that last_start may look a bit misleading, but that's what
it is... let's change its name if you think it's worth it. Note,
however, that apart from this case, where the budget is actually
updated within the timer handler because of a deadline miss, the value
contained in that variable would really be the time at which the vcpu
started executing, even if you update it in burn_budget() and in
rt_update_deadline().

Perhaps add a comment about this whole thing in rt_update_deadline().
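
Something like this would do, I think (wording mine, adjust at will):

    /*
     * last_start is not just "when the vcpu last started running": it
     * is the baseline from which burn_budget() charges cur_budget.
     * Whenever cur_budget is (re)assigned, last_start must be updated
     * along with it.
     */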

> When burn_budget() is called, the VCPU can be scheduled or
> de-scheduled. When the VCPU is de-scheduled, we don't have to update
> last_start, since it will be updated the next time the VCPU is
> scheduled.
> 
> If we update last_start in burn_budget(), it won't cause incorrect
> behaviour. It just wastes time on one assignment operation.
> 
Well, let's not bother about one assignment, ok? :-P

When you deschedule a vcpu, whatever is in last_start is stale and
useless anyway.

That being said, as long as burn_budget() is called from only one
place, it's also true that we don't need to bother updating last_start
inside it. But as soon as we add a call to burn_budget() somewhere
else, if we forget to update last_start, we'll have problems.

For now, I don't foresee us adding such calls, but doing things in the
way that leaves the fewest opportunities for bugs is, in general, a
good thing. :-)
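
To make this concrete, here's a minimal, self-contained sketch of the
shape I have in mind. Everything below is stubbed for illustration: the
real burn_budget() in sched_rt.c takes the scheduler ops, skips idle
vcpus, sets the depleted flag, etc., so don't read this as the actual
code:

    /* Standalone sketch; s_time_t and struct rt_vcpu stubbed down. */
    typedef long long s_time_t;

    struct rt_vcpu {
        s_time_t cur_budget;   /* budget left in the current period */
        s_time_t last_start;   /* baseline for budget accounting    */
    };

    static void burn_budget(struct rt_vcpu *svc, s_time_t now)
    {
        s_time_t delta = now - svc->last_start;

        if ( delta > 0 )
            svc->cur_budget -= delta;
        if ( svc->cur_budget < 0 )
            svc->cur_budget = 0;

        /*
         * Reset the accounting baseline unconditionally, so that any
         * caller of burn_budget() leaves last_start consistent with
         * cur_budget, and a future second call site can't forget it.
         */
        svc->last_start = now;
    }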

> > Or maybe it could be an issue, but only if the overhead and/or the
> > overload are so big (e.g., if the vcpu is overrunning by more than
> > one full period) that the system is out of the window already.
> 
> I'm sorry that I didn't quite get this part "doing that for more than
> one full period".
> 
Yeah, forget about it. The core of the question is that I really think
the budget should be allowed to go negative, and I actually don't
recall why we didn't do it that way.

But, really, let's not deal with this here.

> > So, let's not consider this, for the moment... And let's focus on
> > the fact that, in rt_update_deadline(), we are changing cur_deadline
> > and cur_budget, so we also need to update last_start (as we said
> > above, they're related!). I guess we either call burn_budget() from
> > rt_update_deadline(), or we open-code svc->last_start=now;
> 
> Yep. If we call burn_budget in rt_update_deadline, we will need to
> update the last_start in burn_budget.
> 
Yeah, but, as I say below, I don't think that's necessary. What I
think is necessary is that we update cur_budget and last_start
together.

> > burn_budget() does things that we probably don't need and don't
> > want to do in rt_update_deadline(). In fact, if we don't allow the
> > budget to go negative, and ignore the (potential, and maybe
> > non-existent) accounting issue described above, there's no point in
> > marking the vcpu as depleted (in burn_budget()) and then (since
> > we're calling it from rt_update_deadline()) replenishing it right
> > away! :-O
> > 
> > So (2), I guess the best thing to do is "just" to update last_start
> > in rt_update_deadline(), which is sort of what the patch is doing.
> 
> Yes. This is the simplest and neatest way I can think of so far.
> As you suggested, we can also call burn_budget() in
> rt_update_deadline() and update last_start in burn_budget().
> Both solutions do the same thing. However, calling burn_budget() in
> rt_update_deadline() is more expensive than just updating last_start
> in rt_update_deadline(). :-)
> 
It's not only more expensive, it will also require more code
adjustment, I think. In fact, there's the risk that burn_budget() will
set __RTDS_depleted, and we'd need to make sure that, in the case at
hand, it gets reset (either in rt_update_deadline() or in
repl_timer_handler()).

So, really, I'd say let's not do this. I mentioned it because
_conceptually_ it's what we want, but practically, we'd better achieve
the same result in another way.

> > > --- a/xen/common/sched_rt.c
> > > +++ b/xen/common/sched_rt.c
> > > @@ -407,6 +407,14 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
> > >          svc->cur_deadline += count * svc->period;
> > >      }
> > > 
> > > +    /*
> > > +     * svc may be scheduled to run right after missing its deadline;
> > > +     * then rt_update_deadline() is called before rt_schedule(), which
> > > +     * should only deduct the time spent in the current period.
> > > +     */
> > > +    if ( svc->last_start < (svc->cur_deadline - svc->period) )
> > > +        svc->last_start = svc->cur_deadline - svc->period;
> > > +
> >  - Do we need the if()? Isn't it safe to just always update
> >    last_start? If it is, as it looks to me to be, I'd prefer it
> >    done like that.
> 
> Yes. I will ditch the if().
> 
Ok.
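
(FWIW, with the if() gone, the hunk would boil down to something like
the following; just a sketch, I haven't compiled it, and whether we use
now or the start of the new period is discussed right below:

    /*
     * We are handing out a fresh budget, so reset the accounting
     * baseline together with it; see the discussion below about
     * using now vs. the start of the new period.
     */
    svc->last_start = now;

placed next to where cur_budget is refilled.)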

> >  - Why cur_deadline-period, and not NOW() (well, actually, now)?
> >    Well, in the scenario above, and in all the others I can think
> >    of, it would be almost the same. And yet, I'd prefer using now,
> >    for clarity and consistency.
> 
> Hmm, it depends on how we account for the budget.
> Suppose the VCPU keeps running, the start of the next period is
> t = 10, and that period ends at t = 20. rt_update_deadline() is
> invoked at t = 10.1.
> 
Yes, I know...

> If last_start = 10, which is cur_deadline - period, we will account
> for the 0.1 time unit as budget consumed by the VCPU.
> If last_start = 10.1, which is now, we will not account for the 0.1
> time unit.
> 
...but by using 10, you're saying that you are sure that the vcpu has
been running since 10? Because, if you do that, that's what you are
charging the vcpu for.

Note that this is a genuine question that I'm _deliberately_ dumping on
you! :-P :-P

> Since we always charge the time spent in the hypervisor to the
> budget of the currently running VCPU, I chose last_start =
> cur_deadline - period.
> I'm ok with last_start = now, considering the consistency. So your
> call. :-)
> 
If the answer to the above question is 'yes', I guess I'm fine with
that too.

Actually, you can use cur_deadline, if you put the assignment _before_
updating the deadline (together with a comment about why we use that
instead of NOW()).
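
I.e., something along these lines (a sketch again, not compiled):

    /*
     * Snapshot cur_deadline *before* it is advanced: in the common
     * count == 1 case this is exactly cur_deadline - period after the
     * update, i.e., the start of the period we are entering, so any
     * time spent in the hypervisor is charged to the vcpu.
     */
    svc->last_start = svc->cur_deadline;
    svc->cur_deadline += count * svc->period;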

> > In summary, what I think we need is a patch that does both (1) and
> > (2), i.e., it makes sure that we always update last_start to NOW()
> > every time we assign a vcpu its new budget. That would make the
> > code more consistent and should also fix the buggy behavior you're
> > seeing.
> > 
> > What do you think?
> 
> (1) I agree.
> (2) I'm ok with either way.
> So I will change the code and test it.
> If it solves the problem, which it should, I will send out a patch
> that has both (1) and (2) today.
> 
Ok.

Thanks,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


 

