
Re: [Xen-devel] [Design RFC] Towards work-conserving RTDS scheduler



On Mon, Aug 8, 2016 at 5:38 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Thu, 2016-08-04 at 01:15 -0400, Meng Xu wrote:
>> Hi Dario,
>>
> Hi,
>
>> I'm thinking about changing the current RTDS scheduler to
>> work-conserving version as we briefly discussed before.
>> Below is a design of the work-conserving RTDS.
>> I'm hoping to get your feedback about the design ideas first before I
>> start writing it in code.
>>
> Here I am, sorry for the delay.
>
>> I think the code change should not be a lot as long as we don't
>> provide the functionality of switching between work-conserving and
>> non-work-conserving. Because the following design will keep the
>> real-time property of the current RTDS scheduler, I don't see the
>> reason why we should let users switch to the non-work-conserving version.
>> :-)
>>
> Oh, but there's a big one: _money_! :-O
>
> If you're a service/cloud provider you may or may not want a
> customer that pays for a 40% utilization VM to be able to use more
> than that. In particular, you may want to ask them for more money in
> order to enable that possibility! :-P

Good point.

>
> Anyway, I don't think --with this design of yours-- that it is such a
> big deal to make it possible to switch work-conserving*ness on and off
> (see below). Actually, I think it's even possible to to that on a per-
> vcpu basis, which I think would be quite cool!
>
>> --- Below is the design ---
>>
>> [...]
>>
>> *** Requirement of the work-conserving RTDS scheduler ***
>> 1) The new RTDS scheduler should be work-conserving, of course.
>> 2) The new RTDS scheduler should not break any real-time guarantee
>> provided by the current RTDS scheduler.
>>
>> *** Design of Work-Conserving RTDS Scheduler ***
>> VCPU model
>> 1) (Period, Budget): Guaranteed <Budget> time for each <Period>
>> 2) Priority index: It indicates the current priority level of the
>> VCPU. When a VCPU’s budget is depleted in the current period, its
>> priority index will increase by 1 and its budget will be replenished.
>> 3) A VCPU’s budget and priority index will be reset at the beginning
>> of each period
>>
> Ok, I think I see what you mean, and it makes sense to me.
>
> Just one question/observation. As you know, I come from a CBS mindset.
> CBS postpones a task/vcpu's deadline when it runs out of budget, and it
> can, natively, work in work conserving or non-work conserving mode
> (just by either continuing to consider the vcpu runnable, with the
> later deadline meaning demoted priority, or blocking it until the
> next period, respectively).
>
> The nice thing about this is that the scheduling analysis that has been
> developed works for both modes. Of course, what it says is that you can
> only guarantee to each vcpu the reserved utilization, and you should
> not rely on the additional capacity that you may be getting because
> you're in work conserving mode and the system happened to be idle
> for some time in this or that period (so, very similar to what you're
> proposing). _HOWEVER_, there are evolutions of CBS (called GRUB and
> SHRUB, I'm sure you'll be able to find the papers), where the 'unused
> bandwidth' (i.e., the otherwise idle time that you're making use of iff
> you're in work conserving mode) is distributed in a precise way
> (according to some weights, IIRC) to the various vcpus, hence making
> scheduling analysis both possible and useful again.

I agree that if we can have a schedulability analysis for the
work-conserving version, that will be great!
Actually, I have thought about it, but not hard enough.
I will think more about the schedulability analysis and get back to
this thread later.

Yes, I'm aware of the GRUB and SHRUB papers.
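
If I remember the GRUB paper correctly (please double-check against
the original), the reclaiming there is done in the budget accounting:
while a server executes, its budget is decreased as dq = -U_act * dt,
where U_act is the total bandwidth of the currently active servers,
instead of dq = -dt as in plain CBS. Since U_act <= 1, the budget
drains more slowly when part of the system is idle, which is exactly
how the otherwise-idle bandwidth gets redistributed.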

>
> Now, I'm not at all saying that we (you! :-D) should turn RTDS into
> using CBS(ish) or anything like that.

Thanks! :-)

> I'm just thinking out loud and
> wondering:
>  - could it be useful to have a scheduling analysis in place for the
>    scheduler in work conserving mode (one, of course, that takes into
>    account, and gives guarantees on, the otherwise idle bandwidth... I
>    know that the existing one holds! :-P) ?
>  - if yes, do you already have one --or do you think it will be
>    possible to develop one-- for your priority-index based model?

I think I could potentially develop one such analysis.

>
> Note that I'm not saying you should, and I'd be perfectly fine with a
> "no analysis, but let's keep things simple for now"... This just came
> to my mind, and I'm just pointing it out, to make sure we consider and
> think about it, and make a conscious decision.

Sure! Totally agree!

>
>> Scheduling policy: modified gEDF
>> 1) Priority comparison:
>>    a) VCPUs with a lower priority index have higher priority than
>> VCPUs with a higher priority index
>>    b) VCPUs with the same priority index use the gEDF policy to
>> decide the priority order
>> 2) Scheduling point
>>    a) VCPU’s budget is depleted for the current priority index
>>    b) VCPU starts a new period
>>    c) VCPU is blocked or woken up
>> 3) Scheduling decision is made when the scheduler is invoked
>>     a) Always pick the current M highest-priority VCPUs to run on the
>> M cores.
>>
> So, still about the analysis point above, and just off the top of
> my head (and without being used to doing these things any longer!!),
> it looks like it's possible to think of some analysis for this.
>
> In fact, since:
>  - vcpus with different priority indexes are totally disjoint sets,
>  - there's a strict ordering between priority indexes,
>  - vcpus sort of use their scheduling parameters at each priority index
>
> This looks to me like vcpus are subject to a "hierarchy" of RTDS
> schedulers, the one at level x+1 running in the idle time of the one at
> level x... And I think there's scope for writing down some maths
> formulas that model this situation. :-)
>
> Actually, it's quite likely that you either have already noticed this
> and done the analysis, or that someone else in the literature has done
> something similar --maybe with other schedulers-- before.

Yes, I noticed this, but I don't have an analysis yet. ;-) I will work
out some math formulas to model this situation.
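
Just to fix notation for that "hierarchy" view (a rough sketch on my
part, nothing proven yet): let L_k be the set of VCPUs currently at
priority index k, and U_k = sum of B_i/P_i over the VCPUs in L_k the
bandwidth they demand at that level. Level 0 is just the ordinary gEDF
setting on M cores; a level k > 0 only executes in the supply left
over by levels 0 .. k-1, so the natural direction is to derive a
supply-bound function sbf_k(t) from the lower levels and check each
level's demand against it, in the style of hierarchical/compositional
scheduling analysis.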
I'm thinking the desired design will be:
1) a work-conserving scheduler;
2) a *tight* schedulability analysis. If we cannot get a tight
analysis, we should at least reduce the abstraction overhead, i.e.,
num_cores minus the total utilization of all VCPUs. (To achieve a
better analysis, we may need to change the scheduling policy a bit.
I'm not yet clear about how to do that, but I will think about it.)
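
To make the priority comparison (1a/1b above) concrete, here is a
minimal sketch of what I have in mind; the names are illustrative and
do not match the actual sched_rt.c code:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch only: these names are made up for this email. */
struct rt_vcpu_sketch {
    unsigned int pri_index;  /* 0 = guaranteed level; grows on depletion */
    int64_t cur_deadline;    /* absolute deadline of the current period */
};

/*
 * Modified gEDF comparison:
 *  1a) a lower priority index always wins;
 *  1b) within the same index, plain EDF (earlier deadline first).
 */
static bool has_higher_prio(const struct rt_vcpu_sketch *a,
                            const struct rt_vcpu_sketch *b)
{
    if ( a->pri_index != b->pri_index )
        return a->pri_index < b->pri_index;
    return a->cur_deadline < b->cur_deadline;
}

The scheduler would then just keep the M highest VCPUs under this
ordering running on the M cores.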

>
> Anyway, the idea itself looks fair enough to me. I'd like to hear, if
> that's fine with you, how you plan to actually implement it, as there
> of course are multiple different ways to do it, and there are, IMO, a
> couple of things that should be kept in mind.

How about letting me think about the analysis first? If we can have
both the work-conserving algorithm and the analysis, so much the
better. If we finally decide not to have the analysis, we can fall
back to discussing the current design.

>
> Finally, about the work-conserving*ness on-off switch, what added
> difficulty or increase in code complexity prevents us from, instead
> of this:
>
> "2) Priority index: It indicates the current  priority level of the
>     VCPU. When a VCPU’s budget is depleted in the current period, its
>     priority index will increase by 1 and its budget will be
>     replenished."
>
> doing something like this:
>
> "2) Priority index: It indicates the current  priority level of the
>     VCPU. When a VCPU's budget is depleted in the current period:
>      2a) if the VCPU has the work conserving flag set, its priority
>          index will be increased by 1, and its budget replenished;
>      2b) if the VCPU has the work conserving flag cleat, it's blocked
>          until next period."
>
> ?

Agree. We can have the per-VCPU work-conserving flag.
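
Something along these lines for the depletion path (again only a
sketch with made-up names, not the actual sched_rt.c code):

#include <stdint.h>

#define RTDS_WORK_CONSERVING (1u << 0)  /* hypothetical per-VCPU flag */

struct rt_vcpu_flag_sketch {
    int64_t budget;          /* per-period budget from (Period, Budget) */
    int64_t cur_budget;      /* what is left of it in the current period */
    unsigned int pri_index;  /* reset to 0 at each period boundary */
    unsigned int flags;
};

/* Called when cur_budget hits zero before the period ends. */
static void budget_depleted(struct rt_vcpu_flag_sketch *svc)
{
    if ( svc->flags & RTDS_WORK_CONSERVING )
    {
        /* 2a) demote one level and replenish: the VCPU stays runnable. */
        svc->pri_index++;
        svc->cur_budget = svc->budget;
    }
    /*
     * 2b) otherwise do nothing here: the VCPU stays depleted and off
     * the runqueue until the replenishment at its next period, which
     * also resets pri_index back to 0 (point 3 of the VCPU model).
     */
}

The flag could then be exposed per VCPU via the toolstack, which would
also give the "charge extra for work-conserving" knob you mentioned.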


>
> Thanks and Regards,

Thank you very much for your valuable comments and suggestions! :-)

Best Regards,

Meng

-- 
------------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


 

