
Re: [Xen-devel] [Design RFC] Towards work-conserving RTDS scheduler



On Mon, Aug 8, 2016 at 5:38 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Thu, 2016-08-04 at 01:15 -0400, Meng Xu wrote:
>> Hi Dario,
>>
> Hi,
>
>> I'm thinking about changing the current RTDS scheduler to
>> work-conserving version as we briefly discussed before.
>> Below is a design of the work-conserving RTDS.
>> I'm hoping to get your feedback about the design ideas first before I
>> start writing it in code.
>>
> Here I am, sorry for the delay.
>
>> I think the code change should not be a lot as long as we don't
>> provide the functionality of switching between work-conserving and
>> non-work-conserving. Because the following design will keep the
>> real-time property of the current RTDS scheduler, I don't see the
>> reason why we should let users switch to the non-work-conserving version.
>> :-)
>>
> Oh, but there's a big one: _money_! :-O
>
> If you're a service/cloud provider you may or may not want a
> customer that pays for a 40% utilization VM to be able to use more
> than that. In particular, you may want to ask them for more money in
> order to enable that possibility! :-P

Good point.

>
> Anyway, I don't think --with this design of yours-- that it is such a
> big deal to make it possible to switch work-conserving*ness on and off
> (see below). Actually, I think it's even possible to to that on a per-
> vcpu basis, which I think would be quite cool!
>
>> --- Below is the design ---
>>
>> [...]
>>
>> *** Requirement of the work-conserving RTDS scheduler ***
>> 1) The new RTDS scheduler should be work-conserving, of course.
>> 2) The new RTDS scheduler should not break any real-time guarantee
>> provided by the current RTDS scheduler.
>>
>> *** Design of Work-Conserving RTDS Scheduler ***
>> VCPU model
>> 1) (Period, Budget): Guaranteed <Budget> time for each <Period>
>> 2) Priority index: It indicates the current priority level of the
>> VCPU. When a VCPU’s budget is depleted in the current period, its
>> priority index will increase by 1 and its budget will be replenished.
>> 3) A VCPU’s budget and priority index will be reset at the beginning
>> of each period
>>
> Ok, I think I see what you mean, and it makes sense to me.
>
> Just one question/observation. As you know, I come from a CBS mindset.
> CBS postpones a task/vcpu's deadline when it runs out of budget, and it
> can, natively, work in work conserving or non-work conserving mode
> (just by either continuing to consider the vcpu runnable, with the
> later deadline meaning demoted priority, or blocking it until the
> next period, respectively).
>
> The nice thing about this is that the scheduling analysis that has been
> developed works for both modes. Of course, what it says is that you can
> only guarantee to each vcpu the reserved utilization, and you should
> not rely on the additional capacity that you may be getting because
> you're in work conserving mode and the system happened to be idle
> for some time in this or that period (so, very similar to what you're
> proposing). _HOWEVER_, there are evolutions of CBS (called GRUB and
> SHRUB, I'm sure you'll be able to find the papers), where the 'unused
> bandwidth' (i.e., the otherwise idle time that you're making use of iff
> you're in work conserving mode) is distributed in a precise way
> (according to some weights, IIRC) to the various vcpus, hence making
> scheduling analysis both possible and useful again.

I agree that if we can have a schedulability analysis for the
work-conserving version, that will be great!
Actually, I have thought about it, but not hard enough.
I will think more about the schedulability analysis and get back to
this thread later.

Yes, I'm aware of the GRUB and SHRUB papers.
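
If I remember the GRUB paper correctly (please double-check against
the original), the reclaiming there is done in the budget accounting:
while a server executes, its budget is decreased as dq = -U_act * dt,
where U_act is the total bandwidth of the currently active servers,
instead of dq = -dt as in plain CBS. Since U_act <= 1, the budget
drains more slowly when part of the system is idle, which is exactly
how the otherwise-idle bandwidth gets redistributed.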

>
> Now, I'm not at all saying that we (you! :-D) should turn RTDS into
> using CBS(ish) or anything like that.

Thanks! :-)

> I'm just thinking out loud and
> wondering:
>  - could it be useful to have a scheduling analysis in place for the
>    scheduler in work conserving mode (one, of course, that takes into
>    account, and gives guarantees on, the otherwise idle bandwidth... I
>    know that the existing one holds! :-P) ?
>  - if yes, do you already have one --or do you think it will be
>    possible to develop one-- for your priority-index based model?

I think I could potentially develop one such analysis.

>
> Note that I'm not saying you should, and I'd be perfectly fine with a
> "no analysis, but let's keep things simple for now"... This just came
> to my mind, and I'm just pointing it out, to make sure we consider and
> think about it, and make a conscious decision.

Sure! Totally agree!

>
>> Scheduling policy: modified gEDF
>> 1) Priority comparison:
>>    a) VCPUs with a lower priority index have higher priority than
>> VCPUs with a higher priority index
>>    b) VCPUs with the same priority index use the gEDF policy to
>> decide the priority order
>> 2) Scheduling point
>>    a) VCPU’s budget is depleted for the current priority index
>>    b) VCPU starts a new period
>>    c) VCPU is blocked or woken up
>> 3) Scheduling decision is made when the scheduler is invoked
>>     a) Always pick the current M highest-priority VCPUs to run on the
>> M cores.
>>
> So, still about the analysis point above, and just off the top of
> my head (and without being used to doing these things any longer!!),
> it looks like it's possible to think of some analysis for this.
>
> In fact, since:
>  - vcpus with different priority indexes are totally disjoint sets,
>  - there's a strict ordering between priority indexes,
>  - vcpus sort of use their scheduling parameters at each priority index
>
> This looks to me like vcpus are subject to a "hierarchy" of RTDS
> schedulers, the one at level x+1 running in the idle time of the one at
> level x... And I think there's scope for writing down some maths
> formulas that model this situation. :-)
>
> Actually, it's quite likely that you either have already noticed this
> and done the analysis, or that someone else in the literature has done
> something similar --maybe with other schedulers-- before.

Yes, I noticed this, but I don't have an analysis yet. ;-) I will work
out some math formulas to model this situation.
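
Just to fix notation for that "hierarchy" view (a rough sketch on my
part, nothing proven yet): let L_k be the set of VCPUs currently at
priority index k, and U_k = sum of B_i/P_i over the VCPUs in L_k the
bandwidth they demand at that level. Level 0 is just the ordinary gEDF
setting on M cores; a level k > 0 only executes in the supply left
over by levels 0 .. k-1, so the natural direction is to derive a
supply-bound function sbf_k(t) from the lower levels and check each
level's demand against it, in the style of hierarchical/compositional
scheduling analysis.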
I'm thinking the desired design will be:
1) a work-conserving scheduler;
2) a *tight* schedulability analysis. If we cannot get a tight
analysis, we should at least reduce the abstraction overhead, i.e.,
num_cores minus the total utilization of all VCPUs. (To achieve a
better analysis, we may need to change the scheduling policy a bit.
I'm not yet clear about how to do that, but I will think about it.)
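
To make the priority comparison (1a/1b above) concrete, here is a
minimal sketch of what I have in mind; the names are illustrative and
do not match the actual sched_rt.c code:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch only: these names are made up for this email. */
struct rt_vcpu_sketch {
    unsigned int pri_index;  /* 0 = guaranteed level; grows on depletion */
    int64_t cur_deadline;    /* absolute deadline of the current period */
};

/*
 * Modified gEDF comparison:
 *  1a) a lower priority index always wins;
 *  1b) within the same index, plain EDF (earlier deadline first).
 */
static bool has_higher_prio(const struct rt_vcpu_sketch *a,
                            const struct rt_vcpu_sketch *b)
{
    if ( a->pri_index != b->pri_index )
        return a->pri_index < b->pri_index;
    return a->cur_deadline < b->cur_deadline;
}

The scheduler would then just keep the M highest VCPUs under this
ordering running on the M cores.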

>
> Anyway, the idea itself looks fair enough to me. I'd like to hear, if
> that's fine with you, how you plan to actually implement it, as there
> of course are multiple different ways to do it, and there are, IMO, a
> couple of things that should be kept in mind.

How about letting me think about the analysis first? If we can have
both the work-conserving algorithm and the analysis, so much the
better. If we finally decide not to have the analysis, we can fall
back to discussing the current design.

>
> Finally, about the work-conserving*ness on-off switch, what added
> difficulty or increase in code complexity prevents us from, instead
> of this:
>
> "2) Priority index: It indicates the current  priority level of the
>     VCPU. When a VCPU’s budget is depleted in the current period, its
>     priority index will increase by 1 and its budget will be
>     replenished."
>
> doing something like this:
>
> "2) Priority index: It indicates the current  priority level of the
>     VCPU. When a VCPU's budget is depleted in the current period:
>      2a) if the VCPU has the work conserving flag set, its priority
>          index will be increased by 1, and its budget replenished;
>      2b) if the VCPU has the work conserving flag cleat, it's blocked
>          until next period."
>
> ?

Agree. We can have the per-VCPU work-conserving flag.
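
Something along these lines for the depletion path (again only a
sketch with made-up names, not the actual sched_rt.c code):

#include <stdint.h>

#define RTDS_WORK_CONSERVING (1u << 0)  /* hypothetical per-VCPU flag */

struct rt_vcpu_flag_sketch {
    int64_t budget;          /* per-period budget from (Period, Budget) */
    int64_t cur_budget;      /* what is left of it in the current period */
    unsigned int pri_index;  /* reset to 0 at each period boundary */
    unsigned int flags;
};

/* Called when cur_budget hits zero before the period ends. */
static void budget_depleted(struct rt_vcpu_flag_sketch *svc)
{
    if ( svc->flags & RTDS_WORK_CONSERVING )
    {
        /* 2a) demote one level and replenish: the VCPU stays runnable. */
        svc->pri_index++;
        svc->cur_budget = svc->budget;
    }
    /*
     * 2b) otherwise do nothing here: the VCPU stays depleted and off
     * the runqueue until the replenishment at its next period, which
     * also resets pri_index back to 0 (point 3 of the VCPU model).
     */
}

The flag could then be exposed per VCPU via the toolstack, which would
also give the "charge extra for work-conserving" knob you mentioned.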


>
> Thanks and Regards,

Thank you very much for your valuable comments and suggestions! :-)

Best Regards,

Meng

-- 
------------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


 

