
Re: [Xen-devel] Should we mark RTDS as supported feature from experimental feature?

On Tue, 2016-04-26 at 16:35 +0100, George Dunlap wrote:
> On 26/04/16 08:56, Dario Faggioli wrote:
> > 
> > On Mon, 2016-04-25 at 21:44 -0400, Meng Xu wrote:
> > > 
> > Actually, writing one for RTDS would be a rather interesting and
> > useful thing to do, IMO! :-)
> I think it would be helpful to try to spell out what we think are the
> criteria for marking RTDS non-experimental.

> Reading your e-mail, Dario, I might infer the following criteria:
Thanks for this :-)

> 1. New event-driven code spends most of a full release cycle in the
>    tree being tested
> 2. Better tests in osstest (which ones?)
> 3. A feature doc
> 4. A work-conserving mode
> Is that about right?
I think it is.

> #3 definitely sounds like a good idea.  #1 is probably reasonable.
Good that we agree on this.

> I don't think #4 should be a blocker; we have plenty of work-conserving
> schedulers. :-)
I am not entirely sure about this one.

We do have work-conserving schedulers, and one can partition the system
into cpupools and assign each VM to the pool that best suits its needs.

Yet, think of someone wanting to boot Xen with "sched=rtds". This may
be someone wanting to play with/try the RTDS scheduler, it could be our
OSSTest jobs (the ones that want to test RTDS), or it could be someone
with a small enough system that partitioning it with cpupools is not
really an option.
What this 'someone' would get is a dom0 that only has
(4*nr_dom0_vcpus)% CPU capacity available. If dom0 has 2 vcpus (let's
assume we are in the small system case, which as a matter of fact is
also the case of some of our test jobs), this means 8% total CPU
capacity. The remaining 92% of the time, the CPUs will just stay idle.
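To make the arithmetic explicit: an RTDS vcpu with budget B and period P is guaranteed B/P of a pcpu and, without a work-conserving mode, it cannot get more than that. A minimal Python sketch, assuming the 4%-per-vcpu figure used above; the 400us/10000us budget/period pair is hypothetical, picked only because it yields 4%, and is not meant as Xen's default:

```python
from fractions import Fraction


def vcpu_share(budget_us: int, period_us: int) -> Fraction:
    """Fraction of one pcpu an RTDS vcpu is guaranteed: budget / period.

    Without a work-conserving mode, this is also the most it can get.
    """
    if not 0 < budget_us <= period_us:
        raise ValueError("need 0 < budget <= period")
    return Fraction(budget_us, period_us)


def domain_share_percent(nr_vcpus: int, budget_us: int, period_us: int) -> Fraction:
    """Total share of a domain whose vcpus all have the same parameters,
    in percent of one pcpu."""
    return nr_vcpus * vcpu_share(budget_us, period_us) * 100


# Hypothetical budget/period pair, chosen only because it gives the
# 4% per vcpu used in the example above.
BUDGET_US, PERIOD_US = 400, 10_000

dom0 = domain_share_percent(2, BUDGET_US, PERIOD_US)
print(f"dom0 capacity: {dom0}%, idle the rest of the time: {100 - dom0}%")
```

With two dom0 vcpus this gives 8% reserved and 92% left idle, matching the numbers above.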

Let's assume that our 'someone' tries a local migration (OSSTest
does that), or connects via SSH and/or copies some medium to large
files with rsync. What would happen (and in fact, this is what happens
to OSSTest, as far as I can tell) is that things will time out all the
time, and migrations, sessions and file transfers will be incredibly slow.

And therefore, our dear 'someone' would, IMO, just turn away and look
at something else. Or they would email xen-devel, reporting a bug about
migration being slow, or connections timing out, on Xen.

Increasing the reservation for dom0 (maybe even by default) would
certainly mitigate this, but at the cost of having less bandwidth
available to be guaranteed to actual guests, which is certainly not
desirable.
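The trade-off can be quantified with a simple bound: whatever dom0 reserves is subtracted from what can still be guaranteed to guests, since the sum of all budget/period ratios cannot exceed the number of pcpus. A minimal sketch under that assumption; the host size and parameter values are hypothetical, and under global EDF the condition is necessary but not sufficient, so the result is only an upper bound:

```python
from fractions import Fraction


def guest_bandwidth_left(nr_pcpus: int, dom0_vcpus: int,
                         dom0_budget_us: int, dom0_period_us: int) -> Fraction:
    """Pcpus' worth of bandwidth still reservable for guests once dom0's
    RTDS reservation is accounted for.

    Based on the necessary condition sum(budget / period) <= nr_pcpus.
    Under global EDF that is not sufficient on its own, so this is an
    upper bound on what guests can really be guaranteed.
    """
    if not 0 < dom0_budget_us <= dom0_period_us:
        raise ValueError("need 0 < budget <= period")
    dom0 = dom0_vcpus * Fraction(dom0_budget_us, dom0_period_us)
    if dom0 > nr_pcpus:
        raise ValueError("dom0 alone over-commits the host")
    return nr_pcpus - dom0


# Hypothetical small host: 2 pcpus, dom0 with 2 vcpus, 10 ms period.
for budget_us in (400, 2_000, 4_000):
    left = guest_bandwidth_left(2, 2, budget_us, 10_000)
    print(f"dom0 budget {budget_us} us -> {float(left):.2f} pcpus left for guests")
```

Every microsecond of budget added to dom0 to keep it responsive is, on this bound, a microsecond that can no longer be promised to a real-time guest.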

Of course, the exact same scenario just described applies even if the
system is fully booked by guest domains, but all or most of them are
idle. There will again be a lot of idle time, while a couple of domains
(in the example at hand, just dom0) struggle to get done all they're
asked to do in their 8%! :-O

A work-conserving mode, selectable together with the other scheduling
parameters (and maybe enabled by default for dom0, with a dedicated
boot parameter to change/affect that) would, in my opinion, provide a
more than decent solution, in a very simple way. It's not the perfect
solution. Probably not even the best one. There are more advanced
techniques (like adaptive reservations, and others) but they all come
at a high price in terms of development and maintainability effort.
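The difference a work-conserving mode makes can be shown with a toy model: one vcpu alone on one pcpu, with some work pending and its budget replenished at every period boundary. Without work conservation, the pcpu idles once the budget is exhausted; with it, that otherwise-idle time is handed to the vcpu. This is a hypothetical sketch, not how RTDS is or would be implemented; with several domains, a real mode would also need a policy for sharing idle time among budget-exhausted vcpus.

```python
def ticks_to_finish(work: int, budget: int, period: int,
                    work_conserving: bool) -> int:
    """Toy model: one vcpu alone on one pcpu, with `work` ticks of work
    pending at time 0. Its budget is reset to `budget` at the start of
    every `period`. Returns the tick at which the work completes."""
    if not 0 < budget <= period or work < 0:
        raise ValueError("need 0 < budget <= period and work >= 0")
    remaining_budget = 0
    t = 0
    while work > 0:
        if t % period == 0:
            remaining_budget = budget  # replenishment at period boundary
        if remaining_budget > 0:
            remaining_budget -= 1      # run on reserved bandwidth
            work -= 1
        elif work_conserving:
            work -= 1                  # pcpu would otherwise idle: use it
        # else: budget exhausted, pcpu idles until the next period
        t += 1
    return t


# Hypothetical numbers: a 4% reservation (4 ticks every 100) and 20
# ticks of pending work.
for wc in (False, True):
    print(f"work_conserving={wc}: done at tick {ticks_to_finish(20, 4, 100, wc)}")
```

In the non-work-conserving case the 20 ticks of work are spread over five periods and finish at tick 404; in the work-conserving case they finish at tick 20, with no reservation taken away from anyone.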

So, yeah, I'm not entirely sure yet, but I think a work-conserving mode
could be very useful and should be regarded as important...

> Regarding #2, did you have specific tests in mind?
Nothing too advanced. This is a special-purpose scheduler and needs to
be tested by people who actually need it, on their workloads.

Still, there's a test case stressing the cpupools code, which also
involves this scheduler, that I've been working on, on and off, for a
while now. I think it would be really useful (and in fact, I want to
finish it).

I also want to add a test (not sure how yet: a new job, a phase of a
job, etc.) that plays with the scheduling parameters of all schedulers
(weights for Credit 1 and 2, budget and period for RTDS).
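A parameter-sweeping test like that could start by generating the `xl` invocations to run. A minimal sketch, assuming the `xl sched-rtds -d/-p/-b` (period and budget in microseconds) and `xl sched-credit`/`xl sched-credit2 -d/-w` forms; the parameter values and domain names are hypothetical, and this is not OSSTest code. Actually running each command (e.g. with `subprocess.run(cmd, check=True)`) and reading the parameters back would be up to the hypothetical test harness.

```python
import itertools
import shlex


def rtds_commands(domain, periods_us, budgets_us):
    """Yield `xl sched-rtds` invocations for every (period, budget) pair.

    Pairs with budget > period are skipped: a vcpu cannot be guaranteed
    more than 100% of a pcpu.
    """
    for period, budget in itertools.product(periods_us, budgets_us):
        if 0 < budget <= period:
            yield ["xl", "sched-rtds", "-d", domain,
                   "-p", str(period), "-b", str(budget)]


def credit_commands(scheduler, domain, weights):
    """Yield `xl sched-credit` / `xl sched-credit2` weight changes."""
    if scheduler not in ("credit", "credit2"):
        raise ValueError(f"unsupported scheduler: {scheduler}")
    for weight in weights:
        yield ["xl", f"sched-{scheduler}", "-d", domain, "-w", str(weight)]


# Hypothetical sweep values, for illustration only.
for cmd in rtds_commands("Domain-0", [10_000, 100_000], [1_000, 4_000, 50_000]):
    print(shlex.join(cmd))
for cmd in credit_commands("credit2", "Domain-0", [128, 256, 512]):
    print(shlex.join(cmd))
```

Generating the command lists separately from executing them keeps the sweep easy to inspect and to reuse from whichever job or phase ends up hosting the test.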

There's not much else that I can think of, but this would be already
quite a bit better than now.

Thanks and Regards,
<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Xen-devel mailing list


