
[Xen-devel] [Hackathon 2016] Scheduling session minutes

Hi everyone,

I still owe this list the minutes of the session we had at the
hackathon about scheduling.

It was a round table with status updates on ongoing activities, as
well as a few ideas for future work/improvements being tossed around,
so nothing too structured, but here we go.

I did not write down who was there, so I tried to Cc everyone that I
mentioned, or that I think would be interested. Sorry if I missed
anyone.


We have been saying for some time that we want Credit2 to no longer be
experimental and, eventually, to be the default. Now we are close, but
there are still gaps to be closed, such as:
 - soft-affinity still missing [patches on list from a student, Dario
   working on refreshing them]
 - caps feature still missing [dario working on it]

--> ACTION for Dario and George to finish the code. XenServer people
           will help with testing and benchmarking when the time comes


Virtual NUMA (vNUMA) and its scheduling implications, like enabling
in-guest NUMA scheduling optimization. Status is, most of it is there,
but for it to function on PV guests (which includes dom0) we need what
will be the followup of Andrew's work on CPUID (he had a session about
such work at the hackathon as well).

As soon as this lands, it will unblock what's called "backend
locality". It basically means, on large NUMA and IONUMA boxes, a more
intelligent placement of backend components (e.g., kernel threads and
processes in dom0), improving how efficiently we exploit the memory
and IO bandwidth.

This was discussed at a previous hackathon (in Dublin), and still
hasn't happened. George still thinks this would be a good idea. Dario
agrees. Unclear whether there will be community/companies interest in
having it (which means helping make it happen, testing it, using it if
it works well, etc).

Citrix XenServer people said they think a colleague of theirs did some
preliminary measurements, and it did not look too promising, but it is
hard to tell, since the pieces are not there yet.

SuSE people said they may be interested at least in giving it a try,
as they have hardware where this could be useful.

---> ACTION Dario to rehash the discussion when pieces are there


Some analysis has been done recently to see whether we have issues due
to Xen's and Linux's schedulers interacting badly (Dario and Oracle
people, independently, have seen something like that). For example,
what if, as far as Linux knows, moving a task from (the vcpu running
on) CPU A to (the vcpu running on) CPU B is cheap at a certain point in
time, but then, when the move actually happens, the Xen scheduler has
moved the vcpu that was running on CPU A and the one running on CPU B
to two other CPUs that are far apart?
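
To make the scenario concrete, here is a small sketch in Python. It is
not Xen or Linux code: the (node, core, thread) tuples, the cost scale
and all the names are assumptions made up for illustration only.

```python
# Hypothetical illustration, not Xen or Linux code.
# A pCPU is described as (numa_node, core, thread).

def distance(p, q):
    """Rough cost of moving work between two pCPUs (made-up scale):
    0 = same pCPU, 1 = SMT siblings, 2 = same node, 3 = cross node."""
    if p == q:
        return 0
    if p[0] != q[0]:
        return 3
    if p[1] != q[1]:
        return 2
    return 1

def move_cost(placement, src_vcpu, dst_vcpu):
    """Cost of moving a task between two vcpus, given where Xen runs them."""
    return distance(placement[src_vcpu], placement[dst_vcpu])

# Where the vcpus were running when the guest decided to move the task...
placement_then = {"vcpu_a": (0, 0, 0), "vcpu_b": (0, 0, 1)}
# ...and where Xen has put them by the time the move actually happens.
placement_now = {"vcpu_a": (0, 3, 0), "vcpu_b": (1, 5, 0)}

expected_cost = move_cost(placement_then, "vcpu_a", "vcpu_b")
actual_cost = move_cost(placement_now, "vcpu_a", "vcpu_b")
```

In this toy case the guest plans for a move between SMT siblings, but
pays for a cross-node one.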

Dario has done some benchmarks, playing with the flags that control the
load balancer in Linux. The result is that this inter-scheduler
interaction plays a role, but it is very difficult to generalize and
figure out in what direction to move (if at all), as results are very
much workload dependent.

Matt said he'd be interested in seeing the results, and maybe in
helping to figure things out.

Juergen has a patch that makes the guest topology completely flat, so
the Linux scheduler would not waste time doing its fancy
SMT/core/whatever load balancing logic... and we say it "wastes time"
because all this logic is based on both wrong and unreliable topology
information.

Dario has run some benchmarks on these patches. Results show some
improvements, but not as much as expected. One possible reason is that
the guests were too small, and they were always fully loaded. What we
want is to see the numbers in cases where the guests are not fully
loaded, so that the guests' scheduler will have to chime in and make
decisions, and we expect those decisions to be bad.

Matt again expressed his interest in this activity, and said that he
agrees with approaches that try to make the guest scheduler 'dumber'.

---> ACTION Dario to repeat the benchmark in the new configuration
     ACTION Dario to post all the numbers, so others can see and think 
     about them


There was a question (Luis, IIRC) about whether we do any power aware
scheduling in Xen.

Dario noted that, in Credit1, we do, but very lightly (there's a flag
for choosing between packing and spreading the workload).

George pointed out that Credit2 has been designed with that (among other
things) in mind, especially the load balancer. There is nothing about
it currently, but the Credit2 load balancer is all built upon a
function that computes the merit of doing a certain balancing
operation, and the logic for coming up with a result is pluggable by
design, and can hence accommodate load balancing considerations.
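
As a rough illustration of what "pluggable merit function" means, here
is a minimal sketch in Python. It is not the Credit2 code: the function
names, the way load is represented and both scoring rules are
assumptions invented for this example.

```python
# Hypothetical sketch, not the Credit2 implementation.
# A merit function scores moving `moved` units of load from a source
# runqueue to a destination runqueue; higher is better, <= 0 means "don't".

def load_merit(src_load, dst_load, moved):
    """Merit = how much the move reduces the imbalance between the two."""
    before = abs(src_load - dst_load)
    after = abs((src_load - moved) - (dst_load + moved))
    return before - after

def packing_merit(src_load, dst_load, moved):
    """Made-up power-oriented rule: only reward moves that fully empty
    the source runqueue, so that its CPUs may go idle."""
    return moved if src_load - moved <= 0 else 0.0

def best_move(loads, candidates, merit):
    """Pick the candidate (src, dst, moved) with the highest positive merit."""
    best, best_score = None, 0.0
    for src, dst, moved in candidates:
        score = merit(loads[src], loads[dst], moved)
        if score > best_score:
            best, best_score = (src, dst, moved), score
    return best

loads = {"rq0": 8.0, "rq1": 2.0, "rq2": 1.0}
candidates = [("rq0", "rq1", 3.0), ("rq2", "rq1", 1.0)]

balanced = best_move(loads, candidates, load_merit)
packed = best_move(loads, candidates, packing_merit)
```

The balancing loop stays the same in both cases; only the function that
computes the merit changes.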

ARM people expressed interest in having big.LITTLE support in the Xen
scheduler at some point, and asked about a sensible way of using
cpupools for that.

Dario said this should be doable with some low-to-moderate amount of
work, and that it makes sense.

Juergen and George noted that it may be better (and easier) to exploit
vcpu affinity to do that.

---> ACTION xxx


At the previous XenProject developer meeting, co-scheduling (aka gang
scheduling) was mentioned and deemed interesting to investigate.
Dario reported having done some searching and thinking, and it indeed
looks like a nice feature to offer, but it's hard to figure out whether
it will be really useful and adopted, as it also introduces limitations.

Juergen noted that it would be interesting to try a very special form
of gang scheduling: when a vcpu of VM A is scheduled on core x, and
core y is an SMT sibling of x, prefer scheduling another vcpu of VM A
on y, rather than any vcpu of any other VM.

This would improve the precision of accounting, and reduce the scope
for side-channel attacks exploiting the siblings' shared caches.
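
A minimal sketch of such a "prefer the sibling's VM" rule could look
like the following. This is Python for illustration, not scheduler
code: the runqueue representation and the function name are
assumptions, and the rule here always lets a same-VM vcpu jump the
queue, which a real scheduler would most likely need to bound for
fairness.

```python
# Hypothetical sketch, not Xen scheduler code.

def pick_next(runqueue, sibling_vm):
    """Choose the vcpu to run next on one hardware thread.

    runqueue:   list of (vm, vcpu_id), already in normal priority order.
    sibling_vm: VM currently running on the SMT sibling, or None if idle.
    """
    if sibling_vm is not None:
        for vcpu in runqueue:
            if vcpu[0] == sibling_vm:
                return vcpu  # same VM as the sibling: prefer it
    # No same-VM candidate: fall back to the normal order.
    return runqueue[0] if runqueue else None

rq = [("B", 0), ("A", 1), ("C", 0)]
same_vm_choice = pick_next(rq, "A")
idle_sibling_choice = pick_next(rq, None)
```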

Dario found this idea really, really interesting.

---> ACTION Dario to look into this, but not immediately


George talked about how difficult it is to benchmark a scheduler. In
fact, you need a high degree of CPU competition, and a combination of
workloads to be run inside the VMs with load that varies over time but,
at the same time, is reproducible (because if you find a 'bug' you want
to reproduce it!)
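
One common way to get load that varies over time and is still
reproducible is to drive it from a seeded pseudo-random generator. The
sketch below (Python; the function name, parameters and load range are
assumptions for illustration, not an existing benchmark) generates such
a load profile.

```python
import random

def load_profile(seed, steps, low=0.1, high=1.0):
    """Return `steps` CPU load levels in [low, high].

    The same seed always gives the same sequence, so a run that shows
    a problem can be repeated exactly.
    """
    rng = random.Random(seed)  # private generator, no global state
    return [round(rng.uniform(low, high), 3) for _ in range(steps)]

# One profile per VM, each with its own seed.
profiles = {vm: load_profile(seed, 10) for vm, seed in [("vm1", 1), ("vm2", 2)]}
```

Recording the seeds together with the results is then enough to rerun
the same mix of workloads.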

Dario mentioned the difficulty of coming up with workloads and
benchmarks that are representative of real users' and customers' use
cases and scenarios. We would need feedback from both users and
companies' customers for that.

Everyone agreed that it is very valuable but also very, very hard to
obtain this kind of feedback. Citrix has a huge test farm for
XenServer, and they will see about running more scheduling related

Dario has a plan to add at least some "basic" form of performance
regression testing to OSSTest.

---> ACTION Dario to add performance regression testing to OSSTest

Thanks to everyone that attended.

If I got something wrong or inaccurate, or if I missed anything, feel
free to point it out :-)

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
