
Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.


  • To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Wed, 15 Apr 2009 16:07:00 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 15 Apr 2009 08:07:33 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

2009/4/10 Tian, Kevin <kevin.tian@xxxxxxxxx>:
> How about VM number in total you'd like to support?

A rule-of-thumb number would be that we want to perform well at 4 VMs
per core, and wouldn't mind having a performance "cliff" past 8 per
core (not thread).  So for a 16-core system, that would be "good" for
64 VMs and "acceptable" up to 128 VMs.

> Do you mean that same elapsed time in above two scenarios will be
> translated into different credits?

Yes.  Ideally, we want to allocate "processing power" based on weight.
But the "processing power" of a thread whose sibling is idle is
significantly greater than that of a thread whose sibling is running.
(The same may apply to cpu frequency scaling.)  So we'd want to
arrange the credits such that VMs with equal weight get equal
"processing power", not just equal "time on a logical cpu".

> Xen3.4 now supports "sched_smt_power_savings" (both boot option
> and touchable by xenpm) to change power/performance preference.
> It's simple implementation to simply reverse the span order from
> existing package->core->thread to thread->core->package. More
> fine-grained flexibility could be given in future if hierarchical scheduling
> concept could be more clearly constructed like domain scheduler
> in Linux.

I haven't looked at this code.  From your description here it sounds
like a fairly simple hack to get the effect we want (either spreading
things out or pushing them together) -- is that correct?

My general feeling is that hacks are good short-term solutions, but
not long-term ones.  Things always get more complicated, and often
have unexpected side-effects.  Since we're doing scheduler work
anyway, I think it's worth trying to see if we can actually solve the
power/performance problem.

> imo, weight is not strictly translated into the care for latency. any
> elaboration on that? I remembered that previously Nishiguchi-san
> gave idea to boost credit, and Disheng proposed static priority.
> Maybe you can make a summary to help people how latency would
> be exactly ensured in your proposal

All of this needs to be run through experiments.  So far, I've had
really good success with putting waking VMs in "boost" priority for
1ms if they still have credits.  (And unlike the credit scheduler, I
try to make sure that a VM rarely runs out of credits.)
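
Roughly speaking, the wake-up path looks something like the sketch
below.  The struct, names, and constants are illustrative, not the
actual patch:

#include <stdint.h>

enum prio { PRIO_NORMAL, PRIO_BOOST };

struct sched_vcpu {
    enum prio prio;
    int64_t   credits;        /* remaining credits */
    uint64_t  boost_expiry;   /* when the boost window ends (ns) */
};

#define BOOST_NS 1000000ULL   /* 1ms boost window */

/* Called when a blocked vcpu becomes runnable again. */
static void on_vcpu_wake(struct sched_vcpu *sv, uint64_t now_ns)
{
    if ( sv->credits > 0 )
    {
        /* Woken with credits left: run ahead of normal-priority vcpus
         * for up to 1ms.  That keeps wake-up latency low for
         * interactive / I/O-bound VMs without letting a VM that has
         * exhausted its credits jump the queue. */
        sv->prio = PRIO_BOOST;
        sv->boost_expiry = now_ns + BOOST_NS;
    }
    else
        sv->prio = PRIO_NORMAL;
}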

> there should be some way to adjust or limit usage of 'reservation' when
> multiple vcpus both claim a desire which however sum up to some
> exceeding cpu's computing power or weaken your general
> 'weight-as-basic-unit' idea?

All "reservations" on the system must add up to less than the total
processing power of the system.  So a system with 2 cores can't have a
sum of reservations more than 200%.  Xen will check this when setting
the reservation and return an appropriate error message if necessary.
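
A minimal sketch of that admission check, assuming reservations are
expressed in percent-of-a-cpu (the function name and the choice of
error code are mine, not settled):

#include <errno.h>

/* Reject a new reservation if the sum over all vcpus would exceed the
 * machine's total processing power; e.g. a 2-core box caps at 200%. */
static int reservation_admit(unsigned int nr_cpus,
                             unsigned int current_sum_pct,
                             unsigned int new_res_pct)
{
    unsigned int capacity_pct = nr_cpus * 100;

    if ( current_sum_pct + new_res_pct > capacity_pct )
        return -EBUSY;   /* illustrative; the exact error is TBD */

    return 0;
}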

>>* We will also have an interface to the cpu-vs-electrical power.
>>
>>This is yet to be defined.  At the hypervisor level, it will probably
>>be a number representing the "badness" of powering up extra cpus /
>>cores.  At the tools level, there will probably be the option of
>>either specifying the number, or of using one of 2/3 pre-defined
>>values {power, balance, green/battery}.
>
> Not sure how that number will be defined. Maybe we can follow
> current way to just add individual name-based options matching
> its purpose (such as migration_cost and sched_smt_power_savings...)

At the scheduler level, I was thinking along the lines of
"core_power_up_cost".  This would be comparable to the cost of having
things waiting on the runqueue.  So (for example) if the cost were
0.1, another core would be powered up when the load on the current
processors reached 1.1.  You could set it to 0.5 or 1.0 to save more
power (at the cost of some performance).  I think defining it that way
is the closest to what you really want: a way to express the trade-off
between performance impact and power consumption.

Obviously at the user interface level, we might have something more
manageable: e.g., {power, balance, green} => {0, 0.2, 0.8} or
something like that.
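
In code, the decision might boil down to something like this
(fixed-point to avoid floating point in the hypervisor; the names and
representation are illustrative):

/* Power up another core once the measured runnable load exceeds the
 * capacity of the currently-active cores by core_power_up_cost.
 * Everything is in thousandths of a cpu, so 1.1 cpus of load == 1100.
 * The tools-level presets {power, balance, green} would then map to
 * costs of {0, 200, 800}. */
static int should_power_up_core(unsigned int load_milli,
                                unsigned int active_cores,
                                unsigned int power_up_cost_milli)
{
    /* Example: 1 active core and a cost of 0.1 (100): we bring a
     * second core online once load reaches 1.1 (1100). */
    return load_milli >= active_cores * 1000 + power_up_cost_milli;
}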

But as I said, the *goal* is to have a useful configurable interface;
the implementation will depend on what actually can be made to work in
practice.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel