
RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.



>From: George Dunlap
>Sent: 9 April 2009 23:59
>
>For servers, our target "sweet spot" for which we will optimize is a
>system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).
>Ideal performance is expected to be reached at about 80% total system
>cpu utilization; but the system should function reasonably well up to
>a utilization of 800% (i.e., a load of 8).

How were the 80% and 800% figures chosen here?

>
>For virtual desktop systems, we will have a large number of
>interactive VMs with a lot of shared memory.  Most of these will be
>single-vcpu, or at most 2 vcpus.

What total number of VMs would you like to support?
>
>* HT-aware.
>
>Running on a logical processor with an idle peer thread is not the
>same as running on a logical processor with a busy peer thread.  The
>scheduler needs to take this into account when deciding "fairness".

Do you mean that the same elapsed time in the above two scenarios
would be translated into different credits?
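
If so, one (purely hypothetical) accounting rule could scale the
charged credits by whether the sibling thread was busy, e.g.:

  /* Sketch only: charge fewer credits for time run on a logical cpu
   * whose SMT sibling was busy, since less of the core's throughput
   * was actually delivered.  The name and the 70% factor are
   * invented for illustration, not part of the proposal. */
  #define SIBLING_BUSY_CHARGE_PCT 70

  static unsigned long credits_to_charge(unsigned long elapsed_ns,
                                         int sibling_was_busy)
  {
      if ( sibling_was_busy )
          return elapsed_ns * SIBLING_BUSY_CHARGE_PCT / 100;
      return elapsed_ns;
  }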

>
>* Power-aware.
>
>Using as many sockets / cores as possible can increase the total cache
>size available to VMs, and thus (in the absence of inter-VM sharing)
>increase total computing power; but keeping multiple sockets and
>cores powered up also increases the electrical power used by the
>system.  We want a configurable way to balance maximizing
>processing power against minimizing electrical power.

Xen 3.4 now supports "sched_smt_power_savings" (both as a boot option
and tunable via xenpm) to change the power/performance preference.
It's a simple implementation that just reverses the span order from
the existing package->core->thread to thread->core->package.  More
fine-grained flexibility could be added in the future if a hierarchical
scheduling concept is constructed more clearly, like the scheduling
domains in Linux.

Another possible 'fairness' point affected by power management is
frequency scaling: credit is currently calculated from elapsed time
alone, but the same elapsed time at different frequencies actually
represents a different number of consumed cycles.
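
For example (sketch only; names invented), the charge could be
normalized to effective cycles at the maximum frequency:

  /* Scale the credit charge so that time run at a lower P-state is
   * charged proportionally less; cur_khz/max_khz would come from
   * the cpufreq driver. */
  static unsigned long charge_freq_scaled(unsigned long elapsed_ns,
                                          unsigned int cur_khz,
                                          unsigned int max_khz)
  {
      return (unsigned long)((unsigned long long)elapsed_ns
                             * cur_khz / max_khz);
  }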

>
>3. Target interface:
>
>The target interface will be similar to credit1:
>
>* The basic unit is the VM "weight".  When competing for cpu
>resources, VMs will get a share of the resources proportional to their
>weight.  (e.g., two cpu-hog workloads with weights of 256 and 512 will
>get 33% and 67% of the cpu, respectively).

IMO, weight does not strictly translate into care for latency; any
elaboration on that?  I remember that Nishiguchi-san previously gave
an idea to boost credit, and Disheng proposed static priorities.
Maybe you could summarize how latency would actually be ensured in
your proposal.
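
For reference, a rough sketch in the spirit of credit1's BOOST
priority and Nishiguchi-san's idea (all names and priority levels
below are illustrative, not the actual proposal):

  enum vcpu_pri { PRI_BOOST, PRI_UNDER, PRI_OVER };

  struct sketch_vcpu {
      int credit;
      enum vcpu_pri pri;
  };

  /* A vcpu waking from I/O that still has credit left gets a
   * temporary priority bump, so it can preempt cpu-bound vcpus and
   * keep wakeup latency low; the boost reverts once it has run. */
  static void vcpu_wake_boost(struct sketch_vcpu *svc)
  {
      if ( svc->credit > 0 && svc->pri == PRI_UNDER )
          svc->pri = PRI_BOOST;
  }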

>
>* Additionally, we will be introducing a "reservation" or "floor".
>  (I'm open to name changes on this one.)  This will be a minimum
>  amount of cpu time that a VM can get if it wants it.

This is a good idea.

>
>For example, one could give dom0 a "reservation" of 50%, but leave the
>weight at 256.  No matter how many other VMs run with a weight of 256,
>dom0 will be guaranteed to get 50% of one cpu if it wants it.

Shouldn't there be some way to adjust or limit use of 'reservation'
when the reservations claimed by multiple vcpus sum up to more than
the cpu's computing power, or when they would weaken your general
'weight-as-basic-unit' idea?
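
One possibility (sketch only; names invented) is simple admission
control, rejecting a new reservation once the sum would exceed the
machine's capacity:

  /* Reservations expressed in percent of one cpu: admit a new one
   * only if the total stays within nr_cpus * 100. */
  static int reservation_admissible(unsigned int reserved_sum_pct,
                                    unsigned int new_pct,
                                    unsigned int nr_cpus)
  {
      return reserved_sum_pct + new_pct <= nr_cpus * 100;
  }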


>
>* The "cap" functionality of credit1 will be retained.
>
>This is a maximum amount of cpu time that a VM can get: i.e., a VM
>with a cap of 50% will only get half of one cpu, even if the rest of
>the system is completely idle.
>
>* We will also have an interface to the cpu-vs-electrical-power tradeoff.
>
>This is yet to be defined.  At the hypervisor level, it will probably
>be a number representing the "badness" of powering up extra cpus /
>cores.  At the tools level, there will probably be the option of
>either specifying the number, or of using one of 2/3 pre-defined
>values {power, balance, green/battery}.

I'm not sure how that number would be defined.  Maybe we can follow
the current approach and just add individual name-based options that
match the purpose (such as migration_cost and
sched_smt_power_savings...), as sketched below.
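
E.g. (sketch only; the parameter name "sched_power_bias" is invented
for illustration):

  #include <xen/init.h>

  /* 0 favours raw performance; larger values make the scheduler
   * more reluctant to wake up an idle core/socket. */
  static unsigned int sched_power_bias;
  integer_param("sched_power_bias", sched_power_bias);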

Thanks,
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

