
RE: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.



>From: George Dunlap
>Sent: 9 April 2009 23:59
>
>For servers, our target "sweet spot" for which we will optimize is a
>system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).
>Ideal performance is expected to be reached at about 80% total system
>cpu utilization; but the system should function reasonably well up to
>a utilization of 800% (i.e., a load of 8).

How were the 80% and 800% figures chosen here?

>
>For virtual desktop systems, we will have a large number of
>interactive VMs with a lot of shared memory.  Most of these will be
>single-vcpu, or at most 2 vcpus.

What total number of VMs would you like to support?
>
>* HT-aware.
>
>Running on a logical processor with an idle peer thread is not the
>same as running on a logical processor with a busy peer thread.  The
>scheduler needs to take this into account when deciding "fairness".

Do you mean that the same elapsed time in the above two scenarios
would be translated into different credits?
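
If so, one (purely hypothetical) accounting rule could scale the
charged credits by whether the sibling thread was busy, e.g.:

  /* Sketch only: charge fewer credits for time run on a logical cpu
   * whose SMT sibling was busy, since less of the core's throughput
   * was actually delivered.  The name and the 70% factor are
   * invented for illustration, not part of the proposal. */
  #define SIBLING_BUSY_CHARGE_PCT 70

  static unsigned long credits_to_charge(unsigned long elapsed_ns,
                                         int sibling_was_busy)
  {
      if ( sibling_was_busy )
          return elapsed_ns * SIBLING_BUSY_CHARGE_PCT / 100;
      return elapsed_ns;
  }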

>
>* Power-aware.
>
>Using as many sockets / cores as possible can increase the total cache
>size available to VMs, and thus (in the absence of inter-VM sharing)
>increase total computing power; but keeping multiple sockets and
>cores powered up also increases the electrical power used by the
>system.  We want a configurable way to balance maximizing
>processing power against minimizing electrical power.

Xen 3.4 now supports "sched_smt_power_savings" (both as a boot option
and tunable via xenpm) to change the power/performance preference.
It's a simple implementation that just reverses the span order from
the existing package->core->thread to thread->core->package.  More
fine-grained flexibility could be added in the future if a hierarchical
scheduling concept is constructed more clearly, like the scheduling
domains in Linux.

Another possible 'fairness' point affected by power management is
frequency scaling: credit is currently calculated from elapsed time
alone, but the same elapsed time at different frequencies actually
represents a different number of consumed cycles.
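
For example (sketch only; names invented), the charge could be
normalized to effective cycles at the maximum frequency:

  /* Scale the credit charge so that time run at a lower P-state is
   * charged proportionally less; cur_khz/max_khz would come from
   * the cpufreq driver. */
  static unsigned long charge_freq_scaled(unsigned long elapsed_ns,
                                          unsigned int cur_khz,
                                          unsigned int max_khz)
  {
      return (unsigned long)((unsigned long long)elapsed_ns
                             * cur_khz / max_khz);
  }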

>
>3. Target interface:
>
>The target interface will be similar to credit1:
>
>* The basic unit is the VM "weight".  When competing for cpu
>resources, VMs will get a share of the resources proportional to their
>weight.  (e.g., two cpu-hog workloads with weights of 256 and 512 will
>get 33% and 67% of the cpu, respectively).

IMO, weight does not strictly translate into care for latency; any
elaboration on that?  I remember that Nishiguchi-san previously gave
an idea to boost credit, and Disheng proposed static priorities.
Maybe you could summarize how latency would actually be ensured in
your proposal.
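
For reference, a rough sketch in the spirit of credit1's BOOST
priority and Nishiguchi-san's idea (all names and priority levels
below are illustrative, not the actual proposal):

  enum vcpu_pri { PRI_BOOST, PRI_UNDER, PRI_OVER };

  struct sketch_vcpu {
      int credit;
      enum vcpu_pri pri;
  };

  /* A vcpu waking from I/O that still has credit left gets a
   * temporary priority bump, so it can preempt cpu-bound vcpus and
   * keep wakeup latency low; the boost reverts once it has run. */
  static void vcpu_wake_boost(struct sketch_vcpu *svc)
  {
      if ( svc->credit > 0 && svc->pri == PRI_UNDER )
          svc->pri = PRI_BOOST;
  }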

>
>* Additionally, we will be introducing a "reservation" or "floor".
>  (I'm open to name changes on this one.)  This will be a minimum
>  amount of cpu time that a VM can get if it wants it.

This is a good idea.

>
>For example, one could give dom0 a "reservation" of 50%, but leave the
>weight at 256.  No matter how many other VMs run with a weight of 256,
>dom0 will be guaranteed to get 50% of one cpu if it wants it.

Shouldn't there be some way to adjust or limit use of 'reservation'
when the reservations claimed by multiple vcpus sum up to more than
the cpu's computing power, or when they would weaken your general
'weight-as-basic-unit' idea?
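
One possibility (sketch only; names invented) is simple admission
control, rejecting a new reservation once the sum would exceed the
machine's capacity:

  /* Reservations expressed in percent of one cpu: admit a new one
   * only if the total stays within nr_cpus * 100. */
  static int reservation_admissible(unsigned int reserved_sum_pct,
                                    unsigned int new_pct,
                                    unsigned int nr_cpus)
  {
      return reserved_sum_pct + new_pct <= nr_cpus * 100;
  }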


>
>* The "cap" functionality of credit1 will be retained.
>
>This is a maximum amount of cpu time that a VM can get: i.e., a VM
>with a cap of 50% will only get half of one cpu, even if the rest of
>the system is completely idle.
>
>* We will also have an interface to the cpu-vs-electrical-power tradeoff.
>
>This is yet to be defined.  At the hypervisor level, it will probably
>be a number representing the "badness" of powering up extra cpus /
>cores.  At the tools level, there will probably be the option of
>either specifying the number, or of using one of 2/3 pre-defined
>values {power, balance, green/battery}.

I'm not sure how that number would be defined.  Maybe we can follow
the current approach and just add individual name-based options that
match the purpose (such as migration_cost and
sched_smt_power_savings...), as sketched below.
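
E.g. (sketch only; the parameter name "sched_power_bias" is invented
for illustration):

  #include <xen/init.h>

  /* 0 favours raw performance; larger values make the scheduler
   * more reluctant to wake up an idle core/socket. */
  static unsigned int sched_power_bias;
  integer_param("sched_power_bias", sched_power_bias);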

Thanks,
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

