Xen project Mailing List

Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.

To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>

From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

Date: Wed, 15 Apr 2009 16:07:00 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 15 Apr 2009 08:07:33 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

2009/4/10 Tian, Kevin <kevin.tian@xxxxxxxxx>:
> How about VM number in total you'd like to support?

A rule-of-thumb number would be that we want to perform well at 4 VMs per core, and wouldn't mind having a performance "cliff" past 8 per core (not thread). So for a 16-core system, that would be "good" for 64 VMs and "acceptable" up to 128 VMs.

> Do you mean that same elapsed time in above two scenarios will be
> translated into different credits?

Yes. Ideally, we want to give "processing power" based on weight. But the "processing power" of a thread whose sibling is idle is significantly more than the "processing power" of a thread whose sibling is running. (Same thing possibly for cpu frequency scaling.) So we'd want to arrange the credits such that VMs with equal weight get equal "processing power", not just equal "time on a logical cpu".

> Xen3.4 now supports "sched_smt_power_savings" (both boot option
> and touchable by xenpm) to change power/performance preference.
> It's simple implementation to simply reverse the span order from
> existing package->core->thread to thread->core->package. More
> fine-grained flexibility could be given in future if hierarchical scheduling
> concept could be more clearly constructed like domain scheduler
> in Linux.

I haven't looked at this code. From your description here it sounds like a sort of a simple hack to get the effect we want (either spreading things out or pushing them together) -- is that correct?

My general feeling is that hacks are good short-term solutions, but not long-term. Things always get more complicated, and often have unexpected side-effects. I think since we're doing scheduler work, it's worth it to try to see if we can actually solve the power/performance problem.

> imo, weight is not strictly translated into the care for latency. any
> elaboration on that? I remembered that previously Nishiguchi-san
> gave idea to boost credit, and Disheng proposed static priority.
> Maybe you can make a summary to help people how latency would
> be exactly ensured in your proposal

All of this needs to be run through experiments. So far, I've had really good success with putting waking VMs in "boost" priority for 1ms if they still have credits. (And unlike the credit scheduler, I try to make sure that a VM rarely runs out of credits.)

> there should be some way to adjust or limit usage of 'reservation' when
> multiple vcpus both claim a desire which however sum up to some
> exceeding cpu's computing power or weaken your general
> 'weight-as-basic-unit' idea?

All "reservations" on the system must add up to no more than the total processing power of the system. So a system with 2 cores can't have a sum of reservations more than 200%. Xen will check this when setting the reservation and return an appropriate error message if necessary.

>>* We will also have an interface to the cpu-vs-electrical power.
>>
>>This is yet to be defined. At the hypervisor level, it will probably
>>be a number representing the "badness" of powering up extra cpus /
>>cores. At the tools level, there will probably be the option of
>>either specifying the number, or of using one of 2/3 pre-defined
>>values {power, balance, green/battery}.
>
> Not sure how that number will be defined. Maybe we can follow
> current way to just add individual name-based options matching
> its purpose (such as migration_cost and sched_smt_power_savings...)

At the scheduler level, I was thinking along the lines of "core_power_up_cost". This would be comparable to the cost of having things waiting on the runqueue. So (for example) if the cost was 0.1, then when the load on the current processors reached 1.1, it would power up another core. You could set it to 0.5 or 1.0 to save more power (at the cost of some performance). I think defining it that way is the closest to what you really want: a way to define the performance impact vs power consumption.
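
To make the two rules above concrete, here is a rough sketch of the reservation check and the "core_power_up_cost" threshold. It is not Xen code: the function names, the use of integer percentages, and the reading of "load" as average runnable load per powered-on core are all assumptions made for illustration.

```python
# Illustrative sketch only; none of these names exist in Xen.

def check_reservation(existing_pct, new_pct, num_cores):
    """Reject a new reservation if the total would exceed system capacity.

    Reservations are integer percentages of one core, so a 2-core
    system has 200% available in total.
    """
    capacity_pct = 100 * num_cores
    total_pct = sum(existing_pct) + new_pct
    if total_pct > capacity_pct:
        raise ValueError(
            f"reservations would total {total_pct}%, "
            f"but only {capacity_pct}% is available"
        )
    return total_pct


def should_power_up_core(load_per_core, core_power_up_cost):
    """Power up another core once load reaches 1.0 plus the cost.

    Assumes "load" is the average runnable load per powered-on core,
    where anything above 1.0 means work is waiting on a runqueue.
    """
    return load_per_core >= 1.0 + core_power_up_cost
```

With a cost of 0.1, a per-core load of 1.2 would trigger a power-up in this sketch, while the same load with a cost of 0.5 would not; a higher cost tolerates more queueing before spending power on another core.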

Obviously at the user interface level, we might have something more manageable: e.g., {power, balance, green} => {0, 0.2, 0.8} or something like that.

But as I said, the *goal* is to have a useful configurable interface; the implementation will depend on what actually can be made to work in practice.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
