
Re: [Xen-devel] Cpu pools discussion


  • To: George Dunlap <dunlapg@xxxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Tue, 28 Jul 2009 07:40:54 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Delivery-date: Mon, 27 Jul 2009 22:41:23 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

George Dunlap wrote:
> Keir (and community),
> 
> Any thoughts on Juergen Gross' patch on cpu pools?
> 
> As a reminder, the idea is to allow "pools" of cpus that would have
> separate schedulers.  Physical cpus and domains can be moved from one
> pool to another only by an explicit command.  The main purpose Fujitsu
> seems to have is to allow a simple machine "partitioning" that is more
> robust than using simple affinity masks.  Another potential advantage
> would be the ability to use different schedulers for different
> purposes.
> 
> For my part, it seems like they should be OK.  The main thing I don't
> like is the ugliness related to continue_hypercall_on_cpu(), described
> below.
> 
> Juergen, could you remind us what the advantages of pools in the
> hypervisor were, versus just having
> affinity masks (with maybe sugar in the toolstack)?

Sure.

Our main reason for introducing pools was the inability of the current
scheduler(s) to honour domain weights once the domains are restricted to a
subset of the physical processors via pinning (e.g. a domain whose weight
would entitle it to two cpus' worth of time, but which is pinned to a single
cpu of a four-cpu box, can never receive its share).
I think it is virtually impossible to find a general solution to this
problem without some sort of pooling (if somebody proves me wrong here, I'm
perfectly happy to take that "perfect" scheduler instead of pools :-) ).

So while the original reason for the pools was this missing functionality,
there are some more benefits (a rough sketch of the idea follows the list):
+ the possibility to use different schedulers for different domains on the
  same machine (do you remember the discussion about bcredit?). Zhigang has
  already posted a request for this feature.
+ fewer lock conflicts on huge machines with many processors
+ pools could be a good basis for NUMA-aware scheduling policies
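
Just to illustrate the idea, here is a minimal C sketch of what a pool could
look like; the names and fields are made up for illustration and are not the
structures of the actual patch:

/* Illustrative sketch only -- not the interfaces of the actual patch.
 * Assumes Xen's cpumask_t, spinlock_t and struct scheduler types. */
struct cpupool {
    int               pool_id;    /* id the tools use to address the pool   */
    cpumask_t         cpu_valid;  /* physical cpus currently in this pool   */
    struct scheduler *sched;      /* scheduler instance private to the pool */
    spinlock_t        lock;       /* per-pool lock; cpus of different pools
                                   * never contend on it                    */
    struct cpupool   *next;       /* all pools kept in a simple list        */
};

/* struct domain would additionally carry a "struct cpupool *pool" member;
 * a domain's vcpus are scheduled only on cpus in cpu_valid of its own pool,
 * and changing that member is an explicit move operation, never something
 * the scheduler does on its own. */

Because each pool has its own scheduler instance and its own lock, cpus of
different pools never touch each other's run queues, which is where the
reduced lock contention and the per-pool scheduler choice come from.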

> 
> Re the ugly part of the patch, relating to continue_hypercall_on_cpu():
> 
> Domains are assigned to a pool, so
> if continue_hypercall_on_cpu() is called for a cpu not in the domain's
> pool, you can't just run it normally.  Juergen's solution (IIRC) was to
> pause all domains in the other pool, temporarily move the cpu in
> question to the calling domain's pool, finish the hypercall, then move
> the cpu in question back to the other pool.
> 
> Since there are a lot of antecedents in that, let's take an example:
> 
> Two pools; Pool A has cpus 0 and 1, pool B has cpus 2 and 3.
> 
> Domain 0 is running in pool A, domain 1 is running in pool B.
> 
> Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.
> 
> Cpu 2 is in pool B, so Juergen's patch:
>  * Pauses domain 1
>  * Moves cpu 2 to pool A
>  * Finishes the hypercall
>  * Moves cpu 2 back to pool B
>  * Unpauses domain 1
> 
> That seemed a bit ugly to me, but I'm not familiar enough with the use
> cases or the code to know if there's a cleaner solution.
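
To make the quoted sequence concrete, here is a rough C sketch; all helper
names (pool_pause_all_domains(), pool_move_cpu(), ...) are invented for
illustration and are not the interfaces of the actual patch:

/* Illustration only: helper names are made up, error handling is omitted,
 * and the mechanics of actually getting the calling vcpu to run on 'cpu'
 * (pinning plus hypercall continuation) are elided. */
static long continue_on_foreign_cpu(struct domain *caller, unsigned int cpu,
                                    long (*func)(void *data), void *data)
{
    struct cpupool *home    = caller->pool;       /* pool A in the example */
    struct cpupool *foreign = pool_of_cpu(cpu);   /* pool B                */
    long ret;

    pool_pause_all_domains(foreign);    /* pause domain 1                  */
    pool_move_cpu(cpu, home);           /* cpu 2 temporarily joins pool A  */

    ret = func(data);                   /* finish the hypercall on cpu 2   */

    pool_move_cpu(cpu, foreign);        /* cpu 2 goes back to pool B       */
    pool_unpause_all_domains(foreign);  /* unpause domain 1                */

    return ret;
}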

Some thoughts on this topic:

The continue_hypercall_on_cpu() function is needed on x86 for loading new
microcode into the processor. The source buffer with the new microcode lives
in dom0 memory, so dom0 has to be running on the physical processor the new
code is loaded into (otherwise the buffer wouldn't be accessible there).
We could avoid the whole continue_hypercall_on_cpu() machinery if the
microcode were first copied into a hypervisor buffer and on_selected_cpus()
were used instead. The other users (cpu hotplug and acpi_enter_sleep) would
have to switch to different solutions as well.
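
Roughly, that alternative could look like the following sketch; apply_ucode()
is a made-up placeholder for the real per-cpu update routine, and the
on_selected_cpus() call assumes the (mask, func, data, wait) calling
convention:

/* Sketch only: load microcode without continue_hypercall_on_cpu().
 * The update image is copied into a hypervisor buffer first, so the
 * per-cpu worker no longer depends on dom0 running on the target cpu. */
struct ucode_buf {
    void          *data;
    unsigned long  len;
};

static void apply_ucode(void *info)
{
    struct ucode_buf *buf = info;
    /* ... feed buf->data / buf->len to the local cpu's update MSRs ... */
}

static int microcode_update_via_ipi(XEN_GUEST_HANDLE(const_void) guest_buf,
                                    unsigned long len, unsigned int cpu)
{
    struct ucode_buf buf;
    cpumask_t mask;
    int ret = 0;

    buf.len  = len;
    buf.data = xmalloc_bytes(len);              /* hypervisor-owned copy   */
    if ( buf.data == NULL )
        return -ENOMEM;

    /* pull the update image out of the dom0 buffer */
    if ( copy_from_guest(buf.data, guest_buf, len) )
        ret = -EFAULT;
    else
    {
        cpus_clear(mask);
        cpu_set(cpu, mask);
        on_selected_cpus(&mask, apply_ucode, &buf, 1);  /* wait = 1 */
    }

    xfree(buf.data);
    return ret;
}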

BTW: continue_hypercall_on_cpu() exists on x86 only, and what it does isn't
really much better than my usage of it:
- remember the old pinning state of the current vcpu
- pin it temporarily to the cpu it should continue on
- continue the hypercall
- remove the temporary pinning
- re-establish the old pinning (if any)
Pretty much the same as my solution above ;-)
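
In code, that existing x86 approach boils down to roughly the following
(simplified sketch: the real code defers func() via a hypercall
continuation, which is elided here, and error handling is omitted):

static long pin_and_continue(unsigned int cpu,
                             long (*func)(void *data), void *data)
{
    struct vcpu *v = current;
    cpumask_t saved_affinity = v->cpu_affinity;   /* remember old pinning  */
    cpumask_t new_affinity;
    long ret;

    cpus_clear(new_affinity);
    cpu_set(cpu, new_affinity);
    vcpu_set_affinity(v, &new_affinity);    /* pin temporarily to 'cpu'    */

    ret = func(data);                       /* continue the hypercall      */

    vcpu_set_affinity(v, &saved_affinity);  /* re-establish old pinning    */
    return ret;
}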

So I would suggest eliminating continue_hypercall_on_cpu() completely if you
feel uneasy about my solution.


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

