Re: [Xen-devel] Re: NUMA and SMP
On Jan 16, 2007, at 15:19, Petersson, Mats wrote:

>> There is a strong argument for making hypervisors and OSes NUMA aware
>> in the sense that:
>> 1- They know about system topology
>> 2- They can export this information up the stack to applications and
>>    users
>> 3- They can take in directives from users and applications to
>>    partition the host and place some threads and memory in specific
>>    partitions.
>> 4- They use an interleaved (or random) initial memory placement
>>    strategy by default.
>>
>> The argument that the OS on its own -- without user or application
>> directives -- can make better placement decisions than round-robin or
>> random placement is -- in my opinion -- flawed.
>
> Debatable - it depends a lot on WHAT applications you expect to run,
> and how they behave. If you consider an application that frequently
> allocates and de-allocates memory dynamically in a single-threaded
> process (say, a compiler), then allocating memory on the local node
> should be the "first choice".
>
> Multithreaded apps can use a similar approach: if a thread is
> allocating memory, there is often a good chance that the memory will
> be used by that thread too [although this doesn't work for message
> passing between threads, obviously; this is again a case where
> "knowledge from the app" is the only solution better than "random"].
>
> This approach is by far not perfect, but if you consider that
> applications often do short-term allocations, it makes sense to
> allocate on the local node if possible.

I do not agree. Just because a thread happens to run on processor X when
it first faults in a page off the process' heap doesn't give you a good
indication that the memory will be used mostly by this thread, or that
the thread will continue running on the same processor. There are at
least as many cases where this assumption is invalid as where it is
valid. Without any solid indication that something else will work
better, round-robin allocation has to be the default strategy.
Also, if you allow one process to consume a large percentage of one
node's memory, you are indirectly hurting all competing multi-threaded
apps which benefit from higher total memory bandwidth when they spread
their data across nodes.

I understand your point that if a single-threaded process quickly
shrinks its heap after growing it, it is less likely to migrate to a
different processor while it is using this memory. I'm not sure how you
predict at allocation time that memory will be quickly released, though.
Even if you could, I maintain you would still need safeguards in place
to balance that process' needs against those of competing multi-threaded
apps, which benefit from memory bandwidth scaling with the number of
hosting nodes.

You could try to compromise and allocate round robin starting locally,
perhaps with diminishing strides as the total allocation grows (i.e.
allocate locally and progressively move towards a page-granularity round
robin scheme as more memory is requested). I'm not sure this would do
any better than plain old dumb round robin in the average case, but it's
worth a thought.

> However, supporting NUMA in the Hypervisor and forwarding arch-info to
> the guest would make sense. At the least, the very basic principle of:
> if the guest is to run on a limited set of processors (nodes), allocate
> memory from that (those) node(s) for the guest, would make a lot of
> sense.

I suspect there is widespread agreement on this point.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel