
RE: [Xen-devel] Re: NUMA and SMP


  • To: "tgh" <tianguanhua@xxxxxxxxxx>
  • From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
  • Date: Tue, 20 Mar 2007 16:50:25 +0100
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 20 Mar 2007 08:49:53 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: Acdq9qVjntBTp1YCQtmgjQk9usexpgAAB40g
  • Thread-topic: [Xen-devel] Re: NUMA and SMP

> -----Original Message-----
> From: tgh [mailto:tianguanhua@xxxxxxxxxx] 
> Sent: 20 March 2007 13:50
> To: Petersson, Mats
> Cc: Emmanuel Ackaouy; Anthony Liguori; xen-devel; David 
> Pilger; Ryan Harper
> Subject: Re: [Xen-devel] Re: NUMA and SMP
> 
> Thank you for your reply
> 
> I see
> And does Xen support NUMA-aware guest Linux now, or will it in the future?

There is no support in current Xen for NUMA-awareness. For a guest to 
understand the NUMA characteristics of the system, Xen itself must first 
understand the topology well enough to forward the relevant information 
(which memory ranges and which virtual CPUs belong to which node) to the 
guest. 
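
As an illustration of what a NUMA-aware guest would ultimately do with that
information, here is a minimal, non-Xen-specific sketch using libnuma inside
a Linux guest; it simply reports whatever node topology the guest kernel has
been told about (build with -lnuma):

#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        printf("no NUMA support visible to this kernel\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;      /* highest node number + 1 */
    printf("%d NUMA node(s) visible\n", nodes);

    for (int n = 0; n < nodes; n++) {
        long long free_mem;
        long long size = numa_node_size64(n, &free_mem);
        printf("node %d: %lld MB total, %lld MB free\n",
               n, size >> 20, free_mem >> 20);
    }
    return 0;
}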

> 
> Another question, which perhaps should be a separate topic:
> what is the function of xc_map_foreign_range() in 
> /tools/libxc/xc_linux.c?
> Does xc_map_foreign_range() mmap memory shared with another domain,
> or with domain0, or something else?

It maps a shared memory region with whatever domain is specified by the 
domain-ID argument - that is, the foreign domain's pages are mapped into the 
calling process's address space, so it can be any domain, not just domain 0. 
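
For illustration, a minimal sketch of how a domain-0 tool might call it; the
signature shown matches the libxc of this era (it has changed in later Xen
versions), and the helper function around it is made up for the example:

#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <xenctrl.h>

/* Hypothetical helper: peek at the first bytes of one page of a guest. */
int dump_guest_page(uint32_t domid, unsigned long mfn)
{
    int xc = xc_interface_open();         /* open the hypervisor interface */
    if (xc < 0)
        return -1;

    /* Map one page of the foreign domain (machine frame 'mfn'),
     * read-only, into our own address space. */
    unsigned char *page = xc_map_foreign_range(xc, domid, 4096,
                                               PROT_READ, mfn);
    if (page == NULL) {
        xc_interface_close(xc);
        return -1;
    }

    printf("dom%u mfn %#lx starts with: %02x %02x %02x %02x\n",
           domid, mfn, page[0], page[1], page[2], page[3]);

    munmap(page, 4096);                   /* unmap when done */
    xc_interface_close(xc);
    return 0;
}

(Link against libxc; error handling is kept minimal on purpose.)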

--
Mats

> 
> Could you help me?
> Thanks in advance
> 
> Petersson, Mats wrote:
> >> -----Original Message-----
> >> From: tgh [mailto:tianguanhua@xxxxxxxxxx] 
> >> Sent: 20 March 2007 13:10
> >> To: Emmanuel Ackaouy
> >> Cc: Petersson, Mats; Anthony Liguori; xen-devel; David 
> >> Pilger; Ryan Harper
> >> Subject: Re: [Xen-devel] Re: NUMA and SMP
> >>
> >> I am puzzled, what is page migration?
> >> Thank you in advance
> >>     
> >
> > I'm not entirely sure it's the correct term, but I used it to
> > indicate that if you allocate some memory local to processor X,
> > and then later on the page is used by processor Y, then one could
> > consider "moving" the page from the memory region of X to the
> > memory region of Y. So you "migrate" the page from one processor
> > to another. This is of course not a "free" operation, and it's
> > only really helpful if the memory is accessed many times (and not
> > simply served from cache on each access).
> >
> > A case where this can be done "almost for free" is when a page
> > is swapped out: on return, allocate the page on the node of the
> > processor that made the access. But of course, if you're looking
> > for ultimate performance, swapping is a terrible idea - so making
> > small optimizations in memory management while you're losing tons
> > of cycles to swapping is meaningless as an overall performance
> > gain.
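
As a concrete (non-Xen) illustration of the idea, this is roughly what
migrating a single page to the node of the processor that now touches it
looks like with Linux's move_pages(2) interface; the target node is an
assumption the caller has to supply (build with -lnuma):

#include <numaif.h>      /* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>

/* Move one page to 'target_node', e.g. the node of the CPU that is
 * now accessing it.  Returns 0 on success. */
static int migrate_page_to(void *addr, int target_node)
{
    void *pages[1]  = { addr };
    int   nodes[1]  = { target_node };
    int   status[1] = { -1 };

    /* pid 0 means the calling process; MPOL_MF_MOVE only moves pages
     * used exclusively by this process. */
    if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) != 0)
        return -1;

    printf("page %p is now on node %d\n", addr, status[0]);
    return 0;
}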
> >
> > --
> > Mats
> >   
> >> Emmanuel Ackaouy wrote:
> >>> On the topic of NUMA:
> >>>
> >>> I'd like to dispute the assumption that a NUMA-aware OS can
> >>> actually make good decisions about the initial placement of
> >>> memory in a reasonable hardware ccNUMA system.
> >>>
> >>> How does the OS know on which node a particular chunk of memory
> >>> will be most accessed? The truth is that unless the application
> >>> or the person running the application is herself NUMA-aware and
> >>> can provide placement hints or directives, the OS will seldom
> >>> beat a round-robin / interleave or random placement strategy.
> >>>
> >>> To illustrate, consider an app which lays out a bunch of data in
> >>> memory in a single thread and then spawns worker threads to
> >>> process it.
> >>>
> >>> Is the OS to place memory close to the initial thread? How can
> >>> it possibly know how many threads will eventually process the
> >>> data?
> >>>
> >>> Even if the OS knew how many threads will eventually crunch the
> >>> data, it cannot possibly know at placement time whether each
> >>> thread will work on an assigned data subset (and if so, which
> >>> one) or whether it will act as a pipeline stage with all the
> >>> data being passed from one thread to the next.
> >>>
> >>> If you go beyond initial memory placement or start considering
> >>> memory migration, then it's even harder to win because you have
> >>> to pay copy and stall penalties during migrations. So you have
> >>> to be really smart about predicting the future to do better than
> >>> the ~10-40% memory bandwidth and latency hit associated with
> >>> doing simple memory interleaving on a modern hardware-ccNUMA
> >>> system.
> >>>
> >>> And it gets worse for you when your app is successfully taking
> >>> advantage of the memory cache hierarchy, because its performance
> >>> is then less impacted by raw memory latency and bandwidth.
> >>>
> >>> Things also get more difficult on a time-sharing host with
> >>> competing apps.
> >>>
> >>> There is a strong argument for making hypervisors and OSes NUMA
> >>> aware in the sense that:
> >>> 1- They know about system topology.
> >>> 2- They can export this information up the stack to applications
> >>>    and users.
> >>> 3- They can take in directives from users and applications to
> >>>    partition the host and place some threads and memory in
> >>>    specific partitions.
> >>> 4- They use an interleaved (or random) initial memory placement
> >>>    strategy by default.
> >>>
> >>> The argument that the OS on its own -- without user or
> >>> application directives -- can make better placement decisions
> >>> than round-robin or random placement is -- in my opinion --
> >>> flawed.
> >>>
> >>> I am also skeptical that the complexity associated with page
> >>> migration strategies would be worthwhile: if you got it wrong
> >>> the first time, what makes you think you'll do better this time?
> >>>
> >>> Emmanuel.
> >>>
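
For what it's worth, the "interleaved by default" placement described in
point 4 above corresponds, on a NUMA-aware Linux, to roughly the following
non-Xen sketch using libnuma; the 64 MB size is arbitrary (build with -lnuma):

#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0)
        return 1;

    /* Allocate 64 MB with pages spread round-robin across all nodes,
     * rather than all on the allocating CPU's node. */
    size_t len = 64UL << 20;
    void *buf = numa_alloc_interleaved(len);
    if (buf == NULL)
        return 1;

    memset(buf, 0, len);     /* touch the pages so they are actually placed */
    printf("allocated %zu bytes interleaved across %d node(s)\n",
           len, numa_max_node() + 1);

    numa_free(buf, len);
    return 0;
}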



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

