
RE: [xen-devel][vNUMA v2][PATCH 2/8] public interface



I have to be brief right now, but since I was the one who
suggested that Dulloor turn on "no_migrate" when the guest
is using NUMA optimizations, I should reply.

Andre, you are assuming a very static world, e.g.:

> I think we should inform the user
> about this and if she persists in the migration...

and:

> but maybe we can make an exception for Dom0 only, because this
> is the most prominent and frequent user of ballooning.

One of the reasons virtualization is increasingly successful is
highly dynamic resource management.  Please don't assume you
can "inform the user" that a migration might result in greatly
decreased performance... there may be hundreds or thousands
of migrations per minute in the data center.  And please don't
assume that today's horribly static "virtual physical memory"
model will be the same even a year or two from now.  If
virtualization is to survive the looming "memory wall",
much more creative mechanisms must be in place to better
optimize memory utilization, and these will likely result in
thousands of "page-ownership" changes per guest per second.

> Most people deal with NUMA because they want to cure performance
> _drops_ caused by bad allocation policies. After all, NUMA awareness
> is a performance optimization. If the user asks to migrate to
> another host, then we shouldn't come back with fussy arguments like
> NUMA. In my eyes it is a question of priorities; I don't want
> to deny migration because of this.

Actually, most people deal with NUMA because they are publishing
benchmark numbers ;-)

I firmly believe that virtualization exposes many tradeoffs
between performance and flexibility, and that these choices should
be presented to the "virtualization user" as simply as possible,
e.g. as a single very high-level choice between performance and
flexibility.  If a user is attempting to improve performance by
exposing the underlying NUMA-ness of a physical machine to the
guests, they are implicitly making a policy decision that chooses
performance over flexibility.  If we then turn around and migrate
the guest in such a way that performance plummets, we've defeated
the intent of the user's implicitly stated policy.

Dan

> From: Andre Przywara [mailto:andre.przywara@xxxxxxx]
> 
> Dulloor wrote:
> > On Tue, Aug 3, 2010 at 8:52 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> >> On 03/08/2010 16:43, "Dulloor" <dulloor@xxxxxxxxx> wrote:
> >>
> >>>> I would expect the guest would see nodes 0 to nr_vnodes-1, and the
> >>>> mnode_id could go away.
> >>> mnode_id maps the vnode to a particular physical node. This will be
> >>> used by the balloon driver in the VMs when the structure is passed
> >>> as NUMA enlightenment to PVs and PV-on-HVMs.
> >>> I have a patch ready for that (once we are done with this series).
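
Just to make the structure under discussion concrete: a minimal sketch
of a per-guest vNUMA layout carrying such a vnode-to-mnode mapping,
with purely hypothetical names (this is not the interface from the
patch), might look roughly like:

    #include <stdint.h>

    /* Hypothetical sketch only; the names are illustrative and are not
     * taken from the actual patch in this series. */
    #define VNUMA_MAX_VNODES 8

    struct vnuma_vnode {
        uint32_t vnode_id;   /* virtual node number as seen by the guest */
        uint32_t mnode_id;   /* physical (machine) node backing this vnode */
        uint64_t start_pfn;  /* first guest pfn belonging to this vnode */
        uint64_t nr_pages;   /* number of guest pages on this vnode */
    };

    struct vnuma_layout {
        uint32_t nr_vnodes;                          /* virtual nodes in the guest */
        struct vnuma_vnode vnode[VNUMA_MAX_VNODES];  /* one entry per vnode */
    };

A guest (or its balloon driver) would walk vnode[] to learn which
machine node backs each of its virtual nodes, which is exactly the
information a migration would invalidate.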
> >> So what happens when the guest is migrated to another system with
> >> different physical node ids? Is that never to be supported? I'm not
> >> sure why you wouldn't hide the vnode-to-mnode translation in the
> >> hypervisor.
> >
> > Right now, migration is not supported when a NUMA strategy is set.
> > This is in my TODO list (along with PoD support).
> >
> > There are a few open questions wrt migration:
> > - What if the destination host is not NUMA, but the guest is NUMA?
> > Do we fake those nodes? Or should we not select such a destination
> > host to begin with?
> I don't see a problem with this situation. The guest has virtual nodes;
> these can be mapped in any way to actual physical nodes (but only by
> the hypervisor/Dom0, not by the guest itself).
> A corner case would clearly be to map all guest nodes to one single
> host node. In terms of performance this should be even better if the
> new host can satisfy the requirement from one node, because there will
> be no remote accesses at all.
> > - What if the destination host is not NUMA, but the guest has asked
> > to be striped across a specific number of nodes (possibly for higher
> > aggregate memory bandwidth)?
> Most people deal with NUMA because they want to cure performance
> _drops_ caused by bad allocation policies. After all, NUMA awareness
> is a performance optimization. If the user asks to migrate to another
> host, then we shouldn't come back with fussy arguments like NUMA. In
> my eyes it is a question of priorities; I don't want to deny migration
> because of this.
> > - What if the guest has asked for a particular memory strategy
> > (split/confined/striped), but the destination host can't guarantee
> > that (because of the distribution of free memory across the nodes)?
> I see, there is one case where the new host has more nodes than the old
> one, but the memory on each node is not sufficient (like migrating from
> a 2*8GB machine to an 8*4GB one). I think we should inform the user
> about this and if she persists in the migration, use some kind of
> interleaving to join two (or more) nodes together. Looks like future
> work, though.
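
A minimal sketch of that interleaving idea, assuming a purely
hypothetical helper (this is not existing Xen code): when no single
physical node on the destination can hold a whole vnode, spread that
vnode's pages round-robin across a set of physical nodes:

    #include <stdint.h>

    /* Hypothetical sketch: back one large vnode with several smaller
     * physical nodes by alternating page allocations between them,
     * e.g. one 8GB vnode backed by two 4GB nodes. */
    static inline unsigned int
    pick_mnode_interleaved(uint64_t page_idx,
                           const unsigned int *mnodes,
                           unsigned int nr_mnodes)
    {
        return mnodes[page_idx % nr_mnodes];
    }

The guest would still see a single vnode; only its backing would be
split, at the cost of some remote accesses within that vnode.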
> > Once we answer these questions, we will know whether the vnode-to-mnode
> > translation is better exposed or not. And, if exposed, could we just
> > renegotiate the vnode-to-mnode translation at the destination host?
> > I have started working on this. But I have some other patches ready
> > to go which we might want to check in first - PV/Dom0 NUMA patches,
> > ballooning support (see below).
> >
> > As such, the purpose of the vnode-to-mnode translation is for the
> > enlightened guests to know where their underlying memory comes from,
> > so that over-provisioning features like ballooning are given a chance
> > to maintain this distribution.
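
To sketch what "maintaining this distribution" could mean in practice
for a balloon driver, assuming the hypothetical vnuma_layout from the
sketch above: split a balloon target across vnodes in proportion to
their size, so that inflating or deflating does not skew the guest's
node balance:

    /* Hypothetical sketch; reuses the vnuma_layout structure from the
     * earlier sketch. Split a total balloon target (in pages) across
     * vnodes proportionally to their size. */
    static void balloon_split_target(const struct vnuma_layout *vl,
                                     uint64_t target_pages,
                                     uint64_t *per_vnode_target)
    {
        uint64_t total = 0;
        uint32_t i;

        for (i = 0; i < vl->nr_vnodes; i++)
            total += vl->vnode[i].nr_pages;

        for (i = 0; i < vl->nr_vnodes; i++)
            per_vnode_target[i] = total ?
                (target_pages * vl->vnode[i].nr_pages) / total : 0;
    }

The per-vnode targets would then be applied within each virtual node,
preserving the layout that the NUMA placement originally set up.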
> I was afraid you were saying that ;-) I haven't thought about this in
> detail, but maybe we can make an exception for Dom0 only, because this
> is the most prominent and frequent user of ballooning. But I really
> think that DomUs should not know about or deal with host NUMA nodes.
> 
> Regards,
> Andre.
> 
> --
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
> Tel: +49 351 448-3567-12
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

