
RE: [Xen-devel] NUMA guest config options (was: Re: [PATCH 00/11] PV NUMA Guests)



While I like the direction this is going, please try to extend
your model to cover the cases of ballooning and live migration.
For example, for "CONFINE", ballooning should probably be
disallowed, as pages surrendered on "this node" via ballooning
may only be recoverable later from a different node.  Similarly,
creating a CONFINE guest is defined to fail if no single node has
sufficient memory... will live migration to a different physical
machine similarly fail, even if an administrator explicitly
requests it?

In general, communicating NUMA topology to a guest is a "performance
thing" and ballooning and live-migration are "flexibility things";
and performance and flexibility mix like oil and water.
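
To make the CONFINE/ballooning interaction concrete, here is the kind of
toolstack-side consistency check that might be needed (a rough Python
sketch only; the "numa_strategy" option name is made up for illustration
and this is not real xend code):

    # Rough sketch: a CONFINE guest should probably be forbidden from
    # ballooning, since pages surrendered on the confined node may
    # later only be reclaimable from a different node.
    def validate_numa_config(cfg):
        strategy = cfg.get("numa_strategy", "AUTOMATIC")  # hypothetical option
        memory = cfg["memory"]
        maxmem = cfg.get("maxmem", memory)
        if strategy == "CONFINE" and maxmem != memory:
            raise ValueError("CONFINE guests must not balloon: "
                             "set maxmem == memory")
        return cfg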

> -----Original Message-----
> From: Andre Przywara [mailto:andre.przywara@xxxxxxx]
> Sent: Friday, April 23, 2010 6:46 AM
> To: Cui, Dexuan; Dulloor; xen-devel
> Cc: Nakajima, Jun
> Subject: [Xen-devel] NUMA guest config options (was: Re: [PATCH 00/11]
> PV NUMA Guests)
> 
> Hi,
> 
> yesterday Dulloor, Jun and I had a discussion about the NUMA guest
> configuration scheme; we came to the following conclusions:
> 1. The configuration would be the same for HVM and PV guests; only the
> internal method of propagation would differ.
> 2. We want to make it as easy as possible, with best performance out of
> the box as the design goal. Another goal is predictable performance.
> 3. We (at least for now) omit more sophisticated tuning options (exact
> user-driven description of the guest's topology), so the guest's
> resources are split equally across the guest nodes.
> 4. We have three basic strategies:
>   - CONFINE: let the guest use only one node. If that does not work,
> fail.
>   - SPLIT: allocate resources from multiple nodes and inject a NUMA
> topology into the guest (for PV guests, queried via hypercall). If the
> guest is paravirtualized and does not know about NUMA (missing ELF
> hint): fail.
>   - STRIPE: allocate the memory in an interleaved way from multiple
> nodes and don't tell the guest about NUMA at all.
> 
> If any one of the above strategies is explicitly specified in the
> config file and cannot be met, guest creation will fail.
> A fourth option would be the default: AUTOMATIC. This tries the three
> strategies one after the other (order: CONFINE, SPLIT, STRIPE). If one
> fails, the next is tried (striping will never be used for HVM guests).
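> 
> To make the fallback order concrete, a rough sketch in Python (the
> strategy helpers and the guest object are made up for illustration,
> this is not meant as the actual implementation):
> 
>     # Placeholder strategy functions; the real ones would talk to the
>     # allocator. Each returns a placement or None on failure.
>     def try_confine(guest, host_nodes): ...
>     def try_split(guest, host_nodes): ...
>     def try_stripe(guest, host_nodes): ...
> 
>     # Try CONFINE, then SPLIT, then STRIPE; the first strategy that
>     # yields a placement wins. Per the above, striping is never used
>     # for HVM guests.
>     def place_guest_automatic(guest, host_nodes):
>         for strategy in (try_confine, try_split, try_stripe):
>             if strategy is try_stripe and guest.is_hvm:
>                 continue
>             placement = strategy(guest, host_nodes)
>             if placement is not None:
>                 return placement
>         raise RuntimeError("no NUMA placement strategy succeeded")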
> 
> 5. The number of guest nodes is internally specified via a min/max
> pair. By default, min is 1 and max is the number of system nodes. The
> algorithm will try to use the smallest possible number of nodes.
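> 
> For illustration, the node-count selection could look roughly like this
> (the free-memory check is simplified; real code would also have to
> consider VCPUs and actual allocator behaviour):
> 
>     # Return the smallest n in [min_nodes, max_nodes] for which the
>     # guest's memory fits when split evenly across n host nodes.
>     def pick_node_count(guest_mem, free_mem_per_node, min_nodes, max_nodes):
>         for n in range(min_nodes, max_nodes + 1):
>             per_node = guest_mem / n
>             nodes_with_room = [m for m in free_mem_per_node if m >= per_node]
>             if len(nodes_with_room) >= n:
>                 return n
>         return None  # no placement possible within the given bounds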
> 
> The question remaining is whether we want to expose this pair to the
> user:
>   - For predictable performance we want to specify an exact number of
> guest nodes, so set min=max=<number of nodes>
>   - For best performance, the number of nodes should be as small as
> possible, so min is always 1. For the explicit CONFINE strategy, max
> would also be one; for AUTOMATIC it should be as few as possible, which
> is already built into the algorithm.
> So it is not clear whether "max nodes" is a useful option. If it only
> served as an upper bound, it is questionable whether failing when that
> bound cannot be met is a useful result.
> 
> So maybe we can get along with just one (optional) value: guestnodes.
> This will be useful in the SPLIT case, where it specifies the number of
> nodes the guest sees (for predictable performance). CONFINE internally
> overrides this value with "1". To impose a limit on the number of
> nodes, one would choose AUTOMATIC and set guestnodes to that number. If
> single-node allocation fails, the algorithm will use as few nodes as
> possible, not exceeding the specified number.
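> 
> As a concrete (hypothetical) example of what a guest config could then
> look like, with "numa" and "guestnodes" as working titles for the
> option names, nothing set in stone yet:
> 
>     # Explicit SPLIT across exactly two guest nodes; creation fails if
>     # this cannot be met.
>     numa = "SPLIT"
>     guestnodes = 2
> 
>     # Alternatively: AUTOMATIC placement, never using more than two nodes.
>     # numa = "AUTOMATIC"
>     # guestnodes = 2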
> 
> Please comment on this.
> 
> Thanks and regards,
> Andre.
> 
> --
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
> Tel: +49 351 448-3567-12

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

