
Re: [Xen-devel] [PATCH RFC 4/7] libxl/vnuma: vnuma domain config



On Tue, 2013-08-27 at 03:54 -0400, Elena Ufimtseva wrote:
> Defines VM config options for vNUMA PV domain creation as follows:
> vnodes - number of nodes and enables vnuma
> vnumamem - vnuma nodes memory sizes
> vnuma_distance - vnuma distance table (may be omitted)
> vcpu_to_vnode - vcpu to vnode mask (may be omitted)
> 
> The sum of all vnumamem entries should be equal to the memory option.
> The number of vcpus should not be less than the number of vnodes.
> 
> VM config Examples:

Please patch docs/ as necessary (e.g. the manpages) at the same time.

> 
> memory = 16384
> vcpus = 8
> name = "rc"
> vnodes = 8
> vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g"
> vcpu_to_vnode ="5 6 7 4 3 2 1 0"

xl cfg supports arrays; is there any reason not to use them?

Hopefully (lib)xl will also implement some sort of sane default in the
case where people don't want to spell all this out?

Is it actually useful to be able to arbitrarily map vcpus to nodes? I'd
have thought dividing the vcpus among the nodes evenly would be
sufficient for almost everyone.
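
For illustration, with the existing xl list syntax the same information
could be expressed as something like this (hypothetical values; entries
quoted since xl cfg list items are strings):

    vnodes = 4
    vnumamem = ["2g", "2g", "2g", "2g"]
    vcpu_to_vnode = ["0", "0", "1", "1", "2", "2", "3", "3"]

which would also make the hand-rolled strtok_r() parsing later in this
patch unnecessary.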

What happens if the total of vnumamem does not == memory? Would it be
useful to be able to specify this as ratios? e.g. "1:1:1:1" etc? Or
maybe we should simply extend the memory syntax to take a list and
memory becomes the total?

What happens if length(vnumamem) != vnodes? Likewise vcpu_to_vnode vs
vcpus.

How is maxmem handled/reconciled? Is there a vnumamaxmem? Likewise
maxvcpus.

> memory = 2048
> vcpus = 4
> name = "rc9"
> vnodes = 2
> vnumamem = "1g, 1g"
> vnuma_distance = "10 20, 10 20"
> vcpu_to_vnode ="1, 3, 2, 0"
> 
> Signed-off-by: Elena Ufimtseva <ufimtseva@xxxxxxxxx>
> ---
>  tools/libxl/libxl.c          |   28 ++++++
>  tools/libxl/libxl.h          |   15 ++++
>  tools/libxl/libxl_arch.h     |    6 ++
>  tools/libxl/libxl_dom.c      |  115 ++++++++++++++++++++++--
>  tools/libxl/libxl_internal.h |    3 +
>  tools/libxl/libxl_types.idl  |    6 +-
>  tools/libxl/libxl_x86.c      |   91 +++++++++++++++++++
>  tools/libxl/xl_cmdimpl.c     |  197 +++++++++++++++++++++++++++++++++++++++++-
>  8 files changed, 454 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 81785df..cd25474 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -4293,6 +4293,34 @@ static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid,
>      }
>      return 0;
>  }
> +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA

libxl itself doesn't need to use the ifdef; just provide it for external
callers.

> +int libxl_domain_setvnodes(libxl_ctx *ctx,
> +                            uint32_t domid,
> +                            uint16_t nr_vnodes,
> +                            uint16_t nr_vcpus,
> +                            vnuma_memblk_t *vnuma_memblks,
> +                            int *vdistance,
> +                            int *vcpu_to_vnode,
> +                            int *vnode_to_pnode)
> +{
> +    GC_INIT(ctx);
> +    int ret;
> +    ret = xc_domain_setvnodes(ctx->xch, domid, nr_vnodes,
> +                                nr_vcpus, vnuma_memblks,
> +                                vdistance, vcpu_to_vnode,
> +                                vnode_to_pnode);
> +    GC_FREE;
> +    return ret;
> +}
> +
> +int libxl_default_vcpu_to_vnuma(libxl_domain_build_info *info)
> +{
> +    int i;
> +    for(i = 0; i < info->max_vcpus; i++)
> +        info->vcpu_to_vnode[i] = i % info->nr_vnodes;
> +    return 0;
> +}
> +#endif
>  
>  int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap)
>  {
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index be19bf5..a1a5e33 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -706,6 +706,21 @@ void libxl_vcpuinfo_list_free(libxl_vcpuinfo *, int nr_vcpus);
>  void libxl_device_vtpm_list_free(libxl_device_vtpm*, int nr_vtpms);
>  void libxl_vtpminfo_list_free(libxl_vtpminfo *, int nr_vtpms);
>  
> +/* vNUMA topology */
> +
> +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA 

Unneeded, but you do need to add the #define (which seems to be missing;
how does this stuff get built?).
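
i.e. in libxl.h, alongside the other LIBXL_HAVE_* feature macros,
something like:

    /*
     * LIBXL_HAVE_BUILDINFO_VNUMA
     *
     * If this is defined then libxl_domain_build_info contains the
     * vNUMA topology fields added by this series.
     */
    #define LIBXL_HAVE_BUILDINFO_VNUMA 1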

> +#include <xen/vnuma.h>

Includes should go at the top unless there is a good reason otherwise.

However we try and avoid exposing Xen interfaces in the libxl interface.
This means you need to define a libxl equivalent, which should be done
via the libxl IDL.
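
Roughly, in libxl_types.idl, something along these lines (field names
illustrative, mirroring the start/end usage in this patch):

    libxl_vnuma_memblk = Struct("vnuma_memblk", [
        ("start", uint64),
        ("end",   uint64),
        ])

and then use the generated libxl_vnuma_memblk in the public prototypes
instead of vnuma_memblk_t.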

> +int libxl_domain_setvnodes(libxl_ctx *ctx,
> +                            uint32_t domid,
> +                            uint16_t nr_vnodes,
> +                            uint16_t nr_vcpus,
> +                            vnuma_memblk_t *vnuma_memblks,
> +                            int *vdistance,
> +                            int *vcpu_to_vnode,
> +                            int *vnode_to_pnode);
> +
> +int libxl_default_vcpu_to_vnuma(libxl_domain_build_info *info);
> +#endif
>  /*
>   * Devices
>   * =======
> diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
> index abe6685..76c1975 100644
> --- a/tools/libxl/libxl_arch.h
> +++ b/tools/libxl/libxl_arch.h
> @@ -18,5 +18,11 @@
>  /* arch specific internal domain creation function */
>  int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>                 uint32_t domid);
> +int libxl_vnuma_align_mem(libxl__gc *gc,

libxl__foo (double underscores) for internal functions, please.

> +                            uint32_t domid,
> +                            struct libxl_domain_build_info *b_info,
> +                            vnuma_memblk_t *memblks); /* linux specific memory blocks: out */

Why/how is this Linux specific? This is a hypercall parameter, isn't it?

>  
> +
> +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, struct e820entry e820[], int nr);
>  #endif
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 6e2252a..8bbbd18 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -200,6 +200,63 @@ static int numa_place_domain(libxl__gc *gc, uint32_t domid,
>      libxl_cpupoolinfo_dispose(&cpupool_info);
>      return rc;
>  }
> +#define set_all_vnodes(n)    for(i=0; i< info->nr_vnodes; i++) \
> +                                info->vnode_to_pnode[i] = n
> +
> +int libxl_init_vnodemap(libxl__gc *gc, uint32_t domid,

Double underscore please.

> +                        libxl_domain_build_info *info)
> +{
> +    int i, n, start, nr_nodes;
> +    uint64_t *mems;
> +    unsigned long long claim[16];

Where does 16 come from?

> +    libxl_numainfo *ninfo = NULL;
> +
> +    if (info->vnode_to_pnode == NULL)
> +        info->vnode_to_pnode = calloc(info->nr_vnodes, sizeof(*info->vnode_to_pnode));
> +
> +    set_all_vnodes(NUMA_NO_NODE);
> +    mems = info->vnuma_memszs;
> +    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
> +    if (ninfo == NULL) {
> +        LOG(INFO, "No HW NUMA found\n");
> +        return -EINVAL;
> +    }
> +    /* lets check if all vnodes will fit in one node */
> +    for(n = 0; n < nr_nodes;  n++) {
> +        if(ninfo[n].free/1024 >= info->max_memkb) {
> +            /* all fit on one node, fill the mask */
> +            set_all_vnodes(n);
> +            LOG(INFO, "Setting all vnodes to node %d, free = %lu, need =%lu Kb\n", n, ninfo[n].free/1024, info->max_memkb);
> +            return 0;
> +            }
> +    }
> +    /* TODO: change algorithm. The current just fits the nodes
> +     * Will be nice to have them also sorted by size  */
> +    /* If no p-node found, will be set to NUMA_NO_NODE and allocation will fail */
> +    LOG(INFO, "Found %d physical NUMA nodes\n", nr_nodes);
> +    memset(claim, 0, sizeof(*claim) * 16);
> +    start =  0;
> +    for ( n = 0; n < nr_nodes;  n++ )
> +    {

If nr_nodes > 16 this will overflow claim[n].
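
Sizing it once nr_nodes is known would avoid that, e.g. (sketch):

    uint64_t *claim = libxl__calloc(gc, nr_nodes, sizeof(*claim));

libxl__calloc() zeroes the allocation, so the memset() could then go too.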

> +        for ( i = start; i < info->nr_vnodes; i++ )
> +        {
> +            LOG(INFO, "Compare %Lx for vnode[%d] size %lx with free space on pnode[%d], free %lx\n",
> +                  claim[n] + mems[i], i, mems[i], n, ninfo[n].free);

These should be LOG(DEBUG, ...) at most. Perhaps a LOG(INFO, ...)
summary at the end would be suitable?

> +            if ( ((claim[n] + mems[i]) <= ninfo[n].free) && (info->vnode_to_pnode[i] == NUMA_NO_NODE) )
> +            {
> +                info->vnode_to_pnode[i] = n;
> +                LOG(INFO, "Set vnode[%d] to pnode [%d]\n", i, n);
> +                claim[n] += mems[i];
> +            }
> +            else {
> +                /* Will have another chance at other pnode */
> +                start = i;
> +                continue;
> +            }
> +        }
> +    }
> +    return 0;
> +}
>  
>  int libxl__build_pre(libxl__gc *gc, uint32_t domid,
>                libxl_domain_config *d_config, libxl__domain_build_state *state)
> @@ -232,9 +289,36 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
>          if (rc)
>              return rc;
>      }
> +#ifdef  LIBXL_HAVE_BUILDINFO_VNUMA

Not needed.

> +    if (info->nr_vnodes <= info->max_vcpus && info->nr_vnodes != 0) {
> +        vnuma_memblk_t *memblks = libxl__calloc(gc, info->nr_vnodes, sizeof(*memblks));
> +        libxl_vnuma_align_mem(gc, domid, info, memblks);
> +        if (libxl_init_vnodemap(gc, domid, info) != 0) {
> +            LOG(INFO, "Failed to call init_vnodemap\n");
> +            rc = libxl_domain_setvnodes(ctx, domid, info->nr_vnodes,
> +                                    info->max_vcpus, memblks,
> +                                    info->vdistance, info->vcpu_to_vnode,
> +                                    NULL);
> +        }
> +        else
> +            rc = libxl_domain_setvnodes(ctx, domid, info->nr_vnodes,
> +                                    info->max_vcpus, memblks,
> +                                    info->vdistance, info->vcpu_to_vnode,
> +                                    info->vnode_to_pnode);
> +        if (rc < 0 ) LOG(INFO, "Failed to call xc_domain_setvnodes\n");
> +        for(int i=0; i<info->nr_vnodes; i++)
> +            LOG(INFO, "Mapping vnode %d to pnode %d\n", i, info->vnode_to_pnode[i]);
> +        libxl_bitmap_set_none(&info->nodemap);
> +        libxl_bitmap_set(&info->nodemap, 0);
> +    }
> +    else {
> +        LOG(INFO, "NOT Calling vNUMA construct with nr_nodes = %d\n", info->nr_vnodes);
> +        info->nr_vnodes = 0;
> +    }
> +#endif
>      libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
>      libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
> -
> +        
>      xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
>      xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
>      state->store_domid = xs_domid ? atoi(xs_domid) : 0;
> @@ -368,7 +452,20 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>              }
>          }
>      }
> -
> +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA

and again.

> +    if (info->nr_vnodes != 0 && info->vnuma_memszs != NULL && info->vnode_to_pnode != NULL) {
> +        dom->nr_vnodes = info->nr_vnodes;
> +        dom->vnumablocks = malloc(info->nr_vnodes * sizeof(*dom->vnumablocks));
> +        dom->vnode_to_pnode = (int *)malloc(info->nr_vnodes * sizeof(*info->vnode_to_pnode));
> +        dom->vmemsizes = malloc(info->nr_vnodes * sizeof(*info->vnuma_memszs));
> +        if (dom->vmemsizes == NULL || dom->vnode_to_pnode == NULL) {
> +            LOGE(ERROR, "%s:Failed to allocate memory for memory sizes.\n",__FUNCTION__);

I thought LOG* already included file/function stuff.

> +            goto out;
> +        }
> +        memcpy(dom->vmemsizes, info->vnuma_memszs, sizeof(*info->vnuma_memszs) * info->nr_vnodes);
> +        memcpy(dom->vnode_to_pnode, info->vnode_to_pnode, sizeof(*info->vnode_to_pnode) * info->nr_vnodes);
> +    }
> +#endif
>      dom->flags = flags;
>      dom->console_evtchn = state->console_port;
>      dom->console_domid = state->console_domid;
> @@ -388,9 +485,17 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>          LOGE(ERROR, "xc_dom_mem_init failed");
>          goto out;
>      }
> -    if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) {
> -        LOGE(ERROR, "xc_dom_boot_mem_init failed");
> -        goto out;
> +    if (info->nr_vnodes != 0 && info->vnuma_memszs != NULL) {
> +        if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) {
> +            LOGE(ERROR, "xc_dom_boot_mem_init_node  failed");

There is no _node on the actual call here; in fact I can't see how it
differs from the call in the else branch.
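
i.e. the whole if/else could just collapse back to the original:

    if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) {
        LOGE(ERROR, "xc_dom_boot_mem_init failed");
        goto out;
    }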

> +            goto out;
> +        }
> +    }
> +    else {
> +        if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) {
> +            LOGE(ERROR, "xc_dom_boot_mem_init failed");
> +            goto out;
> +        }
>      }
>      if ( (ret = xc_dom_build_image(dom)) != 0 ) {
>          LOGE(ERROR, "xc_dom_build_image failed");
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f051d91..4a501c4 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2709,6 +2709,7 @@ static inline void libxl__ctx_unlock(libxl_ctx *ctx) {
>  #define CTX_LOCK (libxl__ctx_lock(CTX))
>  #define CTX_UNLOCK (libxl__ctx_unlock(CTX))
>  
> +#define NUMA_NO_NODE 0xFF

256 nodes isn't completely implausible. Looks like nr_vnodes is a
uint16_t so 0xffff or ~((uint16_t)0) would be better I think.
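
i.e.:

    #define NUMA_NO_NODE 0xffff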


>  /*
>   * Automatic NUMA placement
>   *
> @@ -2832,6 +2833,8 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
>      libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
>  }
>  
> +int libxl_init_vnodemap(libxl__gc *gc, uint32_t domid,
> +                                libxl_domain_build_info *info);
>  /*
>   * Inserts "elm_new" into the sorted list "head".
>   *
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 85341a0..c3a4d95 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -208,6 +208,7 @@ libxl_dominfo = Struct("dominfo",[
>      ("vcpu_max_id", uint32),
>      ("vcpu_online", uint32),
>      ("cpupool",     uint32),
> +    ("nr_vnodes",   uint16),
>      ], dir=DIR_OUT)
>  
>  libxl_cpupoolinfo = Struct("cpupoolinfo", [
> @@ -279,7 +280,10 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("disable_migrate", libxl_defbool),
>      ("cpuid",           libxl_cpuid_policy_list),
>      ("blkdev_start",    string),
> -    
> +    ("vnuma_memszs",    Array(uint64, "nr_vnodes")),
> +    ("vcpu_to_vnode",   Array(integer, "nr_vnodemap")),
> +    ("vdistance",       Array(integer, "nr_vdist")),
> +    ("vnode_to_pnode",  Array(integer, "nr_vnode_to_pnode")),
>      ("device_model_version", libxl_device_model_version),
>      ("device_model_stubdomain", libxl_defbool),
>      # if you set device_model you must set device_model_version too
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index a78c91d..35da3a8 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -308,3 +308,94 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>  
>      return ret;
>  }
> +
> +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, struct e820entry e820[], int nr)
> +{
> +#define clamp(val, min, max) ({             \
> +    typeof(val) __val = (val);              \
> +    typeof(min) __min = (min);              \
> +    typeof(max) __max = (max);              \
> +    (void) (&__val == &__min);              \
> +    (void) (&__val == &__max);              \
> +    __val = __val < __min ? __min: __val;   \
> +    __val > __max ? __max: __val; })
> +    int i;
> +    unsigned long absent, start_pfn, end_pfn;
> +    absent = start - end;
> +    for(i = 0; i < nr; i++) {
> +        if(e820[i].type == E820_RAM) {
> +            start_pfn = clamp(e820[i].addr, start, end);
> +            end_pfn =   clamp(e820[i].addr + e820[i].size, start, end);
> +            absent -= end_pfn - start_pfn;
> +        }
> +    }
> +    return absent;
> +}
> +
> +/* Align memory blocks for linux NUMA build image */
> +int libxl_vnuma_align_mem(libxl__gc *gc,
> +                            uint32_t domid,
> +                            libxl_domain_build_info *b_info,
> +                            vnuma_memblk_t *memblks) /* linux specific memory blocks: out */
> +{
> +#ifndef roundup
> +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> +#endif 
> +    /* 
> +      This function transforms mem block sizes in bytes 
> +      into aligned PV Linux guest NUMA nodes. 
> +      XEN will provide this memory layout to PV Linux guest upon boot for
> +      PV Linux guests.

You say "PV Linux guest" three times here, but I don't think any of this
is specific to PV Linux as opposed to PV guests generally (whether or
not Linux is the only current implementation of this interface doesn't
really matter).

> +    */
> +    int i, rc;
> +    unsigned long shift = 0, size, node_min_size = 1, limit;
> +    unsigned long end_max;
> +    uint32_t nr;
> +    struct e820entry map[E820MAX];
> +    
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
> +    if (rc < 0) {
> +        errno = rc;
> +        return -EINVAL;
> +    }
> +    nr = rc;
> +    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
> +                       (b_info->max_memkb - b_info->target_memkb) +
> +                       b_info->u.pv.slack_memkb);
> +    if (rc)
> +        return ERROR_FAIL;
> +    
> +    end_max = map[nr-1].addr + map[nr-1].size;
> +    
> +    shift = 0;
> +    for(i = 0; i < b_info->nr_vnodes; i++) {
> +        printf("block [%d] start inside align = %#lx\n", i, b_info->vnuma_memszs[i]);

No printf in libxl please.
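
e.g.:

    LOG(DEBUG, "block [%d] start inside align = %#lx",
        i, b_info->vnuma_memszs[i]);

(and the trailing \n can be dropped; the log callback supplies it).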

> +    }
> +    memset(memblks, 0, sizeof(*memblks)*b_info->nr_vnodes);
> +    memblks[0].start = 0;
> +    for(i = 0; i < b_info->nr_vnodes; i++) {
> +        memblks[i].start += shift;
> +        memblks[i].end += shift + b_info->vnuma_memszs[i];
> +        limit = size = memblks[i].end - memblks[i].start;
> +        while (memblks[i].end - memblks[i].start - e820_memory_hole_size(memblks[i].start, memblks[i].end, map, nr) < size) {

Please see if you can shorten this line.
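
e.g. by just wrapping the call, something like:

    while (memblks[i].end - memblks[i].start -
           e820_memory_hole_size(memblks[i].start,
                                 memblks[i].end, map, nr) < size) {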

> +            memblks[i].end += node_min_size;
> +            shift += node_min_size;
> +            if (memblks[i].end - memblks[i].start >= limit) {
> +                memblks[i].end = memblks[i].start + limit;
> +                break;
> +            }
> +            if (memblks[i].end == end_max) {
> +                memblks[i].end = end_max;
> +                break;
> +            }
> +        }
> +        shift = memblks[i].end;
> +        memblks[i].start = roundup(memblks[i].start, 4*1024);
> +
> +        printf("start = %#010lx, end = %#010lx\n", memblks[i].start, memblks[i].end);
> +    }
> +    if(memblks[i-1].end > end_max)
> +        memblks[i-1].end = end_max;
> +    return 0;
> +}
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 884f050..36a8275 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -539,7 +539,121 @@ vcpp_out:
>  
>      return rc;
>  }
> +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA

This isn't strictly needed in xl either, although some people are keen
to have xl build against both newer and older libxl in order to test the
compatibility guarantees made by the library.

> +static int vdistance_parse(char *vdistcfg, int *vdistance, int nr_vnodes)
> +{

Please can you use some line breaks to separate logical paragraphs and
make things more readable. e.g. after the local variable declaration and
between related blocks of code.

> +    char *endptr, *toka, *tokb, *saveptra = NULL, *saveptrb = NULL;
> +    int *vdist_tmp = NULL;
> +    int rc = 0;
> +    int i, j, dist, parsed = 0;      
> +    rc = -EINVAL;

Here you have:

    int rc = 0;
    rc = -EINVAL;

One of them is redundant.
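
i.e. just:

    int rc = -EINVAL;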

> +    if(vdistance == NULL) {
> +        return rc;
> +    }
> +    vdist_tmp = (int *)malloc(nr_vnodes * nr_vnodes * sizeof(*vdistance));
> +    if (vdist_tmp == NULL)
> +        return rc;
> +    i =0; j = 0;
> +    for (toka = strtok_r(vdistcfg, ",", &saveptra); toka;
> +        toka = strtok_r(NULL, ",", &saveptra)) {
> +        if ( i >= nr_vnodes ) 
> +            goto vdist_parse_err;
> +        for (tokb = strtok_r(toka, " ", &saveptrb); tokb;
> +            tokb = strtok_r(NULL, " ", &saveptrb)) {
> +            if (j >= nr_vnodes) 
> +                goto vdist_parse_err;
> +            dist = strtol(tokb, &endptr, 10);
> +            if (tokb == endptr)
> +                goto vdist_parse_err;
> +            *(vdist_tmp + j*nr_vnodes + i) = dist;
> +            parsed++;
> +            j++;
> +        }
> +        i++;
> +        j = 0;

This would all be easier if it were an xl cfg list.
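
A sketch of what the parse_config_data side could look like with the
existing libxlu list helpers (assuming a flat, row-major
"vnuma_distance" list):

    XLU_ConfigList *vdl;
    int nr_entries;

    if (!xlu_cfg_get_list(config, "vnuma_distance", &vdl, &nr_entries, 0) &&
        nr_entries == b_info->nr_vnodes * b_info->nr_vnodes)
        for (i = 0; i < nr_entries; i++)
            b_info->vdistance[i] = atoi(xlu_cfg_get_listitem(vdl, i));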

> +    }
> +    rc = parsed;
> +    memcpy(vdistance, vdist_tmp, nr_vnodes * nr_vnodes * sizeof(*vdistance));
> +vdist_parse_err:
> +    if (vdist_tmp !=NULL ) free(vdist_tmp);
> +    return rc;
> +}
>  
> +static int vcputovnode_parse(char *cfg, int *vmap, int nr_vnodes, int nr_vcpus)
> +{
> +    char *toka, *endptr, *saveptra = NULL;
> +    int *vmap_tmp = NULL;
> +    int rc = 0;
> +    int i;
> +    rc = -EINVAL;
> +    i = 0;
> +    if(vmap == NULL) {
> +        return rc;
> +    }
> +    vmap_tmp = (int *)malloc(sizeof(*vmap) * nr_vcpus);
> +    memset(vmap_tmp, 0, sizeof(*vmap) * nr_vcpus);
> +    for (toka = strtok_r(cfg, " ", &saveptra); toka;
> +        toka = strtok_r(NULL, " ", &saveptra)) {
> +        if (i >= nr_vcpus) goto vmap_parse_out;
> +            vmap_tmp[i] = strtoul(toka, &endptr, 10);
> +            if( endptr == toka) 
> +                goto vmap_parse_out;
> +            fprintf(stderr, "Parsed vcpu_to_vnode[%d] = %d.\n", i, vmap_tmp[i]);
> +        i++;
> +    }
> +    memcpy(vmap, vmap_tmp, sizeof(*vmap) * nr_vcpus);
> +    rc = i;
> +vmap_parse_out:
> +    if (vmap_tmp != NULL) free(vmap_tmp);
> +    return rc;
> +}
> +
> +static int vnumamem_parse(char *vmemsizes, uint64_t *vmemregions, int nr_vnodes)
> +{
> +    uint64_t memsize;
> +    char *endptr, *toka, *saveptr = NULL;
> +    int rc = 0;
> +    int j;
> +    rc = -EINVAL;
> +    if(vmemregions == NULL) {
> +        goto vmem_parse_out;
> +    }
> +    memsize = 0;
> +    j = 0;
> +    for (toka = strtok_r(vmemsizes, ",", &saveptr); toka;
> +        toka = strtok_r(NULL, ",", &saveptr)) {
> +        if ( j >= nr_vnodes ) 
> +            goto vmem_parse_out;
> +        memsize = strtoul(toka, &endptr, 10);
> +        if (endptr == toka) 
> +            goto vmem_parse_out;
> +        switch (*endptr) {
> +            case 'G':
> +            case 'g':
> +                memsize = memsize * 1024 * 1024 * 1024;
> +                break;
> +            case 'M':
> +            case 'm':
> +                memsize = memsize * 1024 * 1024;
> +                break;
> +            case 'K':
> +            case 'k':
> +                memsize = memsize * 1024 ;
> +                break;
> +            default:
> +                continue;
> +                break;
> +        }
> +        if (memsize > 0) {
> +            vmemregions[j] = memsize;
> +            j++;
> +        }
> +    }
> +    rc = j;
> +vmem_parse_out:   
> +    return rc;
> +}
> +#endif
>  static void parse_config_data(const char *config_source,
>                                const char *config_data,
>                                int config_len,
> @@ -871,7 +985,13 @@ static void parse_config_data(const char *config_source,
>      {
>          char *cmdline = NULL;
>          const char *root = NULL, *extra = "";
> -
> +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA        
> +        const char *vnumamemcfg = NULL;
> +        int nr_vnuma_regions;
> +        long unsigned int vnuma_memparsed = 0;
> +        const char *vmapcfg  = NULL;
> +        const char *vdistcfg = NULL;
> +#endif
>          xlu_cfg_replace_string (config, "kernel", &b_info->u.pv.kernel, 0);
>  
>          xlu_cfg_get_string (config, "root", &root, 0);
> @@ -888,7 +1008,82 @@ static void parse_config_data(const char *config_source,
>              fprintf(stderr, "Failed to allocate memory for cmdline\n");
>              exit(1);
>          }
> +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA
> +        if (!xlu_cfg_get_long (config, "vnodes", &l, 0)) {
> +                b_info->nr_vnodes = l;
> +                if (b_info->nr_vnodes <= 0)
> +                    exit(1);
> +                if(!xlu_cfg_get_string (config, "vnumamem", &vnumamemcfg, 0)) {
> +                        b_info->vnuma_memszs = calloc(b_info->nr_vnodes,
> +                                                      sizeof(*b_info->vnuma_memszs));
> +                        if (b_info->vnuma_memszs == NULL) {
> +                            fprintf(stderr, "WARNING: Could not allocate vNUMA node memory sizes.\n");
> +                            exit(1);
> +                        }
> +                        char *buf2 = strdup(vnumamemcfg);
> +                        nr_vnuma_regions = vnumamem_parse(buf2, b_info->vnuma_memszs,
> +                                                          b_info->nr_vnodes);
> +                        for(i = 0; i < b_info->nr_vnodes; i++)
> +                            vnuma_memparsed = vnuma_memparsed + (b_info->vnuma_memszs[i] >> 10);
> +
> +                        if(vnuma_memparsed != b_info->max_memkb ||
> +                                nr_vnuma_regions != b_info->nr_vnodes )
> +                        {
> +                            fprintf(stderr, "WARNING: Incorrect vNUMA config. Parsed memory = %lu, parsed nodes = %d, max = %lx\n",
> +                                        vnuma_memparsed, nr_vnuma_regions, b_info->max_memkb);
> +                            if(buf2) free(buf2);
> +                            exit(1);
> +                        }
> +                        if (buf2) free(buf2);
> +                }
> +                else 
> +                    b_info->nr_vnodes=0;
> +                if(!xlu_cfg_get_string(config, "vnuma_distance", &vdistcfg, 0)) {
> +                    b_info->vdistance = (int *)calloc(b_info->nr_vnodes * b_info->nr_vnodes,
> +                                                      sizeof(*b_info->vdistance));
> +                    if (b_info->vdistance == NULL) 
> +                       exit(1);
> +                    char *buf2 = strdup(vdistcfg);
> +                    if(vdistance_parse(buf2, b_info->vdistance, b_info->nr_vnodes) != b_info->nr_vnodes * b_info->nr_vnodes) {
> +                        if (buf2) free(buf2);
> +                        free(b_info->vdistance);
> +                        exit(1);
> +                    } 
> +                    if(buf2) free(buf2);
> +                }
> +                else 
> +                {
> +                    /* default distance */
> +                    b_info->vdistance = (int *)calloc(b_info->nr_vnodes * b_info->nr_vnodes, sizeof(*b_info->vdistance));
> +                    if (b_info->vdistance == NULL)
> +                        exit(1);
> +                    for(i = 0; i < b_info->nr_vnodes; i++)
> +                        for(int j = 0; j < b_info->nr_vnodes; j++)
> +                            *(b_info->vdistance + j*b_info->nr_vnodes + i) = (i == j ? 10 : 20);
>  
> +                }
> +                if(!xlu_cfg_get_string(config, "vcpu_to_vnode", &vmapcfg, 0))
> +                {
> +                    b_info->vcpu_to_vnode = (int *)calloc(b_info->max_vcpus, sizeof(*b_info->vcpu_to_vnode));
> +                    if (b_info->vcpu_to_vnode == NULL) 
> +                       exit(-1);
> +                    char *buf2 = strdup(vmapcfg);
> +                    if (vcputovnode_parse(buf2, b_info->vcpu_to_vnode, b_info->nr_vnodes, b_info->max_vcpus) < 0) {
> +                        if (buf2) free(buf2);
> +                        fprintf(stderr, "Error parsing vcpu to vnode mask\n");
> +                        exit(1);
> +                    }
> +                    if(buf2) free(buf2);
> +                }
> +                else
> +                {
> +                    b_info->vcpu_to_vnode = (int *)calloc(b_info->max_vcpus, sizeof(*b_info->vcpu_to_vnode));
> +                    if (b_info->vcpu_to_vnode != NULL)
> +                        libxl_default_vcpu_to_vnuma(b_info);
> +                }
> +        }
> +#endif        
> +        
>          xlu_cfg_replace_string (config, "bootloader", &b_info->u.pv.bootloader, 0);
>          switch (xlu_cfg_get_list_as_string_list(config, "bootloader_args",
>                                        &b_info->u.pv.bootloader_args, 1))


