Re: [Xen-users] query memory allocation per NUMA node

On Mon, 2017-01-09 at 15:47 +0100, Eike Waldt wrote:
> On 01/09/2017 03:01 PM, Kun Cheng wrote:
> > First numa placement tries to allocate as much as (in most cases Xen
> > will find a node which can fit the VM's memory requirement) memory to
> > local nodes (let's say 4 vcpus are pinned to node 0 then it's a local
> > node), but it seems xen doesn't care how much memory has been
> > allocated to a certain VM under such situations (as it tries to
> > allocate as much as possible on one node, assuming if a VM's VCPUs are
> > spread among several nodes, rare but possible). As having 800MB on
> > node 0 is pretty much the same as 900MB on node 0 if your VM requires
> > 1GB, both will have a similar performance impact on your VM.
> Xen has to have a mechanism to get to know which NUMA-Node is
> most-empty/preferred then.
Indeed it has.

> I even read about different "NUMA placement policies" in [1], but
> didn't find a way to set them.
No, if you check out the wiki page, it says that different policies
were tried during development. The final solution is then based on
what worked best.

> A command line parameter for "xl" is what I'm looking here for.
> A handy alternative to "xl debug-keys u; xl dmesg"...
Exactly, and sorry again it's not there yet. :-(

> > Second, a VM can be migrated to other nodes due to load balancing,
> > which may make it harder to count how much memory has been allocated
> > for a certain VM on each node.
> Why should it be harder to count then? "xl debug-keys u; xl dmesg"
> does already give me this information (but you cannot really parse
> this or execute this periodically).
In fact, it's not any harder than that.
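
Until xl grows a proper command for this, the dump can be scraped. Here
is a rough sketch in Python. It assumes the per-domain part of the `u'
debug-key output has lines like "Domain <id> (total: <pages>):" followed
by indented "Node <n>: <pages>" lines; that layout may differ between
Xen versions, so check it against your own `xl dmesg' first. The
function name is made up for illustration.

```python
import re

# Assumed line shapes in the 'u' debug-key dump (verify on your Xen version):
#   (XEN) Domain 1 (total: 262144):
#   (XEN)     Node 0: 131072
DOMAIN_RE = re.compile(r"Domain (\d+) \(total: (\d+)\)")
NODE_RE = re.compile(r"Node (\d+): (\d+)")


def parse_numa_dump(text):
    """Return {domid: {node: pages}} parsed from `xl dmesg` text.

    If the console ring holds several dumps, the last one seen for a
    given domain replaces the earlier ones.
    """
    result = {}
    current = None
    for line in text.splitlines():
        m = DOMAIN_RE.search(line)
        if m:
            current = int(m.group(1))
            result[current] = {}  # start fresh for this domain
            continue
        m = NODE_RE.search(line)
        if m and current is not None:
            result[current][int(m.group(1))] = int(m.group(2))
    return result
```

You would run `xl debug-keys u' and then feed the text of `xl dmesg' to
parse_numa_dump(), e.g. from a cron job. The values are page counts as
printed by the hypervisor, not bytes.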

> If I understood it correctly, xen decides on which NUMA Node the DomU
> shall run and allocates the needed memory...After that it does a
> "soft-pinning" of the DomU's vCPUs to pCPUs (at least that is what I
> observed on my test systems).

> Only doing soft-pinning is way worse for the overall performance than
> hard-pinning (according to my first tests).
Can you elaborate on this? I'm curious (what tests, what do the
numbers look like in the two cases, etc).

> But to do hard-pinning the correct way I need to know on which
> NUMA-nodes the DomU runs...Otherwise performance will be impacted
> again.
Right. What you can do is to convert soft-affinity into hard-affinity
after the domain is created with `xl vcpu-pin'.

I mean:
1) create the domain
2) you find out it's soft-pinned to node:1 (with `xl vcpu-list')
3) you do `xl vcpu-pin <domid> all node:1 all'

And you end up with a domain with hard-affinity set to the node on
which its memory resides.
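
If you want to script step 3, something like the following would do. It
is only a sketch: the helper name and the dry_run switch are invented
here, and it just wraps the same `xl vcpu-pin' invocation shown above.
You still have to work out the node number yourself in step 2.

```python
import subprocess


def hard_pin_to_node(domid, node, dry_run=True):
    """Build (and optionally run) `xl vcpu-pin <domid> all node:<node> all`.

    Hard affinity of all vCPUs is set to the given node; soft affinity
    is reset to "all". With dry_run=True nothing is executed.
    """
    cmd = ["xl", "vcpu-pin", str(domid), "all", f"node:{node}", "all"]
    if not dry_run:
        # Raises CalledProcessError if xl exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling hard_pin_to_node(5, 1) only returns the command list, so you can
print and inspect it before running it for real with dry_run=False.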

I appreciate this is also tricky. Maybe we can add a new config option
(or an xl.conf key) to let the user specify whether they want hard or
soft affinity to be used for placement.

> As I cannot change on which NUMA-node the DomU is started (unless I
> specify pCPUs in the DomU's config [which would require something
> "intelligent" to figure out which Node/CPUs to use]), I have to do it
> this way around, or am I getting it totally wrong?
You're getting it almost right; you probably just did not realize that
doing the hard-pinning afterwards would just work. :-)

> > However, if you just want to know the memory usage on each node,
> > perhaps you could try numactl and get some outputs? Or try libvirt? I
> > remember numastat can give some intel about memory usage on each
> > node.
> As far as I understand numactl/numastat will not work in Dom0.
Correct, and they never will.

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

