
Re: [Xen-users] query memory allocation per NUMA node

On 01/12/2017 01:45 AM, Dario Faggioli wrote:
> On Mon, 2017-01-09 at 15:47 +0100, Eike Waldt wrote:
>> On 01/09/2017 03:01 PM, Kun Cheng wrote:
>>> First, NUMA placement tries to allocate as much memory as possible
>>> on the local nodes (if 4 vCPUs are pinned to node 0, then node 0 is
>>> a local node); in most cases Xen will find a node that can fit the
>>> VM's memory requirement. However, Xen does not seem to care how
>>> much memory has been allocated to a certain VM per node in the case
>>> where a VM's vCPUs are spread among several nodes (rare, but
>>> possible); it simply tries to allocate as much as possible on one
>>> node. And having 800MB on node 0 is pretty much the same as 900MB
>>> on node 0 if your VM requires 1GB: both will have a similar
>>> performance impact on your VM.
>> Then Xen must have a mechanism to know which NUMA node is the
>> emptiest/preferred one.
> Indeed it has.
>> I even read about different "NUMA placement policies" in [1], but
>> didn't
>> find a way to set them.
> No, if you check out the wiki page, it says that different policies
> were tried during development; the final solution is based on what
> worked best.
>> A command line parameter for "xl" is what I'm looking here for.
>> A handy alternative to "xl debug-keys u; xl dmesg"...
> Exactly, and sorry again it's not there yet. :-(
>>> Second, a VM can be migrated to other nodes due to load balancing,
>>> which may make it harder to count how much memory has been
>>> allocated for a certain VM on each node.
>> Why should it be harder to count then? "xl debug-keys u; xl dmesg"
>> already gives me this information (but you cannot really parse it or
>> execute it periodically).
> In fact, it's not any harder than that.
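
Just to show what I mean by parsing it: a rough sketch of something one
could run periodically, assuming the hypervisor's 'u' debug key still
prints a "Memory location of each domain" section into the console ring
and that the section ends at the next blank line (the exact layout may
differ between Xen versions):

  # trigger the NUMA dump in the hypervisor console ring
  xl debug-keys u
  # keep only the per-domain, per-node memory breakdown
  xl dmesg | sed -n '/Memory location of each domain/,/^$/p'
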
>> If I understood it correctly, Xen decides on which NUMA node the
>> DomU shall run and allocates the needed memory... After that it does
>> a "soft-pinning" of the DomU's vCPUs to pCPUs (at least that is what
>> I observed on my test systems).
> Correct.
>> Only doing soft-pinning is way worse for overall performance than
>> hard-pinning (according to my first tests).
> Can you elaborate on this? I'm curious (what tests, what do the
> numbers look like in the two cases, etc.).
- 144 vCPUs on a server with 4 NUMA nodes
- pinning Dom0 CPUs (0-15)
- running 60 DomUs (40 Linux PV, 20 Windows HVM)
- generating about 2/3 CPU load with stressapptest (CPU, RAM) and one
fio (write I/O) thread in all Linux VMs

Soft-pinning whole NUMA nodes per DomU (chosen depending on NUMA node
memory):
The load on Dom0 is about 200,
the I/O wait is about 30 and
the CPU steal time for each vCPU in Dom0 is about 50!
Dom0 and DomUs respond very slowly.

Hard-pinning whole NUMA nodes per DomU (chosen depending on NUMA node
memory):
The load on Dom0 is about 90,
the I/O wait is about 30 and
the CPU steal time is about 2!
Dom0 and DomUs respond OK.

This simple test tells me that soft-pinning is way worse than hard-pinning.
It may be a corner case though, and maybe nobody has ever tested it in
this "dimension" ;)

>> But to do hard-pinning the correct way I need to know on which
>> NUMA-nodes the DomU runs...Otherwise performance will be impacted
>> again.
> Right. What you can do is to convert soft-affinity into hard-affinity
> after the domain is created with `xl vcpu-pin'.
> I mean:
> 1) create the domain
> 2) you find out it's soft-pinned to node:1 (with `xl vcpu-list')
> 3) you do `xl vcpu-pin <domid> all node:1 all'
> And you end up with a domain with hard-affinity set to the node on
> which its memory resides.
> I appreciate this is also tricky. Maybe we can add a new config option
> (or an xl.conf key) to let the user specify whether they want hard or
> soft affinity to be used for placement.
>> As I cannot change on which NUMA node the DomU is started (unless I
>> specify pCPUs in the DomU's config [which would require something
>> "intelligent" to figure out which node/CPUs to use]), I have to do
>> it this way around, or am I getting it totally wrong?
> You're getting it almost right, you probably just did not realize
> that doing the hard-pinning afterwards would just work. :-)
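
For reference, the config-file alternative I was referring to would look
roughly like this (a sketch with hypothetical values, assuming the guest
config accepts the "node:" syntax for the cpus= key, which sets the hard
affinity at creation time; picking the right node would still need that
"intelligent" piece):

  # example DomU config fragment (hypothetical values)
  name   = "domU-example"
  memory = 4096
  vcpus  = 4
  # hard-pin all vCPUs to the pCPUs of NUMA node 1 at creation time
  cpus   = "node:1"
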
>>> However, if you just want to know the memory usage on each node,
>>> perhaps you could try numactl and look at its output? Or try
>>> libvirt? I remember numastat can give some info about memory usage
>>> on each node.
>> As far as I understand, numactl/numastat will not work in Dom0.
> And it always will be that way.
> Regards,
> Dario

Eike Waldt
Linux Consultant
Tel.: +49-175-7241189
Mail: waldt@xxxxxxxxxxxxx

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

