
Re: [Xen-users] query memory allocation per NUMA node

On 01/12/2017 01:45 AM, Dario Faggioli wrote:
> On Mon, 2017-01-09 at 15:47 +0100, Eike Waldt wrote:
>> On 01/09/2017 03:01 PM, Kun Cheng wrote:
>>> First, NUMA placement tries to allocate as much memory as possible
>>> on the local nodes (if 4 vCPUs are pinned to node 0, then node 0 is
>>> a local node); in most cases Xen will find a node that can fit the
>>> VM's memory requirement. However, Xen does not seem to care how
>>> much memory has been allocated to a certain VM per node in the case
>>> where a VM's vCPUs are spread among several nodes (rare, but
>>> possible); it simply tries to allocate as much as possible on one
>>> node. And having 800MB on node 0 is pretty much the same as 900MB
>>> on node 0 if your VM requires 1GB: both will have a similar
>>> performance impact on your VM.
>> Then Xen must have a mechanism to know which NUMA node is the
>> emptiest/preferred one.
> Indeed it has.
>> I even read about different "NUMA placement policies" in [1], but
>> didn't
>> find a way to set them.
> No, if you check out the wiki page, it says that different policies
> were tried during development; the final solution is based on what
> worked best.
>> A command line parameter for "xl" is what I'm looking here for.
>> A handy alternative to "xl debug-keys u; xl dmesg"...
> Exactly, and sorry again it's not there yet. :-(
>>> Second, a VM can be migrated to other nodes due to load balancing,
>>> which may make it harder to count how much memory has been
>>> allocated for a certain VM on each node.
>> Why should it be harder to count then? "xl debug-keys u; xl dmesg"
>> already gives me this information (but you cannot really parse it or
>> execute it periodically).
> In fact, it's not any harder than that.
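
Just to show what I mean by parsing it: a rough sketch of something one
could run periodically, assuming the hypervisor's 'u' debug key still
prints a "Memory location of each domain" section into the console ring
and that the section ends at the next blank line (the exact layout may
differ between Xen versions):

  # trigger the NUMA dump in the hypervisor console ring
  xl debug-keys u
  # keep only the per-domain, per-node memory breakdown
  xl dmesg | sed -n '/Memory location of each domain/,/^$/p'
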
>> If I understood it correctly, Xen decides on which NUMA node the
>> DomU shall run and allocates the needed memory... After that it does
>> a "soft-pinning" of the DomU's vCPUs to pCPUs (at least that is what
>> I observed on my test systems).
> Correct.
>> Only doing soft-pinning is way worse for overall performance than
>> hard-pinning (according to my first tests).
> Can you elaborate on this? I'm curious (what tests, what do the
> numbers look like in the two cases, etc.).
- 144 vCPUs on a server with 4 NUMA nodes
- pinning Dom0 CPUs (0-15)
- running 60 DomUs (40 Linux PV, 20 Windows HVM)
- generating about 2/3 CPU load with stressapptest (CPU, RAM) and one
fio (write I/O) thread in all Linux VMs

Soft-pinning whole NUMA nodes per DomU (chosen depending on NUMA node
memory):
The load on Dom0 is about 200,
the I/O wait is about 30 and
the CPU steal time for each vCPU in Dom0 is about 50!
Dom0 and DomUs respond very slowly.

Hard-pinning whole NUMA nodes per DomU (chosen depending on NUMA node
memory):
The load on Dom0 is about 90,
the I/O wait is about 30 and
the CPU steal time is about 2!
Dom0 and DomUs respond OK.

This simple test tells me that soft-pinning is way worse than hard-pinning.
It may be a corner case though, and maybe nobody has ever tested it in
this "dimension" ;)

>> But to do hard-pinning the correct way I need to know on which
>> NUMA-nodes the DomU runs...Otherwise performance will be impacted
>> again.
> Right. What you can do is to convert soft-affinity into hard-affinity
> after the domain is created with `xl vcpu-pin'.
> I mean:
> 1) create the domain
> 2) you find out it's soft-pinned to node:1 (with `xl vcpu-list')
> 3) you do `xl vcpu-pin <domid> all node:1 all'
> And you end up with a domain with hard-affinity set to the node on
> which its memory resides.
> I appreciate this is also tricky. Maybe we can add a new config option
> (or an xl.conf key) to let the user specify whether they want hard or
> soft affinity to be used for placement.
>> As I cannot change on which NUMA node the DomU is started (unless I
>> specify pCPUs in the DomU's config [which would require something
>> "intelligent" to figure out which node/CPUs to use]), I have to do
>> it this way around, or am I getting it totally wrong?
> You're getting it almost right, you probably just did not realize
> that doing the hard-pinning afterwards would just work. :-)
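
For reference, the config-file alternative I was referring to would look
roughly like this (a sketch with hypothetical values, assuming the guest
config accepts the "node:" syntax for the cpus= key, which sets the hard
affinity at creation time; picking the right node would still need that
"intelligent" piece):

  # example DomU config fragment (hypothetical values)
  name   = "domU-example"
  memory = 4096
  vcpus  = 4
  # hard-pin all vCPUs to the pCPUs of NUMA node 1 at creation time
  cpus   = "node:1"
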
>>> However, if you just want to know the memory usage on each node,
>>> perhaps you could try numactl and look at its output? Or try
>>> libvirt? I remember numastat can give some info about memory usage
>>> on each node.
>> As far as I understand, numactl/numastat will not work in Dom0.
> And it always will be that way.
> Regards,
> Dario

Eike Waldt
Linux Consultant
Tel.: +49-175-7241189
Mail: waldt@xxxxxxxxxxxxx

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

