
Re: [Xen-users] query memory allocation per NUMA node




On 01/18/2017 07:25 PM, Dario Faggioli wrote:
> On Wed, 2017-01-18 at 17:36 +0100, Eike Waldt wrote:
>> On 01/17/2017 12:23 AM, Dario Faggioli wrote:
>>> That may well be. But it sounds strange. I'd be inclined to think
>>> that there is something else going on... Or maybe I'm just not
>>> understanding what you mean with "pinning whole NUMA nodes per
>>> DomU" (and that's why I'm asking for the commands' output :-)).
>>>
>> I simply mean that you always pin ALL DomU vCPUs to a whole NUMA node
>> (or more) and not single vCPUs.
>>
> Ok, I understand it now (and I also see it in the output you sent me).
> Yes, this is usually what makes the most sense to do.
> 
>> One detail to mention would be that we run all DomU filesystems on
>> NFS storage mounted on the Dom0.
>> Another interesting fact is that (as said above) we're doing some
>> fio write tests. These go to NFS filesystems and the write speed is
>> about 1000 MB/s (8000 Mbit/s) in the hard-pinning scenario and only
>> 100 MB/s in the soft-pinning scenario.
>>
> Mmm... ok, it's indeed interesting. But I can't really tell, off the
> top of my head, what kind of relationship/interaction this may have
> with hard vs soft pinning.
> 
>> I'll send you some outputs.
>>
> Thanks. Looking at it.
> 
> You really have a lot of domains! :-D
> 
> So, in the hard pinning case, you totally isolate dom0, and it
> therefore makes sense that you see ~0% steal time from inside it.
> 
> In the soft pinning case, you actually don't isolate it. In fact,
> although they'll try not to, the various DomU are allowed to run on
> pCPUs 0-15, while, OTOH, dom0 is _not_allowed_ to run on 16-143.
> 
> That's a bit unfair, and I think justifies the (very!) high steal time.
> 
> A more fair comparison between hard and soft pinning may be, either:
> 
> 1) use soft-affinity for dom0 too. I.e., as far as dom0 is concerned,
> output of `xl vcpu-list' should look as follows:
> 
> Name                                ID  VCPU   CPU State   Time(s) Affinity 
> (Hard / Soft)
> Domain-0                             0     0    0   -b-     245.0  all / 0-15
> Domain-0                             0     1    1   -b-      66.1  all / 0-15
> Domain-0                             0     2    2   -b-     102.8  all / 0-15
> Domain-0                             0     3    3   -b-      59.2  all / 0-15
> Domain-0                             0     4    4   -b-     197.7  all / 0-15
> Domain-0                             0     5    5   -b-      50.8  all / 0-15
> Domain-0                             0     6    6   -b-      97.3  all / 0-15
> Domain-0                             0     7    7   -b-      42.1  all / 0-15
> Domain-0                             0     8    8   -b-      95.1  all / 0-15
> Domain-0                             0     9    9   -b-      31.3  all / 0-15
> Domain-0                             0    10   10   r--      96.4  all / 0-15
> Domain-0                             0    11   11   -b-      33.0  all / 0-15
> Domain-0                             0    12   12   r--     101.3  all / 0-15
> Domain-0                             0    13   13   r--      30.1  all / 0-15
> Domain-0                             0    14   14   -b-     100.9  all / 0-15
> Domain-0                             0    15   15   -b-      39.4  all / 0-15
> 
> To achieve this, I think you should get rid of dom0_vcpus_pin, keep
> dom0_max_vcpus=16 and add dom0_nodes=0,relaxed (or something like
> that). This will probably set the vcpu-affinity of dom0 to 'all/0-35',
> which you can change to 'all/0-15' after boot.
I got rid of "dom0_vcpus_pin" and did some tests...
According to my tests, all/0-15, 0-15/all or all/all for Dom0 makes no
difference in the soft-pinning case.
I suppose that is because CPUs 0-15 are assigned to Dom0 anyhow.
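
For reference, changing Dom0's affinity after boot boils down to
something like this (just a sketch; it assumes Dom0 is named
"Domain-0"):

  # hard affinity "all", soft affinity 0-15, for all Dom0 vCPUs
  xl vcpu-pin Domain-0 all all 0-15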

The "dom0_nodes=0,relaxed"...
Checked it out and it does exactly what you (and the manpage) said:
relaxed --> all / 0-35
strict  --> 0-35 / 0-35
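
For completeness, such a setup corresponds to a Xen command line along
these lines (a sketch; GRUB_CMDLINE_XEN_DEFAULT is what SLES' GRUB2
uses, adjust for your bootloader):

  # /etc/default/grub
  GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=16 dom0_nodes=0,relaxed"
  # then: grub2-mkconfig -o /boot/grub2/grub.cfg && reboot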

Interestingly, "xl debug-keys u; xl dmesg" still shows memory pages
for Dom0 on NUMA Node3, even though the manpage says "dom0_nodes [..]
Defaults for vCPU-s created and memory assigned to Dom0 [..]."
There should be enough free pages on Node0, as no other DomU is
running directly after startup.
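
(In case anyone wants to reproduce the check, this is roughly what I
look at; a sketch, and the exact output format differs between Xen
versions:)

  # dump the per-domain/per-node page counts into the hypervisor log
  xl debug-keys u
  xl dmesg | tail -n 80

  # free memory per NUMA node as seen by the toolstack
  xl info -n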

> 
> 2) properly isolate dom0, even in the soft-affinity case. That would
> mean keeping dom0 affinity as you already have it, but change **all**
> the other domains' affinity from 'all/xx-yy' (where xx and yy vary from
> domain to domain) to '16-143/xx-yy'.
That was a very good hint!
I did not realize that before, thank you so much!
The "issues" with stealing and bad NFS performance are gone now.

> 
> Let me say again that I'm not at all saying that I'm sure that either 1
> or 2 will certainly perform better than the hard pinning case. This is
> impossible to tell without trying.
> 
> But, like this, it's a more fair --and hence more interesting--
> comparison, and IMO it's worth a try.
> 
When I isolate the Dom0 properly in the soft-pinning scenario, I could
not see any performance difference compared to hard-pinning everything.
But I think this is very hard to measure.

> Another thing, what Xen version is it that you're using again? I'm
> asking because I fixed a bug in Credit1's soft-affinity logic, during
> the Xen 4.8 development cycle (as in, you may be subject to it, if not
> on 4.8).
> 
> Check that out here:
> https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02184.html
> 
> (it's commit f83fc393b "xen: credit1: fix mask to be used for tickling
> in Credit1" in Xen's git repo.)
> 
> Checking stable releases, I'm able to find it in Xen 4.7.1, and in
> Xen 4.6.4, so these versions are also ok.
> 
> If you're not on either 4.8, 4.7.1 or 4.6.4, I'd recommend upgrading to
> any of those, but I understand that is not always super-
> straightforward! :-P
As you may have noticed from "xl info", we are running SLES12-SP2 here.
They call it "4.7.1_02-25".
I just checked the sources and the fix seems to be included.
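
(One way to double-check that against upstream, as a sketch assuming a
clone of the upstream xen.git with its release tags:)

  # exits 0 if the fix is an ancestor of the 4.7.1 release tag
  git merge-base --is-ancestor f83fc393b RELEASE-4.7.1 && echo "fix included"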

> 
> Regards,
> Dario
> 

-- 
Eike Waldt
Linux Consultant
Tel.: +49-175-7241189
Mail: waldt@xxxxxxxxxxxxx

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
