Xen project Mailing List

Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression

To: Stefano Stabellini <sstabellini@xxxxxxxxxx>

From: Juergen Gross <jgross@xxxxxxxx>

Date: Mon, 29 May 2017 21:05:02 +0200

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Mon, 29 May 2017 19:05:16 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 26/05/17 21:01, Stefano Stabellini wrote:
> On Fri, 26 May 2017, Juergen Gross wrote:
>> On 26/05/17 18:19, Ian Jackson wrote:
>>> Juergen Gross writes ("HVM guest performance regression"):
>>>> Looking for the reason of a performance regression of HVM guests under
>>>> Xen 4.7 against 4.5 I found the reason to be commit
>>>> c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl: remove freemem_slack")
>>>> in Xen 4.6.
>>>>
>>>> The problem occurred when dom0 had to be ballooned down when starting
>>>> the guest. The performance of some micro benchmarks dropped by about
>>>> a factor of 2 with above commit.
>>>>
>>>> Interesting point is that the performance of the guest will depend on
>>>> the amount of free memory being available at guest creation time.
>>>> When there was barely enough memory available for starting the guest
>>>> the performance will remain low even if memory is being freed later.
>>>>
>>>> I'd like to suggest we either revert the commit or have some other
>>>> mechanism to try to have some reserve free memory when starting a
>>>> domain.
>>>
>>> Oh, dear. The memory accounting swamp again. Clearly we are not
>>> going to drain that swamp now, but I don't like regressions.
>>>
>>> I am not opposed to reverting that commit. I was a bit iffy about it
>>> at the time; and according to the removal commit message, it was
>>> basically removed because it was a piece of cargo cult for which we
>>> had no justification in any of our records.
>>>
>>> Indeed I think fixing this is a candidate for 4.9.
>>>
>>> Do you know the mechanism by which the freemem slack helps ? I think
>>> that would be a prerequisite for reverting this. That way we can have
>>> an understanding of why we are doing things, rather than just
>>> flailing at random...
>>
>> I wish I would understand it.
>>
>> One candidate would be 2M/1G pages being possible with enough free
>> memory, but I haven't proofed this yet. I can have a try by disabling
>> big pages in the hypervisor.
>
> Right, if I had to bet, I would put my money on superpages shattering
> being the cause of the problem.

Creating the domains with xl -vvv create ... showed the numbers of superpages and normal pages allocated for the domain.

The following allocation pattern resulted in a slow domain:

xc: detail: PHYSICAL MEMORY ALLOCATION:
xc: detail: 4KB PAGES: 0x0000000000000600
xc: detail: 2MB PAGES: 0x00000000000003f9
xc: detail: 1GB PAGES: 0x0000000000000000

And this one was fast:

xc: detail: PHYSICAL MEMORY ALLOCATION:
xc: detail: 4KB PAGES: 0x0000000000000400
xc: detail: 2MB PAGES: 0x00000000000003fa
xc: detail: 1GB PAGES: 0x0000000000000000

I ballooned dom0 down in small steps to be able to create those test cases.

I believe the main reason is that some data needed by the benchmark is located near the end of the domain's memory, resulting in a rather high TLB miss rate if not all (or nearly all) memory is available in the form of 2MB pages.

>> What makes the whole problem even more mysterious is that the
>> regression was detected first with SLE12 SP3 (guest and dom0, Xen 4.9
>> and Linux 4.4) against older systems (guest and dom0). While trying
>> to find out whether the guest or the Xen version are the culprit I
>> found that the old guest (based on kernel 3.12) showed the mentioned
>> performance drop with above commit. The new guest (based on kernel
>> 4.4) shows the same bad performance regardless of the Xen version or
>> amount of free memory. I haven't found the Linux kernel commit yet
>> being responsible for that performance drop.

And this might be the result of different memory usage by more recent kernels: I suspect the critical data is now at the very end of the domain's memory. As some pages are always allocated in 4kB chunks, the last pages of the domain will never be part of a 2MB page.
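The two allocation patterns above can be compared with a little arithmetic. The sketch below copies the page counts from the logs; the helper names are illustrative only. It shows that both patterns back the same 2040 MiB and that the slow one differs by exactly one 2MB page, replaced by 512 4kB pages. That is consistent with, though not proof of, a single shattered superpage covering the benchmark's hot data.

```python
# Page counts copied from the two "xl -vvv create" logs quoted above.
SLOW = {"4KB": 0x600, "2MB": 0x3F9, "1GB": 0x0}
FAST = {"4KB": 0x400, "2MB": 0x3FA, "1GB": 0x0}

# Size of each page type in KiB.
PAGE_KIB = {"4KB": 4, "2MB": 2 * 1024, "1GB": 1024 * 1024}


def total_kib(alloc):
    """Guest memory (KiB) covered by an allocation pattern."""
    return sum(count * PAGE_KIB[size] for size, count in alloc.items())


def superpage_fraction(alloc):
    """Fraction of guest memory mapped by 2MB or 1GB pages."""
    large = alloc["2MB"] * PAGE_KIB["2MB"] + alloc["1GB"] * PAGE_KIB["1GB"]
    return large / total_kib(alloc)


for name, alloc in (("slow", SLOW), ("fast", FAST)):
    print(f"{name}: {total_kib(alloc) // 1024} MiB total, "
          f"{alloc['4KB'] * PAGE_KIB['4KB'] // 1024} MiB in 4kB pages, "
          f"{superpage_fraction(alloc):.2%} in superpages")
# slow: 2040 MiB total, 6 MiB in 4kB pages, 99.71% in superpages
# fast: 2040 MiB total, 4 MiB in 4kB pages, 99.80% in superpages

# The only difference: one 2MB page became 512 4kB pages.
assert FAST["2MB"] - SLOW["2MB"] == 1
assert SLOW["4KB"] - FAST["4KB"] == PAGE_KIB["2MB"] // PAGE_KIB["4KB"]
```

In other words, the superpage coverage differs by only about 0.1%, which fits the hypothesis that *where* the 4kB pages end up matters more than how many there are.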
Looking at meminit_hvm() in libxc, which does the allocation of the memory, I realized it is somewhat sub-optimal: shouldn't it try to allocate the largest pages first and the smaller pages later?

Would it be possible to sometimes make memory holes larger to avoid having to use 4kB pages (with the exception of the first 2MB of the domain, of course)?

Maybe it would even make sense to be able to tweak the allocation pattern depending on the guest type: preferring large pages either at the top or at the bottom of the domain's physical address space.

And what should be done with the "freemem_slack" patch? With my findings I don't think we can define a fixed percentage of the memory which should be free. I could imagine some kind of mechanism using dom0 ballooning more dynamically: as long as enough memory is unused in dom0, balloon it down whenever the allocation of a large page (1GB or 2MB) fails. After all memory for the new domain has been allocated, balloon dom0 up again (but not beyond its size before the creation of the new domain started, of course).

Thoughts?

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
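The dynamic-ballooning idea proposed in the message above can be illustrated with a toy model. Everything here is hypothetical: `HostMemory`, `populate_guest` and the dom0 balloon methods are not libxc or libxl code, 1GB pages are left out, and the model simply assumes that memory released by dom0 comes back as a contiguous 2MB extent, which real hosts do not guarantee. Each 2MB-sized guest slot tries a superpage first, balloons dom0 down if no free extent is left, and only then falls back to 4kB pages; dom0 is re-inflated afterwards by at most what it gave up.

```python
from dataclasses import dataclass
from typing import Tuple

PAGES_PER_2M = 512  # 4kB frames per 2MB superpage


@dataclass
class HostMemory:
    """Toy model of host memory; all fields are hypothetical simplifications."""
    free_2m: int        # free, 2MB-aligned contiguous extents
    free_4k: int        # free 4kB frames that do not form a whole extent
    dom0_spare_2m: int  # 2MB chunks unused in dom0 that it could give up

    def dom0_balloon_down(self) -> bool:
        """Assumption: ballooning dom0 down by 2MB yields one free extent."""
        if self.dom0_spare_2m == 0:
            return False
        self.dom0_spare_2m -= 1
        self.free_2m += 1
        return True

    def dom0_balloon_up(self, chunks: int) -> int:
        """Give up to `chunks` 2MB chunks back to dom0 from free memory."""
        given = min(chunks, self.free_2m + self.free_4k // PAGES_PER_2M)
        for _ in range(given):
            # Hand back loose 4kB frames first so intact extents survive.
            if self.free_4k >= PAGES_PER_2M:
                self.free_4k -= PAGES_PER_2M
            else:
                self.free_2m -= 1
        self.dom0_spare_2m += given
        return given


def populate_guest(host: HostMemory, slots_2m: int) -> Tuple[int, int]:
    """Fill `slots_2m` 2MB-sized guest slots, preferring superpages.

    Returns (superpages, small_pages) used for the guest.
    """
    ballooned = superpages = small_pages = 0
    try:
        for _ in range(slots_2m):
            # A large-page allocation would fail: try ballooning dom0 first.
            if host.free_2m == 0 and host.dom0_balloon_down():
                ballooned += 1
            if host.free_2m > 0:
                host.free_2m -= 1
                superpages += 1
            elif host.free_4k >= PAGES_PER_2M:
                host.free_4k -= PAGES_PER_2M
                small_pages += PAGES_PER_2M
            else:
                raise MemoryError("not enough free memory for the guest")
    finally:
        # Re-inflate dom0, but never beyond what it gave up for this guest.
        host.dom0_balloon_up(ballooned)
    return superpages, small_pages


# Guest of three 2MB-sized slots; the host has only loose 4kB frames free.
with_balloon = HostMemory(free_2m=0, free_4k=3 * 512, dom0_spare_2m=2)
print(populate_guest(with_balloon, 3))  # (2, 512)
no_balloon = HostMemory(free_2m=0, free_4k=3 * 512, dom0_spare_2m=0)
print(populate_guest(no_balloon, 3))    # (0, 1536)
```

With spare memory in dom0, two of the three slots get superpages and dom0 ends up at its original size again; without it, the whole guest is backed by 4kB pages, mirroring the "barely enough free memory" case described in the thread.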
