Xen project Mailing List

Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression

From: Stefano Stabellini <sstabellini@xxxxxxxxxx>

Date: Wed, 7 Jun 2017 11:19:38 -0700 (PDT)

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Wed, 07 Jun 2017 18:19:50 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Wed, 7 Jun 2017, Juergen Gross wrote:
> On 06/06/17 21:08, Stefano Stabellini wrote:
> > On Tue, 6 Jun 2017, Juergen Gross wrote:
> >> On 06/06/17 18:39, Stefano Stabellini wrote:
> >>> On Tue, 6 Jun 2017, Juergen Gross wrote:
> >>>> On 26/05/17 21:01, Stefano Stabellini wrote:
> >>>>> On Fri, 26 May 2017, Juergen Gross wrote:
> >>>>>> On 26/05/17 18:19, Ian Jackson wrote:
> >>>>>>> Juergen Gross writes ("HVM guest performance regression"):
> >>>>>>>> Looking for the reason of a performance regression of HVM guests
> >>>>>>>> under Xen 4.7 against 4.5 I found the reason to be commit
> >>>>>>>> c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl: remove
> >>>>>>>> freemem_slack") in Xen 4.6.
> >>>>>>>>
> >>>>>>>> The problem occurred when dom0 had to be ballooned down when starting
> >>>>>>>> the guest. The performance of some micro benchmarks dropped by about
> >>>>>>>> a factor of 2 with above commit.
> >>>>>>>>
> >>>>>>>> Interesting point is that the performance of the guest will depend on
> >>>>>>>> the amount of free memory being available at guest creation time.
> >>>>>>>> When there was barely enough memory available for starting the guest
> >>>>>>>> the performance will remain low even if memory is being freed later.
> >>>>>>>>
> >>>>>>>> I'd like to suggest we either revert the commit or have some other
> >>>>>>>> mechanism to try to have some reserve free memory when starting a
> >>>>>>>> domain.
> >>>>>>>
> >>>>>>> Oh, dear. The memory accounting swamp again. Clearly we are not
> >>>>>>> going to drain that swamp now, but I don't like regressions.
> >>>>>>>
> >>>>>>> I am not opposed to reverting that commit. I was a bit iffy about it
> >>>>>>> at the time; and according to the removal commit message, it was
> >>>>>>> basically removed because it was a piece of cargo cult for which we
> >>>>>>> had no justification in any of our records.
> >>>>>>>
> >>>>>>> Indeed I think fixing this is a candidate for 4.9.
> >>>>>>>
> >>>>>>> Do you know the mechanism by which the freemem slack helps? I think
> >>>>>>> that would be a prerequisite for reverting this. That way we can have
> >>>>>>> an understanding of why we are doing things, rather than just
> >>>>>>> flailing at random...
> >>>>>>
> >>>>>> I wish I understood it.
> >>>>>>
> >>>>>> One candidate would be 2M/1G pages being possible with enough free
> >>>>>> memory, but I haven't proven this yet. I can have a try by disabling
> >>>>>> big pages in the hypervisor.
> >>>>>
> >>>>> Right, if I had to bet, I would put my money on superpages shattering
> >>>>> being the cause of the problem.
> >>>>
> >>>> Seems you would have lost your money...
> >>>>
> >>>> Meanwhile I've found a way to get the "good" performance in the micro
> >>>> benchmark. Unfortunately this requires switching off the PV interfaces
> >>>> in the HVM guest via the "xen_nopv" kernel boot parameter.
> >>>>
> >>>> I have verified that PV spinlocks are not to blame (via the
> >>>> "xen_nopvspin" kernel boot parameter). Switching to clocksource TSC in
> >>>> the running system doesn't help either.
> >>>
> >>> What about xen_hvm_exit_mmap (an optimization for shadow pagetables) and
> >>> xen_hvm_smp_init (PV IPI)?
> >>
> >> xen_hvm_exit_mmap isn't active (a kernel message telling me so was
> >> issued).
> >>
> >>>> Unfortunately the kernel no longer seems to be functional when I try to
> >>>> tweak it not to use the PVHVM enhancements.
> >>>
> >>> I guess you are not talking about regular PV drivers like netfront and
> >>> blkfront, right?
> >>
> >> The plan was to be able to use PV drivers without having to use PV
> >> callbacks and PV timers. This isn't possible right now.
> >
> > I think the code to handle that scenario was gradually removed over time
> > to simplify the code base.
>
> Hmm, too bad.
>
> >>>> I'm wondering now whether there have ever been any benchmarks to
> >>>> prove PVHVM really is faster than non-PVHVM? My findings seem to
> >>>> suggest there might be a huge performance gap with PVHVM. OTOH this
> >>>> might depend on hardware and other factors.
> >>>>
> >>>> Stefano, didn't you do the PVHVM stuff back in 2010? Do you have any
> >>>> data from then regarding performance figures?
> >>>
> >>> Yes, I still have these slides:
> >>>
> >>> https://www.slideshare.net/xen_com_mgr/linux-pv-on-hvm
> >>
> >> Thanks. So you measured the overall package, not the single items like
> >> callbacks, timers, time source? I'm asking because I'm starting to
> >> believe some of those are slower than their non-PV variants.
> >
> > There isn't much left in terms of individual optimizations: you already
> > tried switching clocksource and removing PV spinlocks. xen_hvm_exit_mmap
> > is not used. Only the following are left (you might want to double check
> > I haven't missed anything):
> >
> > 1) PV IPI
>
> It's a 1-vCPU guest.
>
> > 2) PV suspend/resume
> > 3) vector callback
> > 4) interrupt remapping
> >
> > 2) is not on the hot path.
> > I did individual measurements of 3) at some point and it was a clear win.
>
> That might depend on the hardware. Could it be newer processors are
> faster here?

I don't think so: the alternative is an emulated interrupt, which is
slower from every point of view.

I would try to run the test with xen_emul_unplug="never", which means that
you are going to end up using the emulated network card and emulated IDE
controller, but some of the other optimizations (like the vector callback)
will still be active. If the cause of the problem is ballooning, for
example, using emulated interfaces for I/O will significantly reduce the
number of ballooned-out pages.

> > Slide 14 shows the individual measurements of 4)
>
> I don't think this is affecting my benchmark. It is just munmap after
> all.

I agree.

> > Only 1) is left to check as far as I can tell.
>
> No IPIs should be involved.
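To make the "keep some reserve free memory when starting a domain" option concrete, here is a minimal Python sketch of the arithmetic only, assuming dom0 is ballooned down just far enough for the new guest plus an optional reserve. The function `balloon_down_kb`, its parameters and the 512 MiB reserve are hypothetical; this is not libxl code and does not reproduce the removed freemem_slack formula.

```python
# Hypothetical sketch, not libxl code: how much memory dom0 must give back
# so that a new guest fits and an optional reserve ("slack") stays free.

KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def balloon_down_kb(free_kb: int, guest_kb: int, slack_kb: int = 0) -> int:
    """Return the KiB dom0 must balloon out before creating the guest.

    free_kb:  host memory currently free
    guest_kb: memory the new guest needs
    slack_kb: reserve to leave free afterwards (0 = balloon just enough)
    """
    if min(free_kb, guest_kb, slack_kb) < 0:
        raise ValueError("memory sizes must be non-negative")
    # Nothing to release if free memory already covers guest + reserve.
    return max(0, guest_kb + slack_kb - free_kb)


if __name__ == "__main__":
    free, guest = 1 * KIB_PER_GIB, 4 * KIB_PER_GIB
    # No reserve: in this model the host is left with no free memory.
    print(balloon_down_kb(free, guest))
    # Hypothetical 512 MiB reserve kept free after guest creation.
    print(balloon_down_kb(free, guest, slack_kb=512 * KIB_PER_MIB))
```

With these numbers the sketch asks dom0 to release 3145728 KiB (3 GiB) without a reserve and 3670016 KiB (3.5 GiB) with the reserve. Whether such a reserve is what actually avoids the slowdown is the open question in this thread.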
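The elimination approach discussed above (turn off one PVHVM feature at a time and rerun the micro benchmark) can be organized as a small matrix of guest kernel command lines. A minimal Python sketch, assuming the Linux boot parameters named in the thread (xen_nopv, xen_nopvspin, xen_emul_unplug=never) plus clocksource=tsc as the boot-time counterpart of switching the clocksource at runtime. The base command line, the variant labels and the `cmdlines` helper are hypothetical.

```python
# Hypothetical helper: one guest kernel command line per variant, each
# disabling a single PVHVM-related feature, so the benchmark can be rerun
# per variant. BASE_CMDLINE and the labels are illustrative assumptions.

BASE_CMDLINE = "ro quiet"  # hypothetical; use the guest's real command line

VARIANTS: dict[str, list[str]] = {
    "baseline": [],
    # Guest runs as a plain HVM guest without PV extensions.
    "no-pv": ["xen_nopv"],
    # PV spinlocks disabled, other PV features unchanged.
    "no-pvspin": ["xen_nopvspin"],
    # TSC selected as clocksource at boot instead of the Xen clocksource.
    "tsc-clock": ["clocksource=tsc"],
    # Emulated NIC/IDE kept; features such as the vector callback stay on.
    "emulated-io": ["xen_emul_unplug=never"],
}


def cmdlines(base: str = BASE_CMDLINE) -> dict[str, str]:
    """Map each variant label to its full kernel command line."""
    return {name: " ".join([base, *extra]) for name, extra in VARIANTS.items()}


if __name__ == "__main__":
    for name, line in cmdlines().items():
        print(f"{name:12} {line}")
```

Keeping one variable per run is the point of the design: if only "no-pv" restores performance while "emulated-io" does not, the unplugged PV I/O path (and the ballooning it implies) becomes less likely as the culprit.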
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.