Re: [Xen-devel] [PATCH v3 00/17] Alternative Meltdown mitigation
On 12/02/18 18:54, Dario Faggioli wrote:
> On Fri, 2018-02-09 at 15:01 +0100, Juergen Gross wrote:
>> This series is available via github:
>>
>> https://github.com/jgross1/xen.git xpti
>>
>> Dario wants to do some performance tests for this series to compare
>> performance with Jan's series with all optimizations posted.
>>
> And some of this is indeed ready.
>
> So, this is again on my testbox, with 16 pCPUs and 12GB of RAM, and I
> used a guest with 16 vCPUs and 10GB of RAM.
>
> I benchmarked Jan's patch *plus* all the optimizations and overhead
> mitigation patches he posted on xen-devel (the ones that are already in
> staging, and also the ones that are not yet there). That's "XPTI-Light"
> in the table and in the graphs. Booting this with 'xpti=false' is
> considered the baseline, while booting with 'xpti=true' is the actual
> thing we want to measure. :-)
>
> Then I ran the same benchmarks on Juergen's branch above, enabled at
> boot. That's "XPYI" in the table and graphs (yes, I know, sorry for the
> typo!).
>
> http://openbenchmarking.org/result/1802125-DARI-180211144
> http://openbenchmarking.org/result/1802125-DARI-180211144&obr_hgv=XPTI-Light+xpti%3Dfalse&obr_nor=y&obr_hgv=XPTI-Light+xpti%3Dfalse
...
> Or, actually, that's not it! :-O In fact, right while I was writing
> this report, it came out on IRC that something can be done, on
> Juergen's XPTI series, to mitigate the performance impact a bit.
>
> Juergen sent me a patch already, and I'm re-running the benchmarks with
> that applied. I'll let you know how the results end up looking.

It turned out the results are not fundamentally different, so the
general problem with context switches is still there (which I expected,
BTW).

So I guess the really bad results in benchmarks triggering a lot of vcpu
scheduling show that my approach isn't going to fly: the most probable
cause of the slow context switches is the introduced serializing
instructions (LTR, WRMSRs), which can't be avoided when we want to use
per-vcpu stacks.

OTOH the results of the other benchmarks, showing some advantage over
Jan's solution, indicate there is indeed an aspect which can be
improved.

Instead of preferring one approach over the other, I have thought about
a way to combine the best parts of each solution. In case nobody feels
strongly about pursuing my current approach further, I'd like to suggest
the following scheme:

- Whenever an L4 page table of the guest is in use on only one physical
  cpu, use the L4 shadow cache of my series in order to avoid having to
  copy the L4 contents each time the hypervisor is left.

- As soon as an L4 page table is activated on a second cpu, fall back to
  using the per-cpu page table on that cpu (the cpu already using the L4
  page table can continue doing so).

- Before activation of an L4 shadow page table, modify it to map the
  per-cpu data needed in guest mode for the local cpu only.

- Use INVPCID instead of %cr4 PGE toggling to speed up purging global
  TLB entries (depending on the availability of the feature, of course).

- Use the PCID feature to avoid purging TLB entries which might be
  needed later (again depending on hardware; see the sketch below).

I expect this will help especially for cases where the guest often
switches between kernel and user mode.
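To make the last two items a bit more concrete, here is a very rough
sketch (plain C with inline asm, not actual Xen code; the PCID_* values
and the helper names are placeholders I made up for illustration -- the
real PCID assignment is exactly what needs discussing below):

/* Rough sketch only -- placeholder names, not actual Xen code. */
#include <stdint.h>

/* CR3 bit 63: don't flush TLB entries tagged with the new PCID. */
#define X86_CR3_NOFLUSH      (1ULL << 63)

/* Placeholder PCID assignments -- whether we need 3 or 4 is the open question. */
#define PCID_XEN             0x001
#define PCID_GUEST_KERNEL    0x002
#define PCID_GUEST_USER      0x003

/* INVPCID invalidation types as defined by the SDM. */
#define INVPCID_TYPE_ADDR    0  /* single address, given PCID */
#define INVPCID_TYPE_SINGLE  1  /* all non-global entries of one PCID */
#define INVPCID_TYPE_ALL_GLB 2  /* all entries, including global ones */
#define INVPCID_TYPE_ALL     3  /* all non-global entries, all PCIDs */

static inline void invpcid(unsigned int type, unsigned int pcid, uint64_t addr)
{
    /* 128 bit descriptor: PCID in bits 0-11, linear address in bits 64-127. */
    struct { uint64_t d[2]; } desc = { { pcid, addr } };

    asm volatile ( "invpcid %[desc], %[type]"
                   :: [desc] "m" (desc), [type] "r" ((uint64_t)type)
                   : "memory" );
}

/* Purge global TLB entries without the two serializing %cr4 writes. */
static inline void flush_tlb_global(void)
{
    invpcid(INVPCID_TYPE_ALL_GLB, 0, 0);
}

/* Switch to another L4 without discarding entries tagged with other PCIDs. */
static inline void write_cr3_pcid(uint64_t l4_maddr, unsigned int pcid)
{
    uint64_t cr3 = l4_maddr | pcid | X86_CR3_NOFLUSH;

    asm volatile ( "mov %0, %%cr3" :: "r" (cr3) : "memory" );
}

On hardware without INVPCID/PCID we'd obviously have to keep the current
%cr4-based flushing as a fallback.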
Whether we want 3 or 4 PCID values for each guest address space has to
be discussed: do we need 2 different Xen variants for guest user and
guest kernel mode (IOW: could there be any problems when the hypervisor
is using a guest kernel's permissions to access guest data while the
guest was running in user mode before entering the hypervisor)?

Thoughts?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel