
Re: [Xen-devel] [PATCH] x86/domain_page: implement pure per-vCPU mapping infrastructure



On Tue, 2020-02-04 at 14:17 +0000, Andrew Cooper wrote:
> On 03/02/2020 18:36, Hongyan Xia wrote:
> > Rewrite the mapcache to be purely per-vCPU instead of partly
> > per-vCPU and partly per-domain.
> > 
> > This patch is needed to address performance issues when we start
> > relying on the mapcache, e.g., when we do not have a direct map.
> > Currently, the per-domain lock on the mapcache is a bottleneck for
> > multicore, causing performance degradation and even functional
> > regressions.
> 
> Do you mean that this patch causes a regression, or that removing the
> directmap causes a regression?
> 
> The rest of the commit message is very confusing to follow.

Once the direct map is gone, map_domain_page has to go through the
mapcache, and the existing implementation then causes these problems.
Even with the direct map still present, some guests on debug builds
already rely on the mapcache and see similar problems when the vCPU
count is high.

I can reword the commit message to make it clearer.
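To make the per-vCPU, lockless idea concrete, here is a minimal,
self-contained sketch (illustrative only, not the actual patch; the
names vcpu_mapcache, MAPCACHE_VCPU_ENTRIES, mapcache_alloc_slot are
made up here). Each vCPU owns a private slot array and a bitmap that
only the owning vCPU ever touches, so no spinlock is needed on the
map/unmap fast path:

#include <stdint.h>

#define MAPCACHE_VCPU_ENTRIES 32

struct vcpu_mapcache {
    uint32_t inuse;                            /* one bit per slot */
    unsigned long mfn[MAPCACHE_VCPU_ENTRIES];  /* MFN mapped in each slot */
};

/* Return a free slot index, or -1 if all of this vCPU's slots are busy. */
static int mapcache_alloc_slot(struct vcpu_mapcache *vc, unsigned long mfn)
{
    for ( int idx = 0; idx < MAPCACHE_VCPU_ENTRIES; idx++ )
    {
        if ( !(vc->inuse & (1u << idx)) )
        {
            vc->inuse |= 1u << idx;  /* only the owning vCPU writes this */
            vc->mfn[idx] = mfn;
            return idx;              /* caller installs the PTE for slot */
        }
    }
    return -1;
}

static void mapcache_free_slot(struct vcpu_mapcache *vc, int idx)
{
    vc->inuse &= ~(1u << idx);
}

Since a mapping is created and torn down on the same vCPU, nothing else
ever reads or writes that vCPU's bitmap concurrently, which is what
removes the per-domain lock from the picture.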

> >  This patch
> > makes the mapping structure per-vCPU and completely lockless.
> > 
> > Functional regression:
> > 
> > When a domain is run on more than 64 cores, FreeBSD 10 panics
> > frequently due to occasional simultaneous set_singleshot_timer
> > hypercalls from too many cores. Some cores will be blocked waiting
> > on map_domain_page, eventually failing to set a timer in the
> > future. FreeBSD cannot handle this and panics. This was fixed in
> > later FreeBSD releases by handling -ETIME, but the degradation in
> > timer performance is still a big issue.
> > 
> > Performance regression:
> > 
> > Many benchmarks see a performance drop with a large core count. I
> > ran Geekbench on a 32-vCPU guest.
> > 
> > perf drop     old        new
> > single       0.04%      0.18%
> > multi        2.43%      0.08%
> > 
> > Removing the per-domain lock in the mapcache brings multi-core
> > performance almost on par with using the direct map for mappings.
> > 
> > There should be room for further optimisations, but this already
> > improves over the old mapcache by a lot.
> > 
> > Note that entries in the maphash will occupy inuse slots. With 16
> > slots per vCPU and a maphash capacity of 8, we only have another 8
> > available, which is not enough for nested page table walks. We need
> > to increase the number of slots in config.h.
> 
> I'm afraid that I don't follow what you're trying to say here.  The
> number of slots should either be fine, or we've got a pre-existing
> bug.

The mapcache design is now different: the per-vCPU slots also have to
hold the entries kept in the maphash, which was not the case before.
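To spell out the arithmetic (I believe these are the relevant constants
today, with MAPCACHE_VCPU_ENTRIES living in config.h and the maphash
size in domain_page.c; the exact values the patch changes may differ):

/* Roughly, in the current tree (values for 4 paging levels): */
#define MAPHASH_ENTRIES        8    /* per-vCPU maphash capacity */
#define MAPCACHE_VCPU_ENTRIES 16    /* CONFIG_PAGING_LEVELS^2 = 4 * 4 */

/*
 * In the new design the maphash entries sit inside the same per-vCPU
 * slot array, so only 16 - 8 = 8 slots remain for transient mappings,
 * which is not enough for a nested page table walk. Hence the need to
 * raise the per-vCPU slot count in config.h.
 */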

Hongyan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

