Re: [Xen-devel] [PATCH v2] x86/domain_page: implement pure per-vCPU mapping infrastructure
On Thu, Feb 06, 2020 at 06:58:23PM +0000, Hongyan Xia wrote:
> Rewrite the mapcache to be purely per-vCPU instead of partially per-vcpu
> and partially per-domain.
>
> The existing mapcache implementation keeps per-vcpu hash tables for
> caching, but also creates a per-domain region which is shared by all
> vcpus and protected by a per-domain lock. When the vcpu count of a
> domain is large, the per-domain lock contention is high. Also, several
> data structures are shared among all vcpus, potentially creating serious
> cache ping-pong effects. Performance degradation can be seen on domains
> with >16 vcpus.
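(For context, a simplified sketch of the pre-patch state that is shared and
contended -- not the literal declarations, just the shape implied by the
hunks removed further down:)

    /* Per-domain state, shared by all vCPUs and protected by one lock. */
    struct mapcache_domain {
        unsigned int   entries;
        spinlock_t     lock;      /* serialises every map/unmap in the domain */
        unsigned long *inuse;     /* shared bitmap of mapped slots            */
        unsigned long *garbage;   /* shared bitmap of slots to reap later     */
    };

    /*
     * Plus, per vCPU, a small lock-free hash of recently used mfn -> slot
     * entries (mfn, idx, refcnt), consulted before falling back to the
     * locked path.
     */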
>
> This patch is needed to address these issues when we start relying on
> the mapcache (e.g., when we do not have a direct map). It partitions the
> per-domain region into per-vcpu regions so that no mapcache state is
> shared among vcpus. As a result, no locking is required and a much
> simpler maphash design can be used. The performance improvement after
> this patch is quite noticeable.
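(In other words, each vCPU owns a fixed, contiguous window of slots; a rough
sketch of the index arithmetic, using the names from the hunks below:)

    /* Per-vCPU slot index -> global slot index -> linear address. */
    glb_idx = v->vcpu_id * MAPCACHE_VCPU_ENTRIES + idx;
    va      = MAPCACHE_VIRT_START + glb_idx * PAGE_SIZE;

    /* And back again on unmap, touching no state shared with other vCPUs. */
    glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
    idx     = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES;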
>
> Benchmarks
> (baseline uses direct map instead of the mapcache in map_domain_page,
> old is the existing mapcache and new is after this patch):
>
> Geekbench on a 32-vCPU guest,
>
> performance drop     old      new
> single core          0.04%    0.18%
Do you know why the new mapcache has more overhead than the old one in
the single core case?
> multi core           2.43%    0.08%
>
> fio on a 32-vCPU guest, LVM atop 8*SSD,
>
> performance drop     old      new
>                      3.05%    0.28%
>
> There should be room for further optimisations, but this already
> improves over the old mapcache by a lot.
>
> In the new cache design, maphash entries themselves also occupy inuse
> slots. The existing configuration of 16 slots for each vcpu is no longer
> sufficient to include both the maphash and a nested page table walk.
> Fortunately, the per-domain inuse and garbage bit vectors are removed
> from the PERDOMAIN region, giving us enough room to expand the available
> mapping slots.
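(As a rough sanity check of the sizing argument, assuming MAPHASH_ENTRIES is
8 and CONFIG_PAGING_LEVELS is 4: a nested page table walk can pin up to
4 * 4 = 16 mappings at once, so each vCPU needs at least 8 + 16 = 24 slots,
which the old 16-slot configuration cannot provide. The BUILD_BUG_ON added
near the end of the patch encodes exactly this requirement.)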
>
> Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>
[...]
>  void unmap_domain_page(const void *ptr)
>  {
> -    unsigned int idx;
> +    unsigned int idx, glb_idx;
>      struct vcpu *v;
> -    struct mapcache_domain *dcache;
> -    unsigned long va = (unsigned long)ptr, mfn, flags;
> -    struct vcpu_maphash_entry *hashent;
> +    struct mapcache_vcpu *vcache;
> +    unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags;
> 
>      if ( va >= DIRECTMAP_VIRT_START )
>          return;
> @@ -189,114 +144,78 @@ void unmap_domain_page(const void *ptr)
>      v = mapcache_current_vcpu();
>      ASSERT(v && is_pv_vcpu(v));
> 
> -    dcache = &v->domain->arch.pv.mapcache;
> -    ASSERT(dcache->inuse);
> +    vcache = v->arch.pv.mapcache;
> +    ASSERT(vcache);
> 
> -    idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> -    mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
> -    hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];
Please also assert the va in question is within the range allocated to
this vcpu.
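I.e. something along the lines of (untested sketch, using the names
introduced by this patch):

    /* Untested sketch of the suggested range check. */
    ASSERT(va >= MAPCACHE_VIRT_START +
                 v->vcpu_id * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE &&
           va <  MAPCACHE_VIRT_START +
                 (v->vcpu_id + 1) * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE);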
> +    glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> +    idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES;
> 
>      local_irq_save(flags);
> 
> -    if ( hashent->idx == idx )
> -    {
> -        ASSERT(hashent->mfn == mfn);
> -        ASSERT(hashent->refcnt);
> -        hashent->refcnt--;
> -    }
> -    else if ( !hashent->refcnt )
> -    {
> -        if ( hashent->idx != MAPHASHENT_NOTINUSE )
> -        {
> -            /* /First/, zap the PTE. */
> -            ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(hashent->idx)) ==
> -                   hashent->mfn);
> -            l1e_write(&MAPCACHE_L1ENT(hashent->idx), l1e_empty());
> -            /* /Second/, mark as garbage. */
> -            set_bit(hashent->idx, dcache->garbage);
> -        }
> -
> -        /* Add newly-freed mapping to the maphash. */
> -        hashent->mfn = mfn;
> -        hashent->idx = idx;
> -    }
> -    else
> -    {
> -        /* /First/, zap the PTE. */
> -        l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
> -        /* /Second/, mark as garbage. */
> -        set_bit(idx, dcache->garbage);
> -    }
> +    mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx));
> +    hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)];
> +
> +    vcache->refcnt[idx]--;
> +    /* If refcnt drops to 0, we only clear inuse when it's not in the maphash. */
It would be clearer to say "when it has been evicted from the maphash",
to match the terminology in the map routine.
> +    if ( hashmfn != mfn && !vcache->refcnt[idx] )
> +        __clear_bit(idx, vcache->inuse);
Also, please flush the linear address here and the other __clear_bit
location.
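I.e. roughly (an untested sketch of the suggestion; flush_tlb_one_local()
being the usual single-address local flush):

    if ( hashmfn != mfn && !vcache->refcnt[idx] )
    {
        __clear_bit(idx, vcache->inuse);
        /* Flush the now-stale linear address on this CPU. */
        flush_tlb_one_local(va);
    }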
> 
>      local_irq_restore(flags);
>  }
> 
> -int mapcache_domain_init(struct domain *d)
> +int mapcache_vcpu_init(struct vcpu *v)
>  {
> +    struct domain *d = v->domain;
>      struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> -    unsigned int bitmap_pages;
> -
> -    ASSERT(is_pv_domain(d));
> +    unsigned long i;
Since you're changing this anyway -- we normally use unsigned int as
index type.
> +    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
> 
>  #ifdef NDEBUG
>      if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
>          return 0;
>  #endif
> 
> -    BUILD_BUG_ON(MAPCACHE_VIRT_END + PAGE_SIZE * (3 +
> -                 2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
> -                 MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
> -    bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
> -    dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
> -    dcache->garbage = dcache->inuse +
> -                      (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
> -
> -    spin_lock_init(&dcache->lock);
> -
> -    return create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> -                                    2 * bitmap_pages + 1,
> -                                    NIL(l1_pgentry_t *), NULL);
> -}
> -
> -int mapcache_vcpu_init(struct vcpu *v)
> -{
> -    struct domain *d = v->domain;
> -    struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> -    unsigned long i;
> -    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
> -    unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
> -
> -    if ( !is_pv_vcpu(v) || !dcache->inuse )
> +    if ( is_idle_vcpu(v) || is_hvm_vcpu(v) )
>          return 0;
> 
> +    BUILD_BUG_ON(MAPCACHE_VIRT_END > ARG_XLAT_VIRT_START);
> +
>      if ( ents > dcache->entries )
>      {
> +        int rc;
> +
> +        ASSERT(ents * PAGE_SIZE <= (PERDOMAIN_SLOT_MBYTES << 20));
> +
>          /* Populate page tables. */
> -        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
> +        rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
>                                            NIL(l1_pgentry_t *), NULL);
> 
> -        /* Populate bit maps. */
> -        if ( !rc )
> -            rc = create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> -                                          nr, NULL, NIL(struct page_info *));
> -        if ( !rc )
> -            rc = create_perdomain_mapping(d, (unsigned long)dcache->garbage,
> -                                          nr, NULL, NIL(struct page_info *));
> -
>          if ( rc )
>              return rc;
> 
>          dcache->entries = ents;
Given that:
1. mapcache_domain is now a structure with only one member.
2. ents is a constant throughout domain's lifecycle.
You can replace mapcache_domain with a boolean --
mapcache_mapping_populated (?) in arch.pv.
If I'm not mistaken, the size of the mapping is derived from the vcpu
being initialised, so a further improvement is to lift the mapping
creation out of mapcache_vcpu_init.
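Roughly (a sketch of the suggested shape only; mapcache_mapping_populated is
just the placeholder name from above):

    /* Sketch: one-shot population guarded by a boolean instead of entries. */
    if ( !d->arch.pv.mapcache_mapping_populated )
    {
        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START,
                                          d->max_vcpus * MAPCACHE_VCPU_ENTRIES,
                                          NIL(l1_pgentry_t *), NULL);

        if ( rc )
            return rc;

        d->arch.pv.mapcache_mapping_populated = true;
    }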
>      }
> 
> -    /* Mark all maphash entries as not in use. */
>      BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
> +    /* MAPHASH_ENTRIES has to be power-of-two to make hashing work. */
> +    BUILD_BUG_ON(MAPHASH_ENTRIES & (MAPHASH_ENTRIES - 1));
> +    /*
> +     * Since entries in the maphash also occupy inuse slots, we have to make
> +     * sure MAPCACHE_VCPU_ENTRIES is large enough to accommodate both the
> +     * maphash and a nested page table walk.
> +     */
> +    BUILD_BUG_ON(MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES <
> +                 CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS);
> +
> +    v->arch.pv.mapcache = xzalloc(struct mapcache_vcpu);
> +    if ( !v->arch.pv.mapcache )
> +        return -ENOMEM;
> +
> +    /* Mark all maphash entries as not in use. */
>      for ( i = 0; i < MAPHASH_ENTRIES; i++ )
>      {
> -        struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
> -
> -        hashent->mfn = ~0UL; /* never valid to map */
> -        hashent->idx = MAPHASHENT_NOTINUSE;
> +        v->arch.pv.mapcache->hash_mfn[i] = ~0UL;
mfn_x(INVALID_MFN) here.
Or you can change the type of the array to mfn_t. It won't enlarge its
size.
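I.e. either of (sketch):

    v->arch.pv.mapcache->hash_mfn[i] = mfn_x(INVALID_MFN);

or, with the array changed to mfn_t:

    v->arch.pv.mapcache->hash_mfn[i] = INVALID_MFN;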
Wei.