
Re: [Xen-devel] [PATCH v2] x86/domain_page: implement pure per-vCPU mapping infrastructure



On Thu, Feb 06, 2020 at 06:58:23PM +0000, Hongyan Xia wrote:
> Rewrite the mapcache to be purely per-vCPU instead of partially per-vcpu
> and partially per-domain.
> 
> The existing mapcache implementation keeps per-vcpu hash tables for
> caching, but also creates a per-domain region which is shared by all
> vcpus and protected by a per-domain lock. When the vcpu count of a
> domain is large, the per-domain lock contention is high. Also, several
> data structures are shared among all vcpus, potentially creating serious
> cache ping-pong effects. Performance degradation can be seen on domains
> with >16 vcpus.
> 
> This patch is needed to address these issues when we start relying on
> the mapcache (e.g., when we do not have a direct map). It partitions the
> per-domain region into per-vcpu regions so that no mapcache state is
> shared among vcpus. As a result, no locking is required and a much
> simpler maphash design can be used. The performance improvement after
> this patch is quite noticeable.
> 
> Benchmarks
> (baseline uses direct map instead of the mapcache in map_domain_page,
> old is the existing mapcache and new is after this patch):
> 
> Geekbench on a 32-vCPU guest,
> 
> performance drop     old        new
> single core         0.04%      0.18%

Do you know why the new mapcache has more overhead than the old one in
the single core case?

> multi core          2.43%      0.08%
> 
> fio on a 32-vCPU guest, LVM atop 8*SSD,
> 
> performance drop     old        new
>                     3.05%      0.28%
> 
> There should be room for further optimisations, but this already
> improves over the old mapcache by a lot.
> 
> In the new cache design, maphash entries themselves also occupy inuse
> slots. The existing configuration of 16 slots for each vcpu is no longer
> sufficient to include both the maphash and a nested page table walk.
> Fortunately, the per-domain inuse and garbage bit vectors are removed
> from the PERDOMAIN region, giving us enough room to expand the available
> mapping slots.
> 
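
To make sure I follow the new layout: each vCPU owns a contiguous slice of
MAPCACHE_VCPU_ENTRIES slots in the per-domain region, i.e. roughly (just my
reading of the patch, using the names from the code below):

    /* Global slot index and linear address of slot 'idx' of vCPU v. */
    glb_idx = v->vcpu_id * MAPCACHE_VCPU_ENTRIES + idx;
    va      = MAPCACHE_VIRT_START + glb_idx * PAGE_SIZE;
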
> Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>

[...]

>  void unmap_domain_page(const void *ptr)
>  {
> -    unsigned int idx;
> +    unsigned int idx, glb_idx;
>      struct vcpu *v;
> -    struct mapcache_domain *dcache;
> -    unsigned long va = (unsigned long)ptr, mfn, flags;
> -    struct vcpu_maphash_entry *hashent;
> +    struct mapcache_vcpu *vcache;
> +    unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags;
>  
>      if ( va >= DIRECTMAP_VIRT_START )
>          return;
> @@ -189,114 +144,78 @@ void unmap_domain_page(const void *ptr)
>      v = mapcache_current_vcpu();
>      ASSERT(v && is_pv_vcpu(v));
>  
> -    dcache = &v->domain->arch.pv.mapcache;
> -    ASSERT(dcache->inuse);
> +    vcache = v->arch.pv.mapcache;
> +    ASSERT(vcache);
>  
> -    idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> -    mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
> -    hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];

Please also assert the va in question is within the range allocated to
this vcpu.
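
Something along these lines (just a sketch, using the names already in this
patch):

    ASSERT(va >= MAPCACHE_VIRT_START +
                 v->vcpu_id * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE);
    ASSERT(va <  MAPCACHE_VIRT_START +
                 (v->vcpu_id + 1) * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE);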

> +    glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> +    idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES;
>  
>      local_irq_save(flags);
>  
> -    if ( hashent->idx == idx )
> -    {
> -        ASSERT(hashent->mfn == mfn);
> -        ASSERT(hashent->refcnt);
> -        hashent->refcnt--;
> -    }
> -    else if ( !hashent->refcnt )
> -    {
> -        if ( hashent->idx != MAPHASHENT_NOTINUSE )
> -        {
> -            /* /First/, zap the PTE. */
> -            ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(hashent->idx)) ==
> -                   hashent->mfn);
> -            l1e_write(&MAPCACHE_L1ENT(hashent->idx), l1e_empty());
> -            /* /Second/, mark as garbage. */
> -            set_bit(hashent->idx, dcache->garbage);
> -        }
> -
> -        /* Add newly-freed mapping to the maphash. */
> -        hashent->mfn = mfn;
> -        hashent->idx = idx;
> -    }
> -    else
> -    {
> -        /* /First/, zap the PTE. */
> -        l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
> -        /* /Second/, mark as garbage. */
> -        set_bit(idx, dcache->garbage);
> -    }
> +    mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx));
> +    hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)];
> +
> +    vcache->refcnt[idx]--;
> +    /* If refcnt drops to 0, we only clear inuse when it's not in the maphash. */

It would be clearer to say "when it has been evicted from the maphash" to
match the terminology in the map routine.

> +    if ( hashmfn != mfn && !vcache->refcnt[idx] )
> +        __clear_bit(idx, vcache->inuse);

Also, please flush the linear address here and at the other __clear_bit
location.
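
I.e. something like this (a sketch; assuming flush_tlb_one_local() is the
right primitive for flushing a single linear address):

    if ( hashmfn != mfn && !vcache->refcnt[idx] )
    {
        __clear_bit(idx, vcache->inuse);
        /* Make sure the next user of this slot cannot hit a stale TLB entry. */
        flush_tlb_one_local(va);
    }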


>  
>      local_irq_restore(flags);
>  }
>  
> -int mapcache_domain_init(struct domain *d)
> +int mapcache_vcpu_init(struct vcpu *v)
>  {
> +    struct domain *d = v->domain;
>      struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> -    unsigned int bitmap_pages;
> -
> -    ASSERT(is_pv_domain(d));
> +    unsigned long i;

Since you're changing this anyway -- we normally use unsigned int as the
index type.

> +    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
>  
>  #ifdef NDEBUG
>      if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
>          return 0;
>  #endif
>  
> -    BUILD_BUG_ON(MAPCACHE_VIRT_END + PAGE_SIZE * (3 +
> -                 2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
> -                 MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
> -    bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
> -    dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
> -    dcache->garbage = dcache->inuse +
> -                      (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
> -
> -    spin_lock_init(&dcache->lock);
> -
> -    return create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> -                                    2 * bitmap_pages + 1,
> -                                    NIL(l1_pgentry_t *), NULL);
> -}
> -
> -int mapcache_vcpu_init(struct vcpu *v)
> -{
> -    struct domain *d = v->domain;
> -    struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> -    unsigned long i;
> -    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
> -    unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
> -
> -    if ( !is_pv_vcpu(v) || !dcache->inuse )
> +    if ( is_idle_vcpu(v) || is_hvm_vcpu(v) )
>          return 0;
>  
> +    BUILD_BUG_ON(MAPCACHE_VIRT_END > ARG_XLAT_VIRT_START);
> +
>      if ( ents > dcache->entries )
>      {
> +        int rc;
> +
> +        ASSERT(ents * PAGE_SIZE <= (PERDOMAIN_SLOT_MBYTES << 20));
> +
>          /* Populate page tables. */
> -        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
> +        rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
>                                            NIL(l1_pgentry_t *), NULL);
>  
> -        /* Populate bit maps. */
> -        if ( !rc )
> -            rc = create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> -                                          nr, NULL, NIL(struct page_info *));
> -        if ( !rc )
> -            rc = create_perdomain_mapping(d, (unsigned long)dcache->garbage,
> -                                          nr, NULL, NIL(struct page_info *));
> -
>          if ( rc )
>              return rc;
>  
>          dcache->entries = ents;

Given that:

1. mapcache_domain is now a structure with only one member.
2. ents is a constant throughout domain's lifecycle.

You can replace mapcache_domain with a boolean --
mapcache_mapping_populated (?) in arch.pv.

If I'm not mistaken, the size of the mapping is derived from the domain
(d->max_vcpus) rather than the vcpu being initialised, so a further
improvement is to lift the mapping creation out of mapcache_vcpu_init.
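
Rough sketch of what I have in mind (the field name is only a suggestion):

    /* In struct pv_domain, replacing struct mapcache_domain: */
    bool mapcache_mapping_populated;

    /* ... and in mapcache_vcpu_init(): */
    if ( !d->arch.pv.mapcache_mapping_populated )
    {
        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
                                          NIL(l1_pgentry_t *), NULL);

        if ( rc )
            return rc;

        d->arch.pv.mapcache_mapping_populated = true;
    }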

>      }
>  
> -    /* Mark all maphash entries as not in use. */
>      BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
> +    /* MAPHASH_ENTRIES has to be power-of-two to make hashing work. */
> +    BUILD_BUG_ON(MAPHASH_ENTRIES & (MAPHASH_ENTRIES - 1));
> +    /*
> +     * Since entries in the maphash also occupy inuse slots, we have to make
> +     * sure MAPCACHE_VCPU_ENTRIES is large enough to accommodate both the
> +     * maphash and a nested page table walk.
> +     */
> +    BUILD_BUG_ON(MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES <
> +                 CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS);
> +
> +    v->arch.pv.mapcache = xzalloc(struct mapcache_vcpu);
> +    if ( !v->arch.pv.mapcache )
> +        return -ENOMEM;
> +
> +    /* Mark all maphash entries as not in use. */
>      for ( i = 0; i < MAPHASH_ENTRIES; i++ )
>      {
> -        struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
> -
> -        hashent->mfn = ~0UL; /* never valid to map */
> -        hashent->idx = MAPHASHENT_NOTINUSE;
> +        v->arch.pv.mapcache->hash_mfn[i] = ~0UL;

mfn_x(INVALID_MFN) here.

Or you can change the type of the array to mfn_t. It won't enlarge its
size.
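
I.e. (sketch):

    /* In struct mapcache_vcpu: */
    mfn_t hash_mfn[MAPHASH_ENTRIES];

    /* ... initialised in mapcache_vcpu_init(): */
    for ( i = 0; i < MAPHASH_ENTRIES; i++ )
        v->arch.pv.mapcache->hash_mfn[i] = INVALID_MFN;

The comparisons against the maphash in the map/unmap paths would then go
through mfn_eq() / mfn_x() as appropriate.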

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

