
Re: [Xen-devel] [PATCH v2] x86/domain_page: implement pure per-vCPU mapping infrastructure



On Thu, Feb 06, 2020 at 06:58:23PM +0000, Hongyan Xia wrote:
> Rewrite the mapcache to be purely per-vCPU instead of partially per-vcpu
> and partially per-domain.
> 
> The existing mapcache implementation keeps per-vcpu hash tables for
> caching, but also creates a per-domain region which is shared by all
> vcpus and protected by a per-domain lock. When the vcpu count of a
> domain is large, the per-domain lock contention is high. Also, several
> data structures are shared among all vcpus, potentially creating serious
> cache ping-pong effects. Performance degradation can be seen on domains
> with >16 vcpus.
> 
> This patch is needed to address these issues when we start relying on
> the mapcache (e.g., when we do not have a direct map). It partitions the
> per-domain region into per-vcpu regions so that no mapcache state is
> shared among vcpus. As a result, no locking is required and a much
> simpler maphash design can be used. The performance improvement after
> this patch is quite noticeable.
> 
> Benchmarks
> (baseline uses direct map instead of the mapcache in map_domain_page,
> old is the existing mapcache and new is after this patch):
> 
> Geekbench on a 32-vCPU guest,
> 
> performance drop     old        new
> single core         0.04%      0.18%

Do you know why the new mapcache has more overhead than the old one in
the single core case?

> multi core          2.43%      0.08%
> 
> fio on a 32-vCPU guest, LVM atop 8*SSD,
> 
> performance drop     old        new
>                     3.05%      0.28%
> 
> There should be room for further optimisations, but this already
> improves over the old mapcache by a lot.
> 
> In the new cache design, maphash entries themselves also occupy inuse
> slots. The existing configuration of 16 slots for each vcpu is no longer
> sufficient to include both the maphash and a nested page table walk.
> Fortunately, the per-domain inuse and garbage bit vectors are removed
> from the PERDOMAIN region, giving us enough room to expand the available
> mapping slots.
> 
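
To make sure I follow the new layout: each vCPU owns a contiguous slice of
MAPCACHE_VCPU_ENTRIES slots in the per-domain region, i.e. roughly (just my
reading of the patch, using the names from the code below):

    /* Global slot index and linear address of slot 'idx' of vCPU v. */
    glb_idx = v->vcpu_id * MAPCACHE_VCPU_ENTRIES + idx;
    va      = MAPCACHE_VIRT_START + glb_idx * PAGE_SIZE;
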
> Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>

[...]

>  void unmap_domain_page(const void *ptr)
>  {
> -    unsigned int idx;
> +    unsigned int idx, glb_idx;
>      struct vcpu *v;
> -    struct mapcache_domain *dcache;
> -    unsigned long va = (unsigned long)ptr, mfn, flags;
> -    struct vcpu_maphash_entry *hashent;
> +    struct mapcache_vcpu *vcache;
> +    unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags;
>  
>      if ( va >= DIRECTMAP_VIRT_START )
>          return;
> @@ -189,114 +144,78 @@ void unmap_domain_page(const void *ptr)
>      v = mapcache_current_vcpu();
>      ASSERT(v && is_pv_vcpu(v));
>  
> -    dcache = &v->domain->arch.pv.mapcache;
> -    ASSERT(dcache->inuse);
> +    vcache = v->arch.pv.mapcache;
> +    ASSERT(vcache);
>  
> -    idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> -    mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
> -    hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];

Please also assert the va in question is within the range allocated to
this vcpu.
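
Something along these lines (just a sketch, using the names already in this
patch):

    ASSERT(va >= MAPCACHE_VIRT_START +
                 v->vcpu_id * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE);
    ASSERT(va <  MAPCACHE_VIRT_START +
                 (v->vcpu_id + 1) * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE);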

> +    glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> +    idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES;
>  
>      local_irq_save(flags);
>  
> -    if ( hashent->idx == idx )
> -    {
> -        ASSERT(hashent->mfn == mfn);
> -        ASSERT(hashent->refcnt);
> -        hashent->refcnt--;
> -    }
> -    else if ( !hashent->refcnt )
> -    {
> -        if ( hashent->idx != MAPHASHENT_NOTINUSE )
> -        {
> -            /* /First/, zap the PTE. */
> -            ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(hashent->idx)) ==
> -                   hashent->mfn);
> -            l1e_write(&MAPCACHE_L1ENT(hashent->idx), l1e_empty());
> -            /* /Second/, mark as garbage. */
> -            set_bit(hashent->idx, dcache->garbage);
> -        }
> -
> -        /* Add newly-freed mapping to the maphash. */
> -        hashent->mfn = mfn;
> -        hashent->idx = idx;
> -    }
> -    else
> -    {
> -        /* /First/, zap the PTE. */
> -        l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
> -        /* /Second/, mark as garbage. */
> -        set_bit(idx, dcache->garbage);
> -    }
> +    mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx));
> +    hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)];
> +
> +    vcache->refcnt[idx]--;
> +    /* If refcnt drops to 0, we only clear inuse when it's not in the maphash. */

It would be clearer to say "when it has been evicted from the maphash" to
match the terminology in the map routine.

> +    if ( hashmfn != mfn && !vcache->refcnt[idx] )
> +        __clear_bit(idx, vcache->inuse);

Also, please flush the linear address here and at the other __clear_bit
location.
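
I.e. something like this (a sketch; assuming flush_tlb_one_local() is the
right primitive for flushing a single linear address):

    if ( hashmfn != mfn && !vcache->refcnt[idx] )
    {
        __clear_bit(idx, vcache->inuse);
        /* Make sure the next user of this slot cannot hit a stale TLB entry. */
        flush_tlb_one_local(va);
    }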


>  
>      local_irq_restore(flags);
>  }
>  
> -int mapcache_domain_init(struct domain *d)
> +int mapcache_vcpu_init(struct vcpu *v)
>  {
> +    struct domain *d = v->domain;
>      struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> -    unsigned int bitmap_pages;
> -
> -    ASSERT(is_pv_domain(d));
> +    unsigned long i;

Since you're changing this anyway -- we normally use unsigned int as the
index type.

> +    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
>  
>  #ifdef NDEBUG
>      if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
>          return 0;
>  #endif
>  
> -    BUILD_BUG_ON(MAPCACHE_VIRT_END + PAGE_SIZE * (3 +
> -                 2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
> -                 MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
> -    bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
> -    dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
> -    dcache->garbage = dcache->inuse +
> -                      (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
> -
> -    spin_lock_init(&dcache->lock);
> -
> -    return create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> -                                    2 * bitmap_pages + 1,
> -                                    NIL(l1_pgentry_t *), NULL);
> -}
> -
> -int mapcache_vcpu_init(struct vcpu *v)
> -{
> -    struct domain *d = v->domain;
> -    struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> -    unsigned long i;
> -    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
> -    unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
> -
> -    if ( !is_pv_vcpu(v) || !dcache->inuse )
> +    if ( is_idle_vcpu(v) || is_hvm_vcpu(v) )
>          return 0;
>  
> +    BUILD_BUG_ON(MAPCACHE_VIRT_END > ARG_XLAT_VIRT_START);
> +
>      if ( ents > dcache->entries )
>      {
> +        int rc;
> +
> +        ASSERT(ents * PAGE_SIZE <= (PERDOMAIN_SLOT_MBYTES << 20));
> +
>          /* Populate page tables. */
> -        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
> +        rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
>                                            NIL(l1_pgentry_t *), NULL);
>  
> -        /* Populate bit maps. */
> -        if ( !rc )
> -            rc = create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> -                                          nr, NULL, NIL(struct page_info *));
> -        if ( !rc )
> -            rc = create_perdomain_mapping(d, (unsigned long)dcache->garbage,
> -                                          nr, NULL, NIL(struct page_info *));
> -
>          if ( rc )
>              return rc;
>  
>          dcache->entries = ents;

Given that:

1. mapcache_domain is now a structure with only one member.
2. ents is a constant throughout domain's lifecycle.

You can replace mapcache_domain with a boolean --
mapcache_mapping_populated (?) in arch.pv.

If I'm not mistaken, the size of the mapping is derived from the domain
(d->max_vcpus) rather than the vcpu being initialised, so a further
improvement is to lift the mapping creation out of mapcache_vcpu_init.
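
Rough sketch of what I have in mind (the field name is only a suggestion):

    /* In struct pv_domain, replacing struct mapcache_domain: */
    bool mapcache_mapping_populated;

    /* ... and in mapcache_vcpu_init(): */
    if ( !d->arch.pv.mapcache_mapping_populated )
    {
        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
                                          NIL(l1_pgentry_t *), NULL);

        if ( rc )
            return rc;

        d->arch.pv.mapcache_mapping_populated = true;
    }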

>      }
>  
> -    /* Mark all maphash entries as not in use. */
>      BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
> +    /* MAPHASH_ENTRIES has to be power-of-two to make hashing work. */
> +    BUILD_BUG_ON(MAPHASH_ENTRIES & (MAPHASH_ENTRIES - 1));
> +    /*
> +     * Since entries in the maphash also occupy inuse slots, we have to make
> +     * sure MAPCACHE_VCPU_ENTRIES is large enough to accommodate both the
> +     * maphash and a nested page table walk.
> +     */
> +    BUILD_BUG_ON(MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES <
> +                 CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS);
> +
> +    v->arch.pv.mapcache = xzalloc(struct mapcache_vcpu);
> +    if ( !v->arch.pv.mapcache )
> +        return -ENOMEM;
> +
> +    /* Mark all maphash entries as not in use. */
>      for ( i = 0; i < MAPHASH_ENTRIES; i++ )
>      {
> -        struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
> -
> -        hashent->mfn = ~0UL; /* never valid to map */
> -        hashent->idx = MAPHASHENT_NOTINUSE;
> +        v->arch.pv.mapcache->hash_mfn[i] = ~0UL;

mfn_x(INVALID_MFN) here.

Or you can change the type of the array to mfn_t. It won't enlarge its
size.
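
I.e. (sketch):

    /* In struct mapcache_vcpu: */
    mfn_t hash_mfn[MAPHASH_ENTRIES];

    /* ... initialised in mapcache_vcpu_init(): */
    for ( i = 0; i < MAPHASH_ENTRIES; i++ )
        v->arch.pv.mapcache->hash_mfn[i] = INVALID_MFN;

The comparisons against the maphash in the map/unmap paths would then go
through mfn_eq() / mfn_x() as appropriate.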

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

