[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists



>>> On 02.06.15 at 18:26, <david.vrabel@xxxxxxxxxx> wrote:
> Performance analysis of aggregate network throughput with many VMs
> shows that performance is significantly limited by contention on the
> maptrack lock when obtaining/releasing maptrack handles from the free
> list.
> 
> Instead of a single free list use a per-VCPU list. This avoids any
> contention when obtaining a handle.  Handles must be released back to
> their original list and since this may occur on a different VCPU there
> is some contention on the destination VCPU's free list tail pointer
> (but this is much better than a per-domain lock).
> 
> Increase the default maximum number of maptrack frames by 4 times
> because: a) struct grant_mapping is now 16 bytes (instead of 8); and
> b) a guest may not evenly distribute all the grant map operations
> across the VCPUs (meaning some VCPUs need more maptrack entries than
> others).
> 
> Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> Acked-by: Tim Deegan <tim@xxxxxxx>

What version was that ack given for?

> ---
> v11:
> - Allocate maptrack array with vzalloc() since it may be >1 page.
> 
> v10:
> - Divide max_maptrack_frames evenly amongst the VCPUs.
> - Increase default max_maptrack_frames to compensate.

Iirc the ack was given before both of these changes, and the v10
ones imo should have invalidated it. Tim, I'm particularly trying to
understand whether you're okay with the original's (potentially even
heavier) resource use and/or this version's behavior (risking running
out of maptrack entries _much_ earlier than currently).
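For reference, the per-VCPU free-list scheme the patch description talks
about can be modeled roughly like this (a single-threaded sketch; the
names, the fixed sizes, and the owner-encoded-in-handle scheme are all
illustrative, not Xen's actual layout):

```c
#include <assert.h>

#define NR_VCPUS         2
#define ENTRIES_PER_VCPU 4
#define INVALID          (-1)

/*
 * Each vCPU owns a singly linked free list threaded through
 * next_ent[], plus a tail pointer so that a *different* vCPU can
 * append a released handle back onto the owner's list.
 */
static int next_ent[NR_VCPUS * ENTRIES_PER_VCPU];
static int head[NR_VCPUS], tail[NR_VCPUS];

static void init_lists(void)
{
    for ( int v = 0; v < NR_VCPUS; v++ )
    {
        int base = v * ENTRIES_PER_VCPU;

        for ( int i = 0; i < ENTRIES_PER_VCPU - 1; i++ )
            next_ent[base + i] = base + i + 1;
        next_ent[base + ENTRIES_PER_VCPU - 1] = INVALID;
        head[v] = base;
        tail[v] = base + ENTRIES_PER_VCPU - 1;
    }
}

/* Pop from the calling vCPU's own list: contention-free in the real design. */
static int get_handle(int vcpu)
{
    int h = head[vcpu];

    if ( h != INVALID )
    {
        head[vcpu] = next_ent[h];
        if ( head[vcpu] == INVALID )
            tail[vcpu] = INVALID;
    }
    return h;
}

/*
 * Append to the owning vCPU's tail.  Per the patch description this
 * is the only remaining contention point, since release may happen
 * on a different vCPU than allocation.
 */
static void put_handle(int h)
{
    int owner = h / ENTRIES_PER_VCPU;   /* owner encoded in the handle */

    next_ent[h] = INVALID;
    if ( tail[owner] != INVALID )
        next_ent[tail[owner]] = h;
    else
        head[owner] = h;
    tail[owner] = h;
}
```

The point of the separate tail pointer is that the release path only
touches the tail, so the allocation path (head) stays uncontended.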

>  static inline void
>  put_maptrack_handle(
>      struct grant_table *t, int handle)
>  {
> -    spin_lock(&t->maptrack_lock);
> -    maptrack_entry(t, handle).ref = t->maptrack_head;
> -    t->maptrack_head = handle;
> -    spin_unlock(&t->maptrack_lock);

I think the dropping of the lock here makes it necessary to add a
write barrier between __gnttab_unmap_common_complete()
zeroing op->map->flags and the call of this function.
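The required ordering can be sketched like so (hypothetical helper and
struct layout, not Xen's actual code; a C11 release store stands in for
the write barrier):

```c
#include <stdatomic.h>
#include <assert.h>

/* Illustrative stand-in for Xen's struct grant_mapping. */
struct grant_mapping {
    _Atomic unsigned short flags;
    unsigned short domid;
    unsigned int ref;
};

/*
 * The store zeroing ->flags must become visible before the handle
 * reappears on a free list; otherwise another vCPU re-allocating the
 * handle could still observe stale flags.  With the lock gone from
 * put_maptrack_handle(), a release store (i.e. a write barrier ahead
 * of the list insertion) has to provide that ordering.
 */
static void complete_unmap(struct grant_mapping *map)
{
    atomic_store_explicit(&map->flags, 0, memory_order_release);
    /* ... followed by put_maptrack_handle(t, handle) ... */
}
```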

>  static inline int
>  get_maptrack_handle(
>      struct grant_table *lgt)
>  {
> +    struct vcpu          *v = current;
>      int                   i;
>      grant_handle_t        handle;
>      struct grant_mapping *new_mt;
> -    unsigned int          new_mt_limit, nr_frames;
>  
> -    spin_lock(&lgt->maptrack_lock);
> +    handle = __get_maptrack_handle(lgt, v);
> +    if ( likely(handle != -1) )
> +        return handle;
>  
> -    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> -    {
> -        nr_frames = nr_maptrack_frames(lgt);
> -        if ( nr_frames >= max_maptrack_frames )
> -            break;
> +    /*
> +     * max_maptrack_frames is per domain so each VCPU gets a share of
> +     * the maximum, but allow at least one frame per VCPU.
> +     */
> +    if ( v->maptrack_frames
> +         && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
> +        return -1;

So with e.g. max_maptrack_frames being 256 and ->max_vcpus
being 129 you'd potentially allow each vCPU to only have exactly
one page despite there being 127 more to use.
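The arithmetic behind this objection, with a hypothetical helper
mirroring the quoted check:

```c
#include <assert.h>

/*
 * Per-vCPU frame share as computed by the quoted check: plain integer
 * division, so any remainder frames simply go unused.
 */
static unsigned int vcpu_share(unsigned int max_frames, unsigned int vcpus)
{
    return max_frames / vcpus;
}
```

With max_frames = 256 and vcpus = 129 the share is 1, stranding
256 - 129 = 127 frames of the domain-wide budget.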

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

