
Re: [Xen-devel] [PATCHv7 3/3] gnttab: use per-VCPU maptrack free lists



At 14:28 +0100 on 30 Apr (1430404125), David Vrabel wrote:
> From: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> 
> Performance analysis of aggregate network throughput with many VMs
> shows that performance is significantly limited by contention on the
> maptrack lock when obtaining/releasing maptrack handles from the free
> list.
> 
> Instead of a single free list use a per-VCPU list. This avoids any
> contention when obtaining a handle.  Handles must be released back to
> their original list and since this may occur on a different VCPU there
> is some contention on the destination VCPU's free list tail pointer
> (but this is much better than a per-domain lock).
> 
> A domain's maptrack limit is multiplied by the number of VCPUs.  This
> ensures that a worst case domain that only performs grant table
> operations via one VCPU will not see a lower map track limit.
> 
> Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  xen/common/grant_table.c      |  131 ++++++++++++++++++++++++-----------------
>  xen/include/xen/grant_table.h |    8 ++-
>  xen/include/xen/sched.h       |    5 ++
>  3 files changed, 87 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 569b3d3..d8e7373 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -118,9 +118,9 @@ struct gnttab_unmap_common {
>      ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
>  
>  static inline unsigned int
> -nr_maptrack_frames(struct grant_table *t)
> +nr_vcpu_maptrack_frames(struct vcpu *v)
>  {
> -    return t->maptrack_limit / MAPTRACK_PER_PAGE;
> +    return v->maptrack_limit / MAPTRACK_PER_PAGE;
>  }
>  
>  #define MAPTRACK_TAIL (~0u)
> @@ -255,66 +255,91 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
>  
>  static inline int
>  __get_maptrack_handle(
> -    struct grant_table *t)
> +    struct grant_table *t,
> +    struct vcpu *v)
>  {
> -    unsigned int h;
> -    if ( unlikely((h = t->maptrack_head) == MAPTRACK_TAIL) )
> +    unsigned int head, next;
> +
> +    head = v->maptrack_head;
> +    if ( unlikely(head == MAPTRACK_TAIL) )
> +        return -1;
> +
> +    next = maptrack_entry(t, head).ref;
> +    if ( unlikely(next == MAPTRACK_TAIL) )
>          return -1;

So this is an extra restriction over the old code: the list shall
never be empty.  Is that to make freeing onto the tail of a remote
freelist easier?  If so, can you please add a comment so that nobody
drops this later?
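
Something along these lines, say (just a sketch of the comment I have
in mind, not insisting on the exact wording):

    /*
     * Never hand out the last entry: the list must never become
     * empty, so that a VCPU freeing a handle back onto this list
     * always has a tail entry whose .ref it can update.
     */
    next = maptrack_entry(t, head).ref;
    if ( unlikely(next == MAPTRACK_TAIL) )
        return -1;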

> -    t->maptrack_head = maptrack_entry(t, h).ref;
> -    return h;
> +
> +    v->maptrack_head = next;
> +
> +    return head;
>  }
>  
>  static inline void
>  put_maptrack_handle(
>      struct grant_table *t, int handle)
>  {
> -    spin_lock(&t->maptrack_lock);
> -    maptrack_entry(t, handle).ref = t->maptrack_head;
> -    t->maptrack_head = handle;
> -    spin_unlock(&t->maptrack_lock);
> +    struct domain *d = current->domain;
> +    struct vcpu *v;
> +    u32 prev_tail, cur_tail;
> +
> +    /* Set current handle to have TAIL reference */
> +    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
> +    v = d->vcpu[maptrack_entry(t,handle).vcpu];
> +    cur_tail = v->maptrack_tail;
> +    do {
> +        prev_tail = cur_tail;
> +        cur_tail = cmpxchg((u32 *)&v->maptrack_tail, prev_tail, handle);
> +    } while ( cur_tail != prev_tail );

Thinking out loud here: this is the matching code that needs a tail
entry to append to.  Because we advance the tail pointer before
threading the list in the .ref field, there's a potentially large set
of half-freed entries here that are/were the tail pointer but haven't
had their .ref fields updated yet.  Eventually all freeing threads
will execute the line below and stitch things together.  And cmpxchg()
is a full barrier so we're safe against reordering here: any fragment
of list will end in MAPTRACK_TAIL.  Good.

But this:

> +    maptrack_entry(t, prev_tail).ref = handle;

ought to be a write_atomic(), and the reads in
__get_maptrack_handle() should be atomic too.  
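
Concretely, something like this (only a sketch, using the
read_atomic()/write_atomic() helpers and assuming .ref stays a plain
word-sized field they can operate on):

    /* In put_maptrack_handle(): publish the link with a single atomic
     * store, so a concurrent allocator never sees a torn .ref. */
    write_atomic(&maptrack_entry(t, prev_tail).ref, handle);

    /* In __get_maptrack_handle(): the head entry's .ref may be written
     * concurrently by a remote freer, so read it atomically too. */
    next = read_atomic(&maptrack_entry(t, head).ref);
    if ( unlikely(next == MAPTRACK_TAIL) )
        return -1;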

>  }
>  
>  static inline int
>  get_maptrack_handle(
>      struct grant_table *lgt)
>  {
> +    struct vcpu          *v = current;
>      int                   i;
>      grant_handle_t        handle;
>      struct grant_mapping *new_mt;
> -    unsigned int          new_mt_limit, nr_frames;
> -
> -    spin_lock(&lgt->maptrack_lock);
> +    unsigned int          nr_frames;
>  
> -    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> -    {
> -        nr_frames = nr_maptrack_frames(lgt);
> -        if ( nr_frames >= max_maptrack_frames )
> -            break;
> -
> -        new_mt = alloc_xenheap_page();
> -        if ( !new_mt )
> -            break;
> +    handle = __get_maptrack_handle(lgt, v);
> +    if ( likely(handle != -1) )
> +        return handle;
>  
> -        clear_page(new_mt);
> +    nr_frames = nr_vcpu_maptrack_frames(v);
> +    if ( nr_frames >= max_maptrack_frames )
> +        return -1;
>  
> -        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
> +    new_mt = alloc_xenheap_page();
> +    if ( !new_mt )
> +        return -1;
> +    clear_page(new_mt);
>  
> -        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
> -            new_mt[i - 1].ref = lgt->maptrack_limit + i;
> -        new_mt[i - 1].ref = lgt->maptrack_head;
> -        lgt->maptrack_head = lgt->maptrack_limit;
> +    spin_lock(&lgt->maptrack_lock);
>  
> -        lgt->maptrack[nr_frames] = new_mt;
> -        smp_wmb();
> -        lgt->maptrack_limit      = new_mt_limit;
> +    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){

Style nit: brace on its own line, please. 

> +        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
> +        new_mt[i - 1].vcpu = v->vcpu_id;
> +    }
> +    /* Set last entry vcpu and ref */
> +    new_mt[i - 1].ref = v->maptrack_head;
> +    new_mt[i - 1].vcpu = v->vcpu_id;
>  
> -        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
> -                 nr_frames + 1);
> +    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
> +    if (v->maptrack_tail == MAPTRACK_TAIL)
> +    {
> +        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
> +            + MAPTRACK_PER_PAGE - 1;
> +        new_mt[i - 1].ref = MAPTRACK_TAIL;

This clause was slightly confusing to me until I realised that it's
only for the case where no maptrack entries have been allocated to
this vcpu at all before (because you keep at least one on the
freelist).  And therefore, there's no need to be careful about this
update to maptrack_tail.  I think that deserves a comment.
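
E.g. (sketch only):

    if ( v->maptrack_tail == MAPTRACK_TAIL )
    {
        /*
         * First maptrack page ever given to this VCPU: its free list
         * is still empty, so no remote freer can be appending to it
         * and a plain update of maptrack_tail is safe.  Terminate the
         * new list rather than linking to a non-existent old head.
         */
        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
            + MAPTRACK_PER_PAGE - 1;
        new_mt[i - 1].ref = MAPTRACK_TAIL;
    }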

So I think my only concerns about this whole series are some comments
and using atomic reads & writes in the linked-list code.  I'm happy to
believe that you'll DTRT with all that, so for the next spin,

Acked-by: Tim Deegan <tim@xxxxxxx> 

Cheers,

Tim.
