
Re: [Xen-devel] [PATCHv7 3/3] gnttab: use per-VCPU maptrack free lists



At 14:28 +0100 on 30 Apr (1430404125), David Vrabel wrote:
> From: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> 
> Performance analysis of aggregate network throughput with many VMs
> shows that performance is significantly limited by contention on the
> maptrack lock when obtaining/releasing maptrack handles from the free
> list.
> 
> Instead of a single free list use a per-VCPU list. This avoids any
> contention when obtaining a handle.  Handles must be released back to
> their original list and since this may occur on a different VCPU there
> is some contention on the destination VCPU's free list tail pointer
> (but this is much better than a per-domain lock).
> 
> A domain's maptrack limit is multiplied by the number of VCPUs.  This
> ensures that a worst case domain that only performs grant table
> operations via one VCPU will not see a lower map track limit.
> 
> Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  xen/common/grant_table.c      |  131 ++++++++++++++++++++++++-----------------
>  xen/include/xen/grant_table.h |    8 ++-
>  xen/include/xen/sched.h       |    5 ++
>  3 files changed, 87 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 569b3d3..d8e7373 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -118,9 +118,9 @@ struct gnttab_unmap_common {
>      ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
>  
>  static inline unsigned int
> -nr_maptrack_frames(struct grant_table *t)
> +nr_vcpu_maptrack_frames(struct vcpu *v)
>  {
> -    return t->maptrack_limit / MAPTRACK_PER_PAGE;
> +    return v->maptrack_limit / MAPTRACK_PER_PAGE;
>  }
>  
>  #define MAPTRACK_TAIL (~0u)
> @@ -255,66 +255,91 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
>  
>  static inline int
>  __get_maptrack_handle(
> -    struct grant_table *t)
> +    struct grant_table *t,
> +    struct vcpu *v)
>  {
> -    unsigned int h;
> -    if ( unlikely((h = t->maptrack_head) == MAPTRACK_TAIL) )
> +    unsigned int head, next;
> +
> +    head = v->maptrack_head;
> +    if ( unlikely(head == MAPTRACK_TAIL) )
> +        return -1;
> +
> +    next = maptrack_entry(t, head).ref;
> +    if ( unlikely(next == MAPTRACK_TAIL) )
>          return -1;

So this is an extra restriction over the old code: the list shall
never be empty.  Is that to make freeing onto the tail of a remote
freelist easier?  If so, can you please add a comment so that nobody
drops this later?
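
Something along these lines, say (just a sketch of the comment I have
in mind, not insisting on the exact wording):

    /*
     * Never hand out the last entry: the list must never become
     * empty, so that a VCPU freeing a handle back onto this list
     * always has a tail entry whose .ref it can update.
     */
    next = maptrack_entry(t, head).ref;
    if ( unlikely(next == MAPTRACK_TAIL) )
        return -1;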

> -    t->maptrack_head = maptrack_entry(t, h).ref;
> -    return h;
> +
> +    v->maptrack_head = next;
> +
> +    return head;
>  }
>  
>  static inline void
>  put_maptrack_handle(
>      struct grant_table *t, int handle)
>  {
> -    spin_lock(&t->maptrack_lock);
> -    maptrack_entry(t, handle).ref = t->maptrack_head;
> -    t->maptrack_head = handle;
> -    spin_unlock(&t->maptrack_lock);
> +    struct domain *d = current->domain;
> +    struct vcpu *v;
> +    u32 prev_tail, cur_tail;
> +
> +    /* Set current handle to have TAIL reference */
> +    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
> +    v = d->vcpu[maptrack_entry(t,handle).vcpu];
> +    cur_tail = v->maptrack_tail;
> +    do {
> +        prev_tail = cur_tail;
> +        cur_tail = cmpxchg((u32 *)&v->maptrack_tail, prev_tail, handle);
> +    } while ( cur_tail != prev_tail );

Thinking out loud here: this is the matching code that needs a tail
entry to append to.  Because we advance the tail pointer before
threading the list in the .ref field, there's a potentially large set
of half-freed entries here that are/were the tail pointer but haven't
had their .ref fields updated yet.  Eventually all freeing threads
will execute the line below and stitch things together.  And cmpxchg()
is a full barrier so we're safe against reordering here: any fragment
of list will end in MAPTRACK_TAIL.  Good.

But this:

> +    maptrack_entry(t, prev_tail).ref = handle;

ought to be a write_atomic(), and the reads in
__get_maptrack_handle() should be atomic too.  
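
Concretely, something like this (only a sketch, using the
read_atomic()/write_atomic() helpers and assuming .ref stays a plain
word-sized field they can operate on):

    /* In put_maptrack_handle(): publish the link with a single atomic
     * store, so a concurrent allocator never sees a torn .ref. */
    write_atomic(&maptrack_entry(t, prev_tail).ref, handle);

    /* In __get_maptrack_handle(): the head entry's .ref may be written
     * concurrently by a remote freer, so read it atomically too. */
    next = read_atomic(&maptrack_entry(t, head).ref);
    if ( unlikely(next == MAPTRACK_TAIL) )
        return -1;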

>  }
>  
>  static inline int
>  get_maptrack_handle(
>      struct grant_table *lgt)
>  {
> +    struct vcpu          *v = current;
>      int                   i;
>      grant_handle_t        handle;
>      struct grant_mapping *new_mt;
> -    unsigned int          new_mt_limit, nr_frames;
> -
> -    spin_lock(&lgt->maptrack_lock);
> +    unsigned int          nr_frames;
>  
> -    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> -    {
> -        nr_frames = nr_maptrack_frames(lgt);
> -        if ( nr_frames >= max_maptrack_frames )
> -            break;
> -
> -        new_mt = alloc_xenheap_page();
> -        if ( !new_mt )
> -            break;
> +    handle = __get_maptrack_handle(lgt, v);
> +    if ( likely(handle != -1) )
> +        return handle;
>  
> -        clear_page(new_mt);
> +    nr_frames = nr_vcpu_maptrack_frames(v);
> +    if ( nr_frames >= max_maptrack_frames )
> +        return -1;
>  
> -        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
> +    new_mt = alloc_xenheap_page();
> +    if ( !new_mt )
> +        return -1;
> +    clear_page(new_mt);
>  
> -        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
> -            new_mt[i - 1].ref = lgt->maptrack_limit + i;
> -        new_mt[i - 1].ref = lgt->maptrack_head;
> -        lgt->maptrack_head = lgt->maptrack_limit;
> +    spin_lock(&lgt->maptrack_lock);
>  
> -        lgt->maptrack[nr_frames] = new_mt;
> -        smp_wmb();
> -        lgt->maptrack_limit      = new_mt_limit;
> +    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){

Style nit: brace on its own line, please. 

> +        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
> +        new_mt[i - 1].vcpu = v->vcpu_id;
> +    }
> +    /* Set last entry vcpu and ref */
> +    new_mt[i - 1].ref = v->maptrack_head;
> +    new_mt[i - 1].vcpu = v->vcpu_id;
>  
> -        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
> -                 nr_frames + 1);
> +    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
> +    if (v->maptrack_tail == MAPTRACK_TAIL)
> +    {
> +        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
> +            + MAPTRACK_PER_PAGE - 1;
> +        new_mt[i - 1].ref = MAPTRACK_TAIL;

This clause was slightly confusing to me until I realised that it's
only for the case where no maptrack entries have been allocated to
this vcpu at all before (because you keep at least one on the
freelist).  And therefore, there's no need to be careful about this
update to maptrack_tail.  I think that deserves a comment.
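
E.g. (sketch only):

    if ( v->maptrack_tail == MAPTRACK_TAIL )
    {
        /*
         * First maptrack page ever given to this VCPU: its free list
         * is still empty, so no remote freer can be appending to it
         * and a plain update of maptrack_tail is safe.  Terminate the
         * new list rather than linking to a non-existent old head.
         */
        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
            + MAPTRACK_PER_PAGE - 1;
        new_mt[i - 1].ref = MAPTRACK_TAIL;
    }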

So I think my only concerns about this whole series are some comments
and using atomic reads & writes in the linked-list code.  I'm happy to
believe that you'll DTRT with all that, so for the next spin,

Acked-by: Tim Deegan <tim@xxxxxxx> 

Cheers,

Tim.
