
Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control



> -----Original Message-----
> From: annie li [mailto:annie.li@xxxxxxxxxx]
> Sent: 01 December 2013 04:18
> To: Paul Durrant
> Cc: Wei Liu; xen-devel@xxxxxxxxxxxxx; David Vrabel; Ian Campbell
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-
> receive-side flow control
> 
> 
> On 2013-11-29 18:03, Paul Durrant wrote:
> >> Can you describe it in more detail? In what kind of scenario does
> >> this issue happen? I assume the current xenvif_count_skb_slots()
> >> returns an accurate value now.
> >>
> > Ok, I'll elaborate a bit for the record...
> >
> > The way that flow control works without this patch is that, in
> > start_xmit, the code uses xenvif_count_skb_slots() to predict how
> > many slots xenvif_gop_skb() will consume and then adds this to a
> > 'req_cons_peek' counter, which it then uses to determine whether the
> > shared ring has that amount of space available, by checking whether
> > 'req_prod' has passed that value. If the ring doesn't have space, the
> > tx queue is stopped. xenvif_gop_skb() will then consume slots and
> > update 'req_cons', and issue responses, updating 'rsp_prod' as it
> > goes. The frontend will consume those responses and post new
> > requests, by updating req_prod. So, req_prod chases req_cons, which
> > chases rsp_prod, and can never exceed that value. Thus, if
> > xenvif_count_skb_slots() ever returns a number of slots greater than
> > xenvif_gop_skb() uses, req_cons_peek will get to a value that
> > req_prod cannot achieve (since it's limited by the 'real' req_cons)
> > and, if this happens enough times, req_cons_peek gets more than a
> > ring size ahead of req_cons, and the tx queue then remains stopped
> > forever, waiting for an unachievable amount of space to become
> > available in the ring. It was fairly trivial to check that this was
> > happening by adding a BUG_ON if the value calculated by
> > xenvif_count_skb_slots() was ever different to that returned by
> > xenvif_gop_skb(). Starting a simple SMB transfer between a couple of
> > Windows VMs was enough to trigger it.
> 
> So xenvif_count_skb_slots() not returning an accurate value causes this issue.
> 

Yes, that's right.
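
To make the failure mode concrete, here's a toy user-space model of the
old accounting (illustrative only; the counter names mirror the ones
above, but this is not the actual netback code):

  /* Simulate req_cons_peek drifting ahead of the real req_cons when
   * the slot prediction overshoots what is actually consumed. */
  #include <stdio.h>

  #define RING_SIZE 256

  int main(void)
  {
      unsigned int req_prod = RING_SIZE; /* requests posted by frontend */
      unsigned int req_cons = 0;         /* slots actually consumed */
      unsigned int req_cons_peek = 0;    /* backend's prediction */

      for (unsigned int i = 0; ; i++) {
          unsigned int predicted = 3; /* count_skb_slots() says 3... */
          unsigned int used = 2;      /* ...but gop_skb() uses 2 */

          /* start_xmit: stop the queue if the prediction says the
           * frontend hasn't posted enough requests. */
          if (req_cons_peek + predicted > req_prod) {
              printf("stalled after %u skbs: peek=%u cons=%u prod=%u\n",
                     i, req_cons_peek, req_cons, req_prod);
              break;
          }
          req_cons_peek += predicted;
          req_cons += used;

          /* The frontend re-posts requests as responses complete, so
           * req_prod can never exceed the real req_cons + RING_SIZE. */
          req_prod = req_cons + RING_SIZE;
      }
      /* Once req_cons_peek drifts (nearly) a ring size ahead of
       * req_cons, no achievable req_prod can satisfy the check and
       * the queue stays stopped forever. */
      return 0;
  }

Each iteration overestimates by one slot, so after a couple of hundred
skbs req_cons_peek is effectively a full ring ahead of the real
req_cons and the queue never wakes again.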

> >
> > Having two routines trying to calculate the same value is always
> > going to be fragile, so this patch does away with that. All we
> > essentially need to do is make sure that we have 'enough stuff' on
> > our internal queue without letting it build up uncontrollably. So
> > start_xmit makes a cheap optimistic check of how much space is
> > needed for an skb and only turns the queue off if that is
> > unachievable. net_rx_action() is the place where we could do with an
> > accurate prediction but, since that has proven tricky to calculate,
> > a cheap worst-case (but not too bad) estimate is all we really need,
> > since the only thing we *must* prevent is xenvif_gop_skb() consuming
> > more slots than are available.
> >
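
To illustrate the two estimates, here's a rough user-space sketch (the
struct layout and numbers are invented for illustration; this is not
the patch as merged):

  #include <stdio.h>

  #define PAGE_SIZE 4096
  #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

  struct fake_skb {
      unsigned int head_offset;  /* offset of data within its page */
      unsigned int headlen;      /* bytes in the linear area */
      unsigned int nr_frags;
      unsigned int frag_size[8];
      int is_gso;
  };

  /* Cheap optimistic estimate for start_xmit: one slot for the linear
   * area plus one per frag, plus one for GSO metadata. */
  static unsigned int min_slots(const struct fake_skb *skb)
  {
      return 1 + skb->nr_frags + (skb->is_gso ? 1 : 0);
  }

  /* Cheap worst-case estimate for net_rx_action: assume the linear
   * area and every frag may straddle page boundaries, so round each
   * up to whole pages. This only has to be >= what xenvif_gop_skb()
   * really consumes, never less. */
  static unsigned int max_slots(const struct fake_skb *skb)
  {
      unsigned int i, n;

      n = DIV_ROUND_UP(skb->head_offset + skb->headlen, PAGE_SIZE);
      for (i = 0; i < skb->nr_frags; i++)
          n += DIV_ROUND_UP(skb->frag_size[i], PAGE_SIZE);
      return n + (skb->is_gso ? 1 : 0);
  }

  int main(void)
  {
      struct fake_skb skb = {
          .head_offset = 128, .headlen = 1460,
          .nr_frags = 2, .frag_size = { 4096, 6000 },
          .is_gso = 1,
      };
      printf("optimistic=%u worst-case=%u\n",
             min_slots(&skb), max_slots(&skb));
      return 0;
  }

The optimistic value only decides whether to stop the queue; the
worst-case value is what guarantees xenvif_gop_skb() can't run out of
slots mid-skb.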
> > I also added some hysteresis, as the code was pretty dumb in that
> > respect and would wake the tx queue as soon as enough space for a
> > single skb became available, basically leading to a lot of thrashing
> > between the queue being stopped and active.
> >
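
And the hysteresis, sketched (the threshold value is invented here,
not taken from the patch):

  #include <stdbool.h>
  #include <stdio.h>

  #define MAX_SLOTS_PER_SKB  19                      /* illustrative */
  #define WAKE_THRESHOLD     (4 * MAX_SLOTS_PER_SKB) /* invented margin */

  /* Stop as soon as the next skb might not fit... */
  static bool should_stop(unsigned int free_slots)
  {
      return free_slots < MAX_SLOTS_PER_SKB;
  }

  /* ...but only wake again once a comfortable margin is free, so the
   * queue doesn't bounce between stopped and active on every skb. */
  static bool should_wake(unsigned int free_slots)
  {
      return free_slots >= WAKE_THRESHOLD;
  }

  int main(void)
  {
      /* 20 free slots: enough that a running queue isn't stopped, but
       * not enough to wake a stopped one (that needs 76 slots here). */
      printf("stop=%d wake=%d\n", should_stop(20), should_wake(20));
      return 0;
  }

Without the gap between the stop and wake thresholds, a single freed
slot restarts the queue, the next skb stops it again, and so on.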
> > Does that explain?
> >
> This makes sense, thanks Paul.
> 

Good :-)

  Paul

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

