Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control
On Thu, Nov 28, 2013 at 01:11:12PM +0000, Paul Durrant wrote:
> The flow control code relies on a double pass of the skb, firstly to count
> the number of ring slots needed to supply sufficient grant references, and
> another to actually consume those references and build the grant copy
> operations. It transpires that, in some scenarios, the initial count and the
> number of references consumed differs and, after this happens a number of
> times, flow control is completely broken.
Please add blank lines between paragraphs in later versions.
> This patch simplifies things by making a very quick static analysis of the
> minimal number or ring slots needed for an skb and then only stopping the
> queue if that minimal number of slots is not available. The worker thread
> does a worst-case analysis and only processes what will definitely fit,
> leaving remaining skbs on the internal queue. This patch also gets rid of the
> drop that occurs if an skb won't fit (because that doesn't matter - we always
> internally queue *at least* what we can put in the shared ring) and adds
> some hysteresis to waking the external queue. (It only gets woken when at
> least half the shared ring is available).
I think this is the right direction to go. However, we've tripped over
this several times, so is it possible to test it with XenRT?
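(The xenvif_rx_ring_slots_available() hunk isn't quoted below; I assume it is
something along the lines of the sketch here -- a guess at the prod/cons check,
not code taken from the patch itself.)

/* Sketch only, not from the patch: compare the frontend's request producer
 * with our request consumer, and re-check after arming req_event so that a
 * frontend posting new requests concurrently isn't missed.
 */
static bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed)
{
	RING_IDX prod, cons;

	do {
		prod = vif->rx.sring->req_prod;
		cons = vif->rx.req_cons;

		if (prod - cons >= needed)
			return true;

		/* Ask the frontend to notify us when more requests arrive,
		 * then re-check in case some were posted in the meantime.
		 */
		vif->rx.sring->req_event = prod + 1;
		mb();
	} while (vif->rx.sring->req_prod != prod);

	return false;
}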
> Without this patch I can trivially stall netback permanently by just doing
> a large guest to guest file copy between two Windows Server 2008R2 VMs on a
> single host.
> Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
> drivers/net/xen-netback/common.h | 26 ++---
> drivers/net/xen-netback/interface.c | 45 ++++----
> drivers/net/xen-netback/netback.c | 202 ++++++++++-------------------------
> 3 files changed, 87 insertions(+), 186 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 08ae01b..d7888db 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -136,12 +136,7 @@ struct xenvif {
> char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
> struct xen_netif_rx_back_ring rx;
> struct sk_buff_head rx_queue;
> -
> - /* Allow xenvif_start_xmit() to peek ahead in the rx request
> - * ring. This is a prediction of what rx_req_cons will be
> - * once all queued skbs are put on the ring.
> - */
> - RING_IDX rx_req_cons_peek;
> + bool rx_event;
>
What is this for? Reading through the code, my impression is that this is
a flag indicating that the RX interrupt has been triggered. Why is it
useful? Why is skb_queue_empty not sufficient?
> /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
> * head/fragment page uses 2 copy operations because it
> @@ -198,8 +193,6 @@ void xenvif_xenbus_fini(void);
>
[...]
>
> -void xenvif_notify_tx_completion(struct xenvif *vif)
> -{
> - if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> - netif_wake_queue(vif->dev);
> -}
> -
> static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
> {
> struct xenvif *vif = netdev_priv(dev);
> @@ -378,6 +378,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
> if (err < 0)
> goto err;
>
> + init_waitqueue_head(&vif->wq);
> +
I guess the reason behind this code movement is that, without it, netback
would crash if we got an interrupt before the waitqueue was set up?
> if (tx_evtchn == rx_evtchn) {
> /* feature-split-event-channels == 0 */
> err = bind_interdomain_evtchn_to_irqhandler(
[...]
> @@ -583,23 +477,36 @@ void xenvif_rx_action(struct xenvif *vif)
>
> skb_queue_head_init(&rxq);
>
> - count = 0;
> -
> while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> - vif = netdev_priv(skb->dev);
> - nr_frags = skb_shinfo(skb)->nr_frags;
> + int max_slots_needed;
> + int i;
>
> - sco = (struct skb_cb_overlay *)skb->cb;
> - sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> + /* We need a cheap worse case estimate for the number of
> + * slots we'll use.
> + */
>
> - count += nr_frags + 1;
> + max_slots_needed = 2;
I'm afraid this starting point is not correct. Consider an SKB with a very
large linear buffer: you might need more than 2 slots to fit that in,
right?
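To make that concrete, a worst-case count of the linear area could be derived
from its start offset and length -- a sketch only, not code from this patch,
using offset_in_page() and skb_headlen() from mainline:

	/* Sketch: the linear area may cross several PAGE_SIZE boundaries,
	 * so count it from its offset and length rather than assuming a
	 * constant two slots.
	 */
	max_slots_needed = DIV_ROUND_UP(offset_in_page(skb->data) +
					skb_headlen(skb), PAGE_SIZE);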
> + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> + struct page *page;
>
> - __skb_queue_tail(&rxq, skb);
> + page = skb_frag_page(&skb_shinfo(skb)->frags[i]);
> + max_slots_needed += 1 << compound_order(page);
This estimate might reserve more slots than necessary.
The removal of xenvif_count_skb_slots() is OK because you're now
estimating the minimum number of slots, but the actual number of slots
consumed needs to be accurate. Any reason you don't want to use
xenvif_gop_skb()?
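One hypothetical way to keep the estimate cheap while bounding the
over-reservation would be to size it on each fragment's length rather than on
the whole compound page -- again only a sketch, not something in this patch:

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		unsigned int size = skb_frag_size(&skb_shinfo(skb)->frags[i]);

		/* Sketch: count slots from the fragment length; a fragment
		 * that starts part-way into a page may need one more slot,
		 * hence the +1.
		 */
		max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE) + 1;
	}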
> + }
> + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 ||
> + skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> + max_slots_needed++;
>
> - /* Filled the batch queue? */
> - /* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
> - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> + /* If the skb may not fit then bail out now */
> + if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> + skb_queue_head(&vif->rx_queue, skb);
> break;
> + }
> +
> + sco = (struct skb_cb_overlay *)skb->cb;
> + sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> + BUG_ON(sco->meta_slots_used > max_slots_needed);
> +
> + __skb_queue_tail(&rxq, skb);
> }
>
[...]
> +
> int xenvif_kthread(void *data)
> {
> struct xenvif *vif = data;
> + const int threshold = XEN_NETIF_RX_RING_SIZE / 2;
>
> while (!kthread_should_stop()) {
> +
Stray blank line.
Wei.
> wait_event_interruptible(vif->wq,
> rx_work_todo(vif) ||
> kthread_should_stop());
> if (kthread_should_stop())
> break;
>
> - if (rx_work_todo(vif))
> + if (!skb_queue_empty(&vif->rx_queue))
> xenvif_rx_action(vif);
>
> + vif->rx_event = false;
> + if (xenvif_rx_ring_slots_available(vif, threshold))
> + xenvif_start_queue(vif);
> +
> cond_resched();
> }
>
> --
> 1.7.10.4
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel