
Re: [Xen-devel] Lockup/High ksoftirqd when rate-limiting is enabled



Thanks for the quick patch.
I was able to test it today, and the high ksoftirqd CPU usage is gone.

Great!

Is there a chance this can get pushed into the stable kernel versions (3.18.x, 4.4.x, etc.)? There should not be much backport work, as the netback driver hasn't changed a lot recently.


Tested-by: Jean-Louis Dupond <jean-louis@xxxxxxxxx>


On 2017-06-20 13:18, Wei Liu wrote:
On Tue, Jun 20, 2017 at 11:31:02AM +0200, Jean-Louis Dupond wrote:
Hi,

As requested via IRC, I'm sending this to xen-devel and the netback maintainers.

We are using Xen 4.4.4-23.el6 with kernel 3.18.44-20.el6.x86_64.
Recently we have been having issues with rate limiting enabled.

When we enable rate limiting in Xen and then generate a lot of outbound traffic
on the domU, we notice a high ksoftirqd load.
In some cases the system locks up completely.


Can you give this patch a try?

---8<--
From a242d4a74cc4ec46c5e3d43dd07eb146be4ca233 Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@xxxxxxxxxx>
Date: Tue, 20 Jun 2017 11:49:28 +0100
Subject: [PATCH] xen-netback: correctly schedule rate-limited queues

Add a flag to indicate if a queue is rate-limited. Test the flag in
NAPI poll handler and avoid rescheduling the queue if true, otherwise
we risk locking up the host. The rescheduling shall be done when
replenishing credit.

Reported-by: Jean-Louis Dupond <jean-louis@xxxxxxxxx>
Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 drivers/net/xen-netback/common.h    | 1 +
 drivers/net/xen-netback/interface.c | 6 +++++-
 drivers/net/xen-netback/netback.c   | 6 +++++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 530586be05b4..5b1d2e8402d9 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -199,6 +199,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
        unsigned long   remaining_credit;
        struct timer_list credit_timeout;
        u64 credit_window_start;
+       bool rate_limited;

        /* Statistics */
        struct xenvif_stats stats;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 8397f6c92451..e322a862ddfe 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -106,7 +106,11 @@ static int xenvif_poll(struct napi_struct *napi, int budget)

        if (work_done < budget) {
                napi_complete_done(napi, work_done);
-               xenvif_napi_schedule_or_enable_events(queue);
+               /* If the queue is rate-limited, it shall be
+                * rescheduled in the timer callback.
+                */
+               if (likely(!queue->rate_limited))
+                       xenvif_napi_schedule_or_enable_events(queue);
        }

        return work_done;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 602d408fa25e..5042ff8d449a 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -180,6 +180,7 @@ static void tx_add_credit(struct xenvif_queue *queue)
                max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */

        queue->remaining_credit = min(max_credit, max_burst);
+       queue->rate_limited = false;
 }

 void xenvif_tx_credit_callback(unsigned long data)
@@ -686,8 +687,10 @@ static bool tx_credit_exceeded(struct xenvif_queue *queue, unsigned size)
                msecs_to_jiffies(queue->credit_usec / 1000);

        /* Timer could already be pending in rare cases. */
-       if (timer_pending(&queue->credit_timeout))
+       if (timer_pending(&queue->credit_timeout)) {
+               queue->rate_limited = true;
                return true;
+       }

        /* Passed the point where we can replenish credit? */
        if (time_after_eq64(now, next_credit)) {
@@ -702,6 +705,7 @@ static bool tx_credit_exceeded(struct xenvif_queue *queue, unsigned size)
                mod_timer(&queue->credit_timeout,
                          next_credit);
                queue->credit_window_start = next_credit;
+               queue->rate_limited = true;

                return true;
        }
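
For readers who want to see why the flag matters, here is a rough user-space
sketch of the scheduling behaviour the commit message describes. It is not the
kernel code: struct model_queue, model_add_credit(), model_credit_exceeded(),
model_poll(), count_polls() and MODEL_POLL_CAP are hypothetical names invented
for this illustration, the timer is reduced to a boolean, and the time-based
replenish check is left out. It only assumes that the poll handler asks to be
rescheduled whenever work is still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one rate-limited queue.
 * None of these names exist in xen-netback. */
#define MODEL_POLL_CAP 1000u    /* stand-in for "spins until the timer fires" */

struct model_queue {
    unsigned long credit_bytes;     /* credit granted per window */
    unsigned long remaining_credit; /* credit left in this window */
    unsigned long pending_bytes;    /* bytes the guest still wants to send */
    bool timer_pending;             /* models timer_pending(&credit_timeout) */
    bool rate_limited;              /* the flag added by the patch */
};

/* Loosely mirrors tx_add_credit(): refill credit and clear the flag. */
static void model_add_credit(struct model_queue *q)
{
    q->remaining_credit = q->credit_bytes;
    q->timer_pending = false;
    q->rate_limited = false;
}

/* Loosely mirrors tx_credit_exceeded(): out of credit means arm the
 * (modelled) timer and mark the queue as rate-limited. */
static bool model_credit_exceeded(struct model_queue *q, unsigned long size)
{
    if (q->timer_pending) {
        q->rate_limited = true;
        return true;
    }
    if (size > q->remaining_credit) {
        q->timer_pending = true;
        q->rate_limited = true;
        return true;
    }
    return false;
}

/* One poll pass. Returns true if the poll asks to be rescheduled right away.
 * honor_flag == false models the behaviour before the patch. */
static bool model_poll(struct model_queue *q, unsigned long pkt, bool honor_flag)
{
    bool more_work;

    assert(pkt > 0);
    while (q->pending_bytes >= pkt && !model_credit_exceeded(q, pkt)) {
        q->remaining_credit -= pkt;
        q->pending_bytes -= pkt;
    }
    more_work = q->pending_bytes >= pkt;

    if (honor_flag && q->rate_limited)
        return false;   /* leave rescheduling to the credit timer */
    return more_work;
}

/* Count back-to-back polls in one credit window, i.e. before the timer
 * would fire. The cap keeps the unpatched case from looping forever. */
unsigned count_polls(bool honor_flag)
{
    struct model_queue q = { .credit_bytes = 3000, .pending_bytes = 9000 };
    unsigned polls = 0;

    model_add_credit(&q);
    while (polls < MODEL_POLL_CAP) {
        polls++;
        if (!model_poll(&q, 1500, honor_flag))
            break;
    }
    return polls;
}
```

In this model, count_polls(false) runs into the cap because the queue keeps
asking to be polled while it has no credit, which is the kind of busy loop
that would show up as ksoftirqd load. count_polls(true) stops after a single
poll and waits for the credit to be replenished.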
