Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.



On Tue, Jan 13, 2015 at 10:20:00AM +0000, Jan Beulich wrote:
> >>> On 12.01.15 at 17:45, <konrad.wilk@xxxxxxxxxx> wrote:
> > There is a race when we clear STATE_SCHED in the softirq
> > handler, which allows 'raise_softirq_for' to progress and
> > schedule another dpci. During that window the other CPU could
> > receive an interrupt, call 'raise_softirq_for', and put the
> > dpci on its per-cpu list. There would then be two 'dpci_softirq'
> > instances running at the same time (on different CPUs) where
> > the dpci state is STATE_RUN (and STATE_SCHED is cleared). This
> > ends up hitting:
> > 
> >  if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >      BUG();
> > 
> > Instead of that, put the dpci back on the per-cpu list to deal
> > with it later.
> > 
> > The reason we can hit this is when the interrupt affinity is
> > set over multiple CPUs.
> > 
> > Another potential fix would be to add a guard in 'raise_softirq_for'
> > to check for the 'STATE_RUN' bit being set and not schedule the
> > dpci until that bit has been cleared.
> 
> I indeed think this should be investigated, because it would make
> explicit what ...
> 
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -804,7 +804,17 @@ static void dpci_softirq(void)
> >          d = pirq_dpci->dom;
> >          smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
> >          if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> > -            BUG();
> > +        {
> > +            unsigned long flags;
> > +
> > +            /* Put back on the list and retry. */
> > +            local_irq_save(flags);
> > +            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> > +            local_irq_restore(flags);
> > +
> > +            raise_softirq(HVM_DPCI_SOFTIRQ);
> > +            continue;
> > +        }
> 
> ... this does implicitly - spin until the bad condition has cleared.

I still haven't gotten around to reproducing the issue Malcolm
saw, which should be easy for me to do (use Intel super-fast
storage in a Windows guest) since I have one of those boxes.

However, in lieu of that, here is a patch that passes my SR-IOV
testing and that I believe should work fine.  I _think_ it covers
what you wanted, Jan, but please correct me if I made a mistake
in the logic.
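
To recap, the interleaving that triggers the BUG() looks roughly
like this (an illustrative sketch, not the exact code flow; the
bit names are the ones used in xen/drivers/passthrough/io.c):

    CPU0 (dpci_softirq)              CPU1 (device interrupt)
    -------------------------------  -------------------------------
    test_and_set_bit(STATE_RUN)
    clear_bit(STATE_SCHED)
                                     raise_softirq_for():
                                       test_and_set_bit(STATE_SCHED)
                                         -> was clear, so proceed
                                       add dpci to CPU1's dpci_list
                                     dpci_softirq() on CPU1:
                                       test_and_set_bit(STATE_RUN)
                                         -> already set: BUG()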

From 20766b144e0b11daf3227ae978880bacd686ea2b Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Mon, 12 Jan 2015 11:35:03 -0500
Subject: [PATCH] dpci: properly schedule the dpci when it is in STATE_RUN on
 any per-cpu list

There is a race when we clear STATE_SCHED in the softirq handler,
which allows 'raise_softirq_for' to progress and schedule another
dpci. During that window the other CPU could receive an interrupt,
call 'raise_softirq_for', and put the dpci on its per-cpu list.
There would then be two 'dpci_softirq' instances running at the
same time (on different CPUs) where the dpci state is STATE_RUN
(and STATE_SCHED is cleared). This ends up hitting:

 if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
     BUG();

The reason we can hit this is when the interrupt affinity is set
over multiple CPUs.

Instead of hitting the BUG() we can put the dpci back on the
per-cpu list to deal with it later (when the softirq is raised
again). Putting the 'dpci' back on the per-cpu list amounts to
spinning until the bad condition clears.

We also expand the test-and-set(STATE_SCHED) in raise_softirq_for
to detect the 'STATE_RUN' bit being set and still schedule the
dpci. The only case in which a dpci is NOT going to be scheduled
is when the 'STATE_SCHED' bit is already set (regardless of
whether STATE_RUN is set or not).
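
In table form, the state handling that the new raise_softirq_for
implements is the following (this is just a restatement of the
cmpxchg loop in the patch below):

 state on entry            action
 ------------------------  -----------------------------------------
 0                         set STATE_SCHED and schedule the dpci
 STATE_RUN                 set STATE_SCHED (STATE_RUN stays set)
                           and schedule the dpci
 STATE_SCHED               nothing - the dpci is already scheduled
 STATE_SCHED | STATE_RUN   nothing - the dpci is already scheduled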

Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Reported-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 xen/drivers/passthrough/io.c | 43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ae050df..b61b73c 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -63,10 +63,37 @@ enum {
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
     unsigned long flags;
+    unsigned long old, new, val = pirq_dpci->state;
 
-    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
-        return;
+    /*
+     * This cmpxchg is a clearer version of:
+     *     if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
+     *         return;
+     * except that it also takes the STATE_RUN bit into account.
+     */
+    for ( ;; )
+    {
+        new = 1 << STATE_SCHED;
+        if ( val )
+            new |= val;
 
+        old = cmpxchg(&pirq_dpci->state, val, new);
+        switch ( old )
+        {
+        case (1 << STATE_SCHED):
+        case (1 << STATE_RUN) | (1 << STATE_SCHED):
+            /*
+             * Whenever STATE_SCHED is set we MUST not schedule it.
+             */
+            return;
+        }
+        /*
+         * If the 'state' is 0 or (1 << STATE_RUN) we can schedule it.
+         */
+        if ( old == val )
+            break;
+        val = old;
+    }
     get_knownalive_domain(pirq_dpci->dom);
 
     local_irq_save(flags);
@@ -804,7 +831,17 @@ static void dpci_softirq(void)
         d = pirq_dpci->dom;
         smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
         if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
-            BUG();
+        {
+            unsigned long flags;
+
+            /* Put back on the list and retry later. */
+            local_irq_save(flags);
+            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
+            local_irq_restore(flags);
+
+            raise_softirq(HVM_DPCI_SOFTIRQ);
+            continue;
+        }
         /*
          * The one who clears STATE_SCHED MUST refcount the domain.
          */
-- 
2.1.0

