[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.



On Mon, Sep 29, 2014 at 08:21:53AM +0100, Jan Beulich wrote:
> >>> On 27.09.14 at 03:32, <konrad.wilk@xxxxxxxxxx> wrote:
> > On Thu, Sep 25, 2014 at 04:04:46PM +0100, Jan Beulich wrote:
> >> >>> On 25.09.14 at 16:48, <konrad.wilk@xxxxxxxxxx> wrote:
> >> > On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
> >> >> >>> On 23.09.14 at 04:10, <konrad.wilk@xxxxxxxxxx> wrote:
> >> >> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
> >> >> >          return -ENOMEM;
> >> >> >      }
> >> >> >      pirq_dpci = pirq_dpci(info);
> >> >> > +    pt_pirq_reset(d, pirq_dpci);
> >> >> >  
> >> >> >      switch ( pt_irq_bind->irq_type )
> >> >> >      {
> >> >> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
> >> >> >          {
> >> >> >              unsigned int share;
> >> >> >  
> >> >> > -            pirq_dpci->dom = d;
> >> >> >              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
> >> >> >              {
> >> >> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> >> >> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
> >> >> >              {
> >> >> >                  if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >> >                      kill_timer(&pirq_dpci->timer);
> >> >> > -                pirq_dpci->dom = NULL;
> >> >> >                  list_del(&girq->list);
> >> >> >                  list_del(&digl->list);
> >> >> >                  hvm_irq_dpci->link_cnt[link]--;
> >> >> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
> >> >> >          msixtbl_pt_unregister(d, pirq);
> >> >> >          if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >> >              kill_timer(&pirq_dpci->timer);
> >> >> > -        pirq_dpci->dom   = NULL;
> >> >> >          pirq_dpci->flags = 0;
> >> >> >          pirq_cleanup_check(pirq, d);
> >> >> >      }
> >> >> 
> >> >> Is all of the above really necessary? I.e. I can neither see why setting
> >> >> ->dom earlier is needed, nor why clearing it on the error paths should
> >> >> be dropped.
> >> > 
> >> > Yes. We need the ->dom so that the hvm_dirq_assist can run without
> >> > hitting an NULL pointer exception. Please keep in mind that the moment
> >> > we setup the IRQ action handler, we are "live" - [...]
> >> 
> >> But all you need is that this happens before respective
> >> pirq_guest_bind() calls. I.e. in the PT_IRQ_TYPE_PCI and
> >> PT_IRQ_TYPE_MSI_TRANSLATE cases it was already done early enough
> >> (avoiding it remaining set on error paths), so all you'd need is adding
> >> it for the PT_IRQ_TYPE_MSI path too.
> > 
> > .. and with the below statement ["pt_pirq_reset() is just replacing
> > that assigment"] I believe having just
> >  pirq->dom = d;
> > 
> > before the switch would be correct.
> > 
> >> 
> >> I agree that the clearing of the field in error paths might need a
> >> little care, but otoh you could equally well have hvm_dirq_assist()
> >> bail when it finds it to be NULL?
> > 
> > Can't do it as is. I added in the 'get_knowalive_domain()' and 
> > 'put_domain()'
> > in the 'schedule_softirq_for' and 'dpci_softirq' respectively.
> > 
> > Which means that we would have an outstanding refcount as 'dpci_softirq'
> > could not access the '->dom' to get the domain and do proper refcounting.
> > 
> > Unless I rip out the refcounting there which I believe is OK.
> 
> No - instead the site clearing the field should then drop the
> reference (if so needed as indicated by other state). I.e.
> either there is a softirq being processed (in which case it's
> there where the reference gets dropped) or you can stop it
> from getting processed subsequently and drop the reference
> right away. Takes maybe another cmpxchg() afaict (on the
> ->dom field) or in the worst case another flag to signal this.

I've been looking at this and I've come up with something that does
address this - but instead of using 'cmpxchg' it ends up using
set_bit/clear_bit with two different states. Using 'cmpxchg' was
getting a bit too complex - or more likely I was getting too
fatigued of the code.

Anyhow here is an RFC on top of this patchset that I believe does
in spirit what you described. Naturally the whole big piece
of code that does the ref_counting on failure paths needs to be done
in an function. Please advise if this is acceptable for you
and if so I can properly flesh it out.

commit e0647b03be8319907d4f2dc424c1fcaa4895b31a
Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date:   Sun Oct 5 09:44:14 2014 -0400

    Two bits using clear/set

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 47c8ed5..cae61f1 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -29,6 +29,10 @@
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
+enum {
+    STATE_SCHED, /* Bit 0 */
+    STATE_RUN,
+};
 /*
  * Should only be called from hvm_do_IRQ_dpci. We use the
  * 'masked' as an gate to thwart multiple interrupts.
@@ -37,14 +41,14 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
  * completed executing 'hvm_dirq_assist'.
  *
  */
-static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
+static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
     unsigned long flags;
 
-    if ( pirq_dpci->masked )
+    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->masked) )
         return;
 
-    pirq_dpci->masked = 1;
+    get_knownalive_domain(pirq_dpci->dom);
 
     local_irq_save(flags);
     list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
@@ -55,9 +59,9 @@ static void schedule_softirq_for(struct hvm_pirq_dpci 
*pirq_dpci)
 
 /*
  * If we are racing with softirq_dpci (masked is still set) we return
- * -EAGAIN. Otherwise we return 0.
+ * -ERESTART. Otherwise we return 0.
  *
- *  If it is -EAGAIN, it is the callers responsibility to make sure
+ *  If it is -ERESTART, it is the callers responsibility to make sure
  *  that the softirq (with the event_lock dropped) has ran. We need
  *  to flush out the outstanding 'dpci_softirq' (no more of them
  *  will be added for this pirq as the IRQ action handler has been
@@ -65,10 +69,13 @@ static void schedule_softirq_for(struct hvm_pirq_dpci 
*pirq_dpci)
  */
 int pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 {
-    if ( pirq_dpci->masked )
-        return -EAGAIN;
+    if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
+        return -ERESTART;
+
+    if ( test_bit(STATE_SCHED, &pirq_dpci->masked) )
+        return -ERESTART;
     /*
-     * If in the future we would call 'schedule_softirq_for' right away
+     * If in the future we would call 'raise_softirq_for' right away
      * after 'pt_pirq_softirq_active' we MUST reset the list (otherwise it
      * might have stale data).
      */
@@ -143,7 +150,6 @@ int pt_irq_create_bind(
     struct hvm_pirq_dpci *pirq_dpci;
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
-    s_time_t start = NOW();
 
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
@@ -188,8 +194,6 @@ int pt_irq_create_bind(
     {
         spin_unlock(&d->event_lock);
         process_pending_softirqs();
-        if ( ( NOW() - start ) > SECONDS(1) )
-            return -EAGAIN;
         goto restart;
     }
     /*
@@ -230,7 +234,35 @@ int pt_irq_create_bind(
             {
                 pirq_dpci->gmsi.gflags = 0;
                 pirq_dpci->gmsi.gvec = 0;
-                pirq_dpci->dom = NULL;
+
+                /* The softirq has saved it so we are safe to reset it. */
+                if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
+                {
+                    pirq_dpci->dom = NULL;
+                }
+                else if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
+                {
+                    /* pirq_guest_unbind has made sure we won't be getting
+                     * any more interrupts (no raise_softirq_for), so the only
+                     * ones that can be are the ones that got scheduled _right_
+                     * before the pirq_guest_unbind. If we can de-schedule them
+                     * that is good. The problem #1 is that the softirq might 
be
+                     * running and it has just set STATE_RUN while we cleared
+                     * STATE_SCHED. That is OK, as right after the STATE_RUN it
+                     * will check the STATE_SCHED and if cleared it will 
unclear
+                     * STATE_RUN and ignore this pirq. We MUST put the refcount
+                     * down. */
+                    put_domain(pirq_dpci->dom);
+                    pirq_dpci->dom = NULL;
+                }
+                else
+                {
+                    /* !STATE_RUN (stale) and !STATE_SCHED.
+                     * softirq will ignore this pirq, but we MUST put the 
refcount
+                     * down. */
+                    put_domain(pirq_dpci->dom);
+                    pirq_dpci->dom = NULL;
+                }
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
@@ -332,7 +364,7 @@ int pt_irq_create_bind(
             {
                 if ( pt_irq_need_timer(pirq_dpci->flags) )
                     kill_timer(&pirq_dpci->timer);
-                pirq_dpci->dom = NULL;
+                /* TODO - function. pirq_dpci->dom = NULL; */
                 list_del(&girq->list);
                 list_del(&digl->list);
                 hvm_irq_dpci->link_cnt[link]--;
@@ -538,7 +570,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
-    schedule_softirq_for(pirq_dpci);
+    raise_softirq_for(pirq_dpci);
     return 1;
 }
 
@@ -592,11 +624,10 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
     spin_unlock(&d->event_lock);
 }
 
-static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
+static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
 {
-    struct domain *d = pirq_dpci->dom;
-
     /*
+     * TODO: Remove
      * We can be racing with 'pt_irq_destroy_bind' - with us being scheduled
      * right before 'pirq_guest_unbind' gets called - but us not yet executed.
      *
@@ -604,8 +635,6 @@ static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
      * 'mapping' - which is OK as later in this code we would
      * do nothing except clear the ->masked field anyhow.
      */
-    if ( !d )
-        return;
 
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
@@ -725,11 +754,24 @@ static void dpci_softirq(void)
 
     while ( !list_empty(&our_list) )
     {
+        struct domain *d;
+
         pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, 
softirq_list);
         list_del(&pirq_dpci->softirq_list);
 
-        hvm_dirq_assist(pirq_dpci);
-        pirq_dpci->masked = 0;
+        d = pirq_dpci->dom;
+        barrier(); /* 'd' MUST be saved before we set/clear the bits. */
+        if ( test_and_set_bit(STATE_RUN, &pirq_dpci->masked) )
+            BUG();
+        /* Asked to be descheduled. */
+        if ( !test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
+        {
+            clear_bit(STATE_RUN, &pirq_dpci->masked);
+            continue;
+        }
+        hvm_dirq_assist(d, pirq_dpci);
+        put_domain(d);
+        clear_bit(STATE_RUN, &pirq_dpci->masked);
     }
 }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index d147189..1660750 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -775,7 +775,7 @@ static int pci_clean_dpci_irqs(struct domain *d)
     struct hvm_irq_dpci *hvm_irq_dpci = NULL;
 
     if ( !iommu_enabled )
-        return -ESRCH;
+        return -ENODEV;
 
     if ( !is_hvm_domain(d) )
         return -EINVAL;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 4fc6dcf..4a9324e 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -93,7 +93,7 @@ struct hvm_irq_dpci {
 /* Machine IRQ to guest device/intx mapping. */
 struct hvm_pirq_dpci {
     uint32_t flags;
-    bool_t masked;
+    unsigned long masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
of 'latch'
> 
> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.