
Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.



On Fri, Nov 21, 2014 at 01:03:43PM +0100, Jan Beulich wrote:
> >>> On 21.11.14 at 12:50, <konrad.wilk@xxxxxxxxxx> wrote:
> > On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>>>> On 20.11.14 at 20:51, <konrad.wilk@xxxxxxxxxx> wrote:
> >>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
> >>struct hvm_pirq_dpci *pirq_dpci)
> >>>      ASSERT(d->arch.hvm_domain.irq.dpci);
> >>>  
> >>>      spin_lock(&d->event_lock);
> >>> -    if ( pirq_dpci->state )
> >>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
> >>>      {
> >>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
> >>>          const struct dev_intx_gsi_link *digl;
> >>
> >>So this now guards solely against the timeout enforced EOI? Why do
> >>you no longer need to guard against cancellation (i.e. why isn't this
> >>looking at both ->state and ->masked)?
> >>
> > 
> > The previous state check was superfluous as the dpci_softirq would check 
> > for 
> > the valid STATE_ before calling hvm_dirq_assist (and deal with 
> > cancellation).
> 
> I thought so first too, but then how is the guarding against
> cancellation working now?

Since there are two forms of cancellation, let's consider each scenario separately:

1). pt_irq_create_bind and pt_irq_destroy_bind. Each of them calls
    pt_pirq_softirq_reset in the 'error' case or in the normal path of
    destruction. When that happens, the action handler for the IRQ is removed
    the moment we call pirq_guest_unbind. As such we only have to deal with a
    potentially outstanding pirq_dpci waiting to be serviced. Since we did the
    'pt_pirq_softirq_reset' we have changed the state from STATE_SCHED to
    zero. That means dpci_softirq will NOT go in:
        if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )
        {
            hvm_dirq_assist(d, pirq_dpci);
            put_domain(d);
        }
    and will not call hvm_dirq_assist. The 'masked' will be set, which is OK:
    the moment the pirq is restored (pt_irq_create_bind), 'hvm_do_IRQ_dpci'
    will stamp 'masked' again and schedule the hvm_pirq_dpci. In that sense
    the logic is unchanged. We could add an 'else pirq_dpci->masked = 0' for
    clarity in 'dpci_softirq', but it does not change the logic (as only
    hvm_dirq_assist cares about it and it is not called).
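
    To make that de-scheduling protocol concrete, here is a tiny stand-alone
    model of it (userspace C with C11 atomics, not the Xen code; the names are
    made up for illustration): the reset path flips 'state' from STATE_SCHED
    back to zero with a cmpxchg, so the softirq's own test-and-clear of
    STATE_SCHED fails and hvm_dirq_assist is never entered.

        #include <stdatomic.h>
        #include <stdio.h>

        #define STATE_SCHED 0                /* bit numbers, as in the real code */
        #define STATE_RUN   1

        static atomic_uint state;            /* stands in for pirq_dpci->state */

        /* Roughly the effect of pt_pirq_softirq_reset in the simple case:
         * de-schedule only if it is queued (STATE_SCHED) but not yet running. */
        static void softirq_reset(void)
        {
            unsigned int expected = 1u << STATE_SCHED;

            atomic_compare_exchange_strong(&state, &expected, 0);
        }

        /* The dpci_softirq test quoted above. */
        static void softirq_run(void)
        {
            if ( atomic_fetch_and(&state, ~(1u << STATE_SCHED)) & (1u << STATE_SCHED) )
                printf("hvm_dirq_assist would run\n");
            else
                printf("cancelled - hvm_dirq_assist skipped\n");
        }

        int main(void)
        {
            atomic_store(&state, 1u << STATE_SCHED);  /* raise_softirq_for queued it */
            softirq_reset();                          /* the cancellation path */
            softirq_run();                            /* finds STATE_SCHED clear */
            return 0;
        }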

2). The 'pt_irq_guest_eoi' cancelling 'hvm_dirq_assist' from running. That is
    now added back - it is what commit f6dd295381f4b6a66acddacf46bca8940586c8d8
    "dpci: replace tasklet with softirq" broke: the cancellation between the
    IRQ timeout and hvm_dirq_assist.

3). 'pt_irq_destroy_bind' and an outstanding timer (so two cancellations at
    once).

    If during that time (say before the 8ms is up) 'pt_irq_destroy_bind' is
    called, it will:

    a) clear all the 'digl_list' entries - so if 'pt_irq_guest_eoi' (which
       takes the same lock as 'pt_irq_destroy_bind') is invoked at that time
       it won't be able to do much,
    b) kill the timer (no more 'pt_irq_guest_eoi'),
    c) and clear STATE_SCHED so it will no longer try to run 'hvm_dirq_assist'.

    If during that time an interrupt does happen then, depending on where we
    are in the teardown path, it will either send one more event to the guest
    or just not do much.

    (Though I did spot a bug in that, see #5 below.)


4). The other case is where 'pt_irq_destroy_bind' is called while
    'pt_irq_guest_eoi' (timeout) is in progress. At that point nothing can
    happen, as 'pt_irq_time_out' has taken the lock, so 'pt_irq_destroy_bind'
    cannot continue until the lock is released.

    If another interrupt is triggered and it is scheduled in, we are back at
    the 3) scenario.

5). 'pt_irq_destroy_bind' is called while an 'hvm_dirq_assist' is just about
    to service. 'pt_irq_destroy_bind' takes the lock first and kills the timer
    (and it fails at the cmpxchg since the 'state' is STATE_RUN) - and the
    moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
    'set_timer', hitting an ASSERT.

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 4d8c02f..c5cf7c7 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -539,6 +539,7 @@ int pt_irq_destroy_bind(
          * call to pt_pirq_softirq_reset.
          */
         pt_pirq_softirq_reset(pirq_dpci);
+        pirq_dpci->masked = 0;
 
         pirq_cleanup_check(pirq, d);
     }


which will make 'hvm_dirq_assist' refrain from calling 'set_timer', as it
won't go past:
    if ( test_and_clear_bool(pirq_dpci->masked) )

(which is right after the spin_lock that 'pt_irq_destroy_bind' spins on).
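
To show the effect of that one-liner, a tiny stand-alone sketch (userspace C,
not the Xen code; the helper names are made up): once 'pt_irq_destroy_bind'
has cleared 'masked' under the lock, the test-and-clear in 'hvm_dirq_assist'
finds zero and the 'set_timer' (and its ASSERT) is never reached.

    #include <stdbool.h>
    #include <stdio.h>

    static bool masked;                  /* stands in for pirq_dpci->masked */

    static bool test_and_clear(bool *b)  /* stands in for test_and_clear_bool */
    {
        bool old = *b;

        *b = false;
        return old;
    }

    static void destroy_bind(void)       /* pt_irq_destroy_bind, lock held */
    {
        /* kill_timer(); pt_pirq_softirq_reset(); ... */
        masked = false;                  /* the one-line fix above */
    }

    static void dirq_assist(void)        /* hvm_dirq_assist, once it gets the lock */
    {
        if ( test_and_clear(&masked) )
            printf("would call set_timer\n");
        else
            printf("masked already cleared - set_timer skipped\n");
    }

    int main(void)
    {
        masked = true;                   /* hvm_do_IRQ_dpci saw an interrupt */
        destroy_bind();                  /* teardown wins the lock race */
        dirq_assist();                   /* prints the 'skipped' message */
        return 0;
    }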

Let me roll the above code snippet into the patch.

Now looking at the various error cancellation points: we have PT_IRQ_TYPE_MSI,
which we can ignore as we do not use the timer (and hvm_dirq_assist never gets
to call 'set_timer'). The PT_IRQ_TYPE_PCI|PT_IRQ_TYPE_MSI_TRANSLATE error path
we can also ignore, as 'There is no path for __do_IRQ to schedule softirq as
IRQ_GUEST is not set.' - so it won't ever schedule the 'hvm_pirq_dpci'. The
only case we do have to worry about is during the destruction.

Attached and inline patch:

From 23cc8d6039c1a6002cbcb7ca4a06b20b818d3534 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as a gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
"dpci: replace tasklet with softirq" used the 'masked' as a
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored that 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations, and ended
up setting 'state' to zero. That meant 'raise_softirq_for' was free to
schedule it while the 'struct hvm_pirq_dpci' was still on a per-cpu list,
causing list corruption.

The code would trigger the following path causing list corruption:

    \-timer_softirq_action
        pt_irq_time_out calls pt_pirq_softirq_cancel, sets state to 0.
            pirq_dpci is still on dpci_list.
    \- dpci_softirq
        while (!list_empty(&our_list))
                list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
        raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
                .. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
        ends the loop, exits.

    \- dpci_softirq
        while (!list_empty)
                list_del, but ->next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had a fatal bug: it would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as a
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Any time
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes, it clears it.

The 'state' is left as a separate mechanism for 'raise_softirq_for' and
'dpci_softirq' to communicate the state of the 'struct hvm_pirq_dpci'.

However, since we now have two separate mechanisms, we have to deal with a
cancellation racing with an outstanding interrupt being serviced:
'pt_irq_destroy_bind' is called while an 'hvm_dirq_assist' is just about to
service. 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer', hitting an ASSERT.

By clearing 'masked' in 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' from calling 'set_timer'.

Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
v2: Deal with 'masked' in pt_irq_destroy_bind.
---
 xen/drivers/passthrough/io.c | 6 ++++--
 xen/include/xen/hvm/irq.h    | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..c5cf7c7 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -142,7 +142,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pirq_dpci->masked = 0;
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -539,6 +539,7 @@ int pt_irq_destroy_bind(
          * call to pt_pirq_softirq_reset.
          */
         pt_pirq_softirq_reset(pirq_dpci);
+        pirq_dpci->masked = 0;
 
         pirq_cleanup_check(pirq, d);
     }
@@ -610,6 +611,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
+    pirq_dpci->masked = 1;
     raise_softirq_for(pirq_dpci);
     return 1;
 }
@@ -669,7 +671,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( pirq_dpci->state )
+    if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 9709397..3996f1f 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -94,6 +94,7 @@ struct hvm_irq_dpci {
 struct hvm_pirq_dpci {
     uint32_t flags;
     unsigned int state;
+    bool_t masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
-- 
1.9.3

Attachment: 0001-dpci-Add-masked-as-an-gate-for-hvm_dirq_assist-to-pr.patch
Description: Text document


 

