From 6b32dccfbe00518d3ca9cd94d19a6e007b2645d9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 09:46:09 -0400 Subject: [PATCH] dpci: when scheduling spin until STATE_RUN or STATE_SCHED has been cleared. There is race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU) to schedule the dpci. Specifically this can happen whenthe other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). There would be two 'dpci_softirq' running at the same time (on different CPUs) where on one CPU it would be executing hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN) and on the other CPU it is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. The reason we can get his with this is when an interrupt affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirq are activated again). This putting the 'dpci' back on the per-cpu list is an spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect for 'STATE_RUN' bit being set and schedule the dpci in a more safe manner (delay it). The dpci would stil not be scheduled when STATE_SCHED bit was set. c) This patch explores a third option - we will only schedule the dpci when the state is cleared (no STATE_SCHED and no STATE_RUN). We will spin if STATE_RUN is set (as it is in progress and will finish). If the STATE_SCHED is set (so hasn't run yet) we won't try to spin and just exit. This can cause an dead-lock if the interrupt comes when we are processing the dpci in the softirq. Interestingly the old ('tasklet') code used an a) mechanism. If the function assigned to the tasklet was running - the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow to have an running tasklet and an 'to-be-scheduled' tasklet at the same time. This solution moves this 'to-be-scheduled' job to be done in 'raise_softirq_for' (instead of the 'softirq'). Reported-by: Sander Eikelenboom Reported-by: Malcolm Crossley Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..9c30ebb 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -63,10 +63,32 @@ enum { static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; + unsigned long old; - if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) - return; - + /* + * This cmpxchg spins until the state is zero (unused). + */ + for ( ;; ) + { + old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED); + switch ( old ) + { + case (1 << STATE_SCHED): + /* + * Whenever STATE_SCHED is set we MUST not schedule it. + */ + return; + case (1 << STATE_RUN) | (1 << STATE_SCHED): + case (1 << STATE_RUN): + /* Getting close to finish. Spin. */ + continue; + } + /* + * If the 'state' is 0 (not in use) we can schedule it. + */ + if ( old == 0 ) + break; + } get_knownalive_domain(pirq_dpci->dom); local_irq_save(flags); -- 2.1.0