Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
Tuesday, March 17, 2015, 6:44:54 PM, you wrote:
>> >> Additionally I think it should be considered whether the bitmap
>> >> approach of interpreting ->state is the right one, and we don't
>> >> instead want a clean 3-state (idle, sched, run) model.
>> >
>> > Could you elaborate a bit more please? As in three different unsigned int
>> > (or bool_t) that set in what state we are in?
>>
>> An enum { STATE_IDLE, STATE_SCHED, STATE_RUN }. Especially
>> if my comment above turns out to be wrong, you'd have no real
>> need for the SCHED and RUN flags to be set at the same time.
> I cobbled together what I believe is what you were thinking of.
> As you can see, to preserve the existing functionality - such as
> being able to schedule N interrupt injections
> for the N interrupts we might get - I modified '->masked'
> to be an atomic counter.
> The end result is that we can still live-lock. Unless we:
>  - Drop on the floor the injection of N interrupts and
>    just deliver at most one per VMX_EXIT (and not bother
>    with interrupts arriving while we are in the VMX handler).
>  - Alter the softirq code slightly - to have a variant
>    which will only iterate once over the pending softirq
>    bits per call. (So save a copy of the bitmap on the
>    stack when entering the softirq handler - and use that.
>    We could also XOR it against the current one to catch any
>    non-duplicate bits being set that we should deal with.)
> Here is the compile-tested, but not run-time-tested patch.
> From e7d8bcd7c5d32c520554a4ad69c4716246036002 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Date: Tue, 17 Mar 2015 13:31:52 -0400
> Subject: [RFC PATCH] dpci: Switch to tristate instead of bitmap
> *TODO*:
> - Writeup.
> - Tests
Done, and unfortunately it doesn't fly...
Some devices seem to work fine, but others don't receive any interrupts shortly
after boot, e.g.:
 40:          3          0          0          0  xen-pirq-ioapic-level  cx25821[1]
I don't see any crashes or errors though, so it seems to silently lock up somewhere.
--
Sander
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
> xen/drivers/passthrough/io.c | 140 ++++++++++++++++++++++++-------------------
> xen/include/xen/hvm/irq.h | 4 +-
> 2 files changed, 82 insertions(+), 62 deletions(-)
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index ae050df..663e104 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -30,42 +30,28 @@
> static DEFINE_PER_CPU(struct list_head, dpci_list);
>
> /*
> - * These two bit states help to safely schedule, deschedule, and wait until
> - * the softirq has finished.
> - *
> - * The semantics behind these two bits is as follow:
> - * - STATE_SCHED - whoever modifies it has to ref-count the domain (->dom).
> - * - STATE_RUN - only softirq is allowed to set and clear it. If it has
> - * been set hvm_dirq_assist will RUN with a saved value of the
> - * 'struct domain' copied from 'pirq_dpci->dom' before STATE_RUN was set.
> - *
> - * The usual states are: STATE_SCHED(set) -> STATE_RUN(set) ->
> - * STATE_SCHED(unset) -> STATE_RUN(unset).
> - *
> - * However the states can also diverge such as: STATE_SCHED(set) ->
> - * STATE_SCHED(unset) -> STATE_RUN(set) -> STATE_RUN(unset). That means
> - * the 'hvm_dirq_assist' never run and that the softirq did not do any
> - * ref-counting.
> - */
> -
> -enum {
> - STATE_SCHED,
> - STATE_RUN
> -};
> -
> -/*
> * This can be called multiple times, but the softirq is only raised once.
> - * That is until the STATE_SCHED state has been cleared. The state can be
> - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
> - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
> + * That is, until the state is back to STATE_INIT. The state can be changed by:
> + * the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
> + * or by 'pt_pirq_softirq_reset' (which will try to reset the state before
> * the softirq had a chance to run).
> */
> static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> {
> unsigned long flags;
>
> - if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> + switch ( cmpxchg(&pirq_dpci->state, STATE_INIT, STATE_SCHED) )
> + {
> + case STATE_RUN:
> + case STATE_SCHED:
> + /*
> + * The pirq_dpci->masked has been incremented to let us know
> + * how many we have left to do.
> + */
> return;
> + case STATE_INIT:
> + break;
> + }
>
> get_knownalive_domain(pirq_dpci->dom);
>
> @@ -85,7 +71,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> */
> bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
> {
> - if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
> + if ( pirq_dpci->state != STATE_INIT )
> return 1;
>
> /*
> @@ -109,22 +95,22 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
>
> ASSERT(spin_is_locked(&d->event_lock));
>
> - switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
> + switch ( cmpxchg(&pirq_dpci->state, STATE_SCHED, STATE_INIT) )
> {
> - case (1 << STATE_SCHED):
> + case STATE_SCHED:
> /*
> - * We are going to try to de-schedule the softirq before it goes in
> - * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
> + * We are going to try to de-schedule the softirq before it goes to
> + * STATE_RUN. Whoever moves it out of STATE_SCHED MUST refcount the 'dom'.
> */
> put_domain(d);
> /* fallthrough. */
> - case (1 << STATE_RUN):
> - case (1 << STATE_RUN) | (1 << STATE_SCHED):
> + case STATE_RUN:
> + case STATE_INIT:
> /*
> - * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
> + * The reason it is OK to reset 'dom' in STATE_INIT or STATE_RUN is due
> * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
> - * in local variable before it sets STATE_RUN - and therefore will not
> - * dereference '->dom' which would crash.
> + * in local variable before it sets state to running - and therefore
> + * will not dereference '->dom' which would crash.
> */
> pirq_dpci->dom = NULL;
> break;
> @@ -135,7 +121,7 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
> * or never initialized it). Note that we hold the lock that
> * 'hvm_dirq_assist' could be spinning on.
> */
> - pirq_dpci->masked = 0;
> + atomic_set(&pirq_dpci->masked, 0);
> }
>
> bool_t pt_irq_need_timer(uint32_t flags)
> @@ -149,7 +135,8 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
> if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
> &pirq_dpci->flags) )
> {
> - pirq_dpci->masked = 0;
> + ASSERT(atomic_read(&pirq_dpci->masked) <= 1);
> + atomic_set(&pirq_dpci->masked, 0);
> pirq_dpci->pending = 0;
> pirq_guest_eoi(dpci_pirq(pirq_dpci));
> }
> @@ -278,6 +265,8 @@ int pt_irq_create_bind(
> * As such on every 'pt_irq_create_bind' call we MUST set it.
> */
> pirq_dpci->dom = d;
> + atomic_set(&pirq_dpci->masked, 0);
> + pirq_dpci->state = STATE_INIT;
> /* bind after hvm_irq_dpci is setup to avoid race with irq handler*/
> rc = pirq_guest_bind(d->vcpu[0], info, 0);
> if ( rc == 0 && pt_irq_bind->u.msi.gtable )
> @@ -398,6 +387,8 @@ int pt_irq_create_bind(
> if ( pt_irq_need_timer(pirq_dpci->flags) )
> init_timer(&pirq_dpci->timer, pt_irq_time_out, pirq_dpci, 0);
> /* Deal with gsi for legacy devices */
> + atomic_set(&pirq_dpci->masked, 0);
> + pirq_dpci->state = STATE_INIT;
> rc = pirq_guest_bind(d->vcpu[0], info, share);
> if ( unlikely(rc) )
> {
> @@ -576,6 +567,7 @@ bool_t pt_pirq_cleanup_check(struct hvm_pirq_dpci *dpci)
> if ( !dpci->flags && !pt_pirq_softirq_active(dpci) )
> {
> dpci->dom = NULL;
> + dpci->state = STATE_INIT;
> return 1;
> }
> return 0;
> @@ -617,7 +609,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
> !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> return 0;
>
> - pirq_dpci->masked = 1;
> + atomic_inc(&pirq_dpci->masked);
> raise_softirq_for(pirq_dpci);
> return 1;
> }
> @@ -672,12 +664,13 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
> spin_unlock(&d->event_lock);
> }
>
> -static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> +static bool_t hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> {
> + int more;
> ASSERT(d->arch.hvm_domain.irq.dpci);
>
> spin_lock(&d->event_lock);
> - if ( test_and_clear_bool(pirq_dpci->masked) )
> + while ( (more = !atomic_dec_and_test(&pirq_dpci->masked)) )
> {
> struct pirq *pirq = dpci_pirq(pirq_dpci);
> const struct dev_intx_gsi_link *digl;
> @@ -687,17 +680,13 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> send_guest_pirq(d, pirq);
>
> if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
> - {
> - spin_unlock(&d->event_lock);
> - return;
> - }
> + break;
> }
>
> if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
> {
> vmsi_deliver_pirq(d, pirq_dpci);
> - spin_unlock(&d->event_lock);
> - return;
> + break;
> }
>
> list_for_each_entry ( digl, &pirq_dpci->digl_list, list )
> @@ -710,8 +699,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> {
> /* for translated MSI to INTx interrupt, eoi as early as possible */
> __msi_pirq_eoi(pirq_dpci);
> - spin_unlock(&d->event_lock);
> - return;
> + break;
> }
>
> /*
> @@ -725,6 +713,8 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> set_timer(&pirq_dpci->timer, NOW() + PT_IRQ_TIME_OUT);
> }
> spin_unlock(&d->event_lock);
> +
> + return more;
> }
>
> static void __hvm_dpci_eoi(struct domain *d,
> @@ -781,7 +771,7 @@ unlock:
> }
>
> /*
> - * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to
> + * Note: 'pt_pirq_softirq_reset' can move the state to init before we get to
> * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting.
> */
> static void dpci_softirq(void)
> @@ -797,23 +787,53 @@ static void dpci_softirq(void)
> {
> struct hvm_pirq_dpci *pirq_dpci;
> struct domain *d;
> + bool_t again = 0;
>
> pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci,
> softirq_list);
> list_del(&pirq_dpci->softirq_list);
>
> d = pirq_dpci->dom;
> smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
> - if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> - BUG();
> - /*
> - * The one who clears STATE_SCHED MUST refcount the domain.
> - */
> - if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )
> +
> + switch ( cmpxchg(&pirq_dpci->state, STATE_SCHED, STATE_RUN) )
> + {
> + case STATE_INIT:
> + /* pt_pirq_softirq_reset cleared it. Let it do the ref-counting. */
> + continue;
> + case STATE_RUN:
> + again = 1;
> + /* Fall through. */
> + case STATE_SCHED:
> + break;
> + }
> + if ( !again )
> {
> - hvm_dirq_assist(d, pirq_dpci);
> + /*
> + * The one who changes sched to running MUST refcount the domain.
> + */
> + again = hvm_dirq_assist(d, pirq_dpci);
> put_domain(d);
> + switch ( cmpxchg(&pirq_dpci->state, STATE_RUN, STATE_INIT) )
> + {
> + case STATE_SCHED:
> + case STATE_INIT:
> + BUG();
> + case STATE_RUN:
> + break;
> + }
> + }
> + if ( again )
> + {
> + unsigned long flags;
> +
> + /* Put back on the list and retry. */
> + local_irq_save(flags);
> + list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> + local_irq_restore(flags);
> +
> + raise_softirq(HVM_DPCI_SOFTIRQ);
> + continue;
> }
> - clear_bit(STATE_RUN, &pirq_dpci->state);
> }
> }
>
> diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
> index 3996f1f..48dbc51 100644
> --- a/xen/include/xen/hvm/irq.h
> +++ b/xen/include/xen/hvm/irq.h
> @@ -93,8 +93,8 @@ struct hvm_irq_dpci {
> /* Machine IRQ to guest device/intx mapping. */
> struct hvm_pirq_dpci {
> uint32_t flags;
> - unsigned int state;
> - bool_t masked;
> + enum { STATE_INIT, STATE_SCHED, STATE_RUN } state;
> + atomic_t masked;
> uint16_t pending;
> struct list_head digl_list;
> struct domain *dom;
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel