
Re: [PATCH v7 3/3] xen/events: rework fifo queue locking



On 24.11.2020 08:01, Juergen Gross wrote:
> Two cpus entering evtchn_fifo_set_pending() for the same event channel
> can race in case the first one gets interrupted after setting
> EVTCHN_FIFO_PENDING and when the other one manages to set
> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can
> lead to evtchn_check_pollers() being called before the event is put
> properly into the queue, resulting eventually in the guest not seeing
> the event pending and thus blocking forever afterwards.
> 
> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
> lock") made the race just more obvious, while the fifo event channel
> implementation had this race from the beginning when an unmask operation
> was running in parallel with an event channel send operation.

Ah yes, but then also only for inter-domain channels, as it was
only in that case that the "wrong" domain's event lock was held.
IOW there was a much earlier change already where this issue
got widened (when the per-channel locking got introduced). This
then got reduced to the original scope by XSA-343's adding of
locking to evtchn_unmask(). (Not sure how much of this history
actually wants adding here; I'm writing it down not least to
make sure I have a complete enough picture myself.)

> For avoiding this race the queue locking in evtchn_fifo_set_pending()
> needs to be reworked to cover the test of EVTCHN_FIFO_PENDING,
> EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too.

Perhaps mention that the prior (and imo more natural)
alternative of taking consistent per-channel locks would have
been possible, until they got converted to rw ones?

> Additionally when an
> event channel needs to change queues both queues need to be locked
> initially.

Since this was (afaict) intentionally not the case before, I
think I would want to see a word spent on the "why", perhaps
better in a code comment than here. Even more so since you
delete the respective comment justifying the possible race as
permissible. And I have to admit right now I'm still uncertain
both ways, i.e. I neither have a clear understanding of why it
would have been considered fine the other way around before,
nor why the double locking is strictly needed.

> Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
> Fixes: 88910061ec615b2d ("evtchn: add FIFO-based event channel hypercalls and port ops")
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>

I guess at least this one wants a Reported-by.

> @@ -204,6 +175,48 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>          return;
>      }
>  
> +    for ( try = 0; ; try++ )
> +    {
> +        union evtchn_fifo_lastq lastq;
> +        struct vcpu *old_v;

I think this one can have const added.

> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> +        old_v = d->vcpu[lastq.last_vcpu_id];
> +
> +        q = &v->evtchn_fifo->queue[evtchn->priority];
> +        old_q = &old_v->evtchn_fifo->queue[lastq.last_priority];
> +
> +        if ( q <= old_q )
> +        {
> +            spin_lock_irqsave(&q->lock, flags);
> +            if ( q != old_q )
> +                spin_lock(&old_q->lock);
> +        }
> +        else
> +        {
> +            spin_lock_irqsave(&old_q->lock, flags);
> +            spin_lock(&q->lock);
> +        }

Since the vast majority of cases are going to have q == old_q, would
it be worth structuring this like

        if ( q == old_q )
            spin_lock_irqsave(&q->lock, flags);
        else if ( q < old_q )
        {
            spin_lock_irqsave(&q->lock, flags);
            spin_lock(&old_q->lock);
        }
        else
        {
            spin_lock_irqsave(&old_q->lock, flags);
            spin_lock(&q->lock);
        }

saving (on average) half a conditional in this most common
case? (This is specifically different from the double locking in
event_channel.c, where the common case is to have different
entities. In fact double_evtchn_{,un}lock() appear to pointlessly
check for chn1 == chn2 - I guess I'll make a patch.)
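
What I have in mind there is roughly the below - a sketch only,
assuming evtchn_write_lock() is still the primitive used in
double_evtchn_lock() and that all callers do pass distinct channels:

static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
{
    ASSERT(lchn != rchn);

    if ( lchn > rchn )
    {
        struct evtchn *tmp = lchn;

        lchn = rchn;
        rchn = tmp;
    }

    /* Lock in consistent (address) order to avoid ABBA deadlocks. */
    evtchn_write_lock(lchn);
    evtchn_write_lock(rchn);
}

i.e. simply dropping the then redundant equality handling.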

> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> +        old_v = d->vcpu[lastq.last_vcpu_id];
> +        if ( q == &v->evtchn_fifo->queue[evtchn->priority] &&
> +             old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] )
> +            break;
> +
> +        if ( q != old_q )
> +            spin_unlock(&old_q->lock);
> +        spin_unlock_irqrestore(&q->lock, flags);
> +
> +        if ( try == 3 )
> +        {
> +            gprintk(XENLOG_WARNING,
> +                    "dom%d port %d lost event (too many queue changes)\n",
> +                    d->domain_id, evtchn->port);
> +            return;

Originally evtchn_check_pollers() would still have been called
in this case. Wouldn't it be better to retain this, or else justify
the possibly observable change in behavior?
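
I.e. something along the lines of the below - a sketch only, reusing
the function's existing locals, and with the question of whether
PENDING should still get set in this case left to you:

        if ( try == 3 )
        {
            gprintk(XENLOG_WARNING,
                    "dom%d port %d lost event (too many queue changes)\n",
                    d->domain_id, evtchn->port);
            /* Preserve prior behavior: mark pending and wake pollers. */
            if ( !guest_test_and_set_bit(d, EVTCHN_FIFO_PENDING, word) )
                evtchn_check_pollers(d, evtchn->port);
            return;
        }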

> @@ -228,22 +239,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>              goto done;
>          }

This if() right above here can, aiui, in principle be moved out
of the surrounding if(), at which point ...

> -        /*
> -         * No locking around getting the queue. This may race with
> -         * changing the priority but we are allowed to signal the
> -         * event once on the old priority.
> -         */
> -        q = &v->evtchn_fifo->queue[evtchn->priority];
> -
> -        old_q = lock_old_queue(d, evtchn, &flags);
> -        if ( !old_q )
> -            goto done;

... with all of this gone ...

>          if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) )
> -        {
> -            spin_unlock_irqrestore(&old_q->lock, flags);
>              goto done;
> -        }

... this could become part of the outer if(), replacing the 2nd
guest_test_bit() there. (Possibly, if deemed worthwhile at all,
to be carried out in a separate follow-on patch, to keep focus
here on the actual goal.)
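
I.e. roughly the below - a sketch only, assuming the outer condition
is still the MASKED / LINKED pair of tests; the hoisted if()'s
condition and body are elided, as they're not visible in the quoted
context:

    /* The if() from above, moved ahead of the MASKED/LINKED checks. */
    if ( ... )
        goto done;

    if ( !guest_test_bit(d, EVTCHN_FIFO_MASKED, word) &&
         !guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) )
    {
        ...
    }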

Jan