
Re: [Xen-devel] [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked



On 19/11/13 18:17, David Vrabel wrote:
> From: David Vrabel <david.vrabel@xxxxxxxxxx>
> 
> An event may still be the tail of a queue even if the queue is now
> empty (an 'old tail' event).  There is logic to handle the case when
> this old tail event needs to be added to the now empty queue (by
> checking for q->tail == port).
> 
> However, this does not cover all cases.
> 
> 1. An old tail may be re-added simultaneously with another event.
>    LINKED is set on the old tail, and the other CPU may misinterpret
>    this as the old tail still being valid and set LINK instead of
>    HEAD.  All events on this queue will then be lost.
> 
> 2. If the old tail event on queue A is moved to a different queue B
>    (by changing its VCPU or priority), the event may then be linked
>    onto queue B.  When another event is linked onto queue A it will
>    check the old tail, see that it is linked (but on queue B) and
>    overwrite the LINK field, corrupting both queues.
> 
> When an event is linked, save the vcpu id and priority of the queue it
> is being linked onto.  Use this when linking an event to check if it
> is an unlinked old tail event.  If it is an old tail event, the old
> queue is empty and old_q->tail is invalidated to ensure adding another
> event to old_q will update HEAD.  The tail is invalidated by setting
> it to 0, since event 0 is never linked.
> 
> The old_q->lock is held while setting LINKED to avoid the race with
> the test of LINKED in evtchn_fifo_set_link().

Whilst this is considerably more reliable than before, it isn't
completely safe.

evtchn_fifo_set_pending() relies on being serialized for each event
channel.  This serialization is required to protect evtchn->last_* and
the split test_bit(LINKED) / set_bit(LINKED).
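
For reference, the problematic split looks like this in
evtchn_fifo_set_pending() (paraphrased from event_fifo.c, with detail
elided):

    if ( !test_bit(EVTCHN_FIFO_MASKED, word)
         && !test_bit(EVTCHN_FIFO_LINKED, word) )   /* test... */
    {
        ...
        set_bit(EVTCHN_FIFO_LINKED, word);          /* ...then set */
        /* link port onto q and update q->tail */
    }

Without serialization, two callers can both observe LINKED clear and
both proceed to link the same event.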

In most cases the serialization is provided by a suitable lock in
event_channel.c, e.g., interdomain events are serialized with the
remote domain's event lock and virqs with the local virq lock.

However, if evtchn_fifo_set_pending() is called from
evtchn_fifo_unmask() or evtchn_fifo_expand_array() then only the local
event lock is held, which is different from the lock used for
interdomain and virq events.
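
To summarise, the lock actually providing the serialization depends on
the call path:

    interdomain send            -> remote domain's event lock
    virq                        -> local virq lock
    evtchn_fifo_unmask()        -> local domain's event lock
    evtchn_fifo_expand_array()  -> local domain's event lock

so two paths raising the same event may hold different locks.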

One race is:

  CPU A                         CPU B

  EVTCHNOP_send()
    evtchn_fifo_set_pending()
      evtchn->last_vcpu_id = ...
                                old_v = d->vcpu[evtchn->last_vcpu_id]
                                old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority]
      evtchn->last_priority = ...
                                lock(old_q) // wrong q
      set_bit(LINKED)
      q->tail = port
      ...
  Guest:
     clear_bit(LINKED)
                                set_bit(LINKED)
                                // q->tail not invalidated
                                unlock(old_q)
                                lock(q)
                                link port to itself.
                                unlock(q)

Linking an event to itself loses any other events that were in the
queue (they have LINKED set but are no longer reachable by the guest).
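
Concretely, CPU B reaches the linking code with q->tail == port and
LINKED freshly re-set, so the tail looks valid and the link is made
back to the event itself (paraphrasing the linking logic; exact
argument lists elided):

    tail_word = evtchn_fifo_word_from_port(d, q->tail);  /* q->tail == port */
    evtchn_fifo_set_link(tail_word, port);               /* LINK of port := port */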

Another race is:

  CPU A                           CPU B

  EVTCHNOP_send()                 
     evtchn_fifo_set_pending()    clear_bit(MASKED)
       set_bit(PENDING)           test_bit(PENDING)
       test_bit(LINKED)           EVTCHNOP_unmask()
                                  evtchn_fifo_set_pending()
                                    test_bit(LINKED)
       lock(q->lock)
       set_bit(LINKED)
       link to q->tail
       q->tail = port
       unlock(q->lock)
                                    lock(q->lock)
                                    q->tail = 0 because q->tail == port
                                    q->head = port
                                    unlock(q->lock)

The double link of the event orphans any events preceding it in the
queue (they have LINKED set but are no longer reachable by the guest).
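
In queue terms, CPU B's second link does the following (paraphrased;
here old_q == q):

    if ( old_q->tail == port )  /* true: CPU A has just set it */
        old_q->tail = 0;
    ...
    if ( !q->tail )             /* queue now looks empty... */
        q->head = port;         /* ...so HEAD is overwritten, orphaning
                                   the events CPU A linked port onto */
    q->tail = port;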

There are two ways to fix this:

1. Introduce a per-event lock and use this to serialize
   evtchn_fifo_set_pending().  After 4.4, this per-event lock can be
   used instead of the domain's event lock and virq lock when raising
   events, hopefully reducing lock contention and improving event
   channel scalability (see the sketch after this list).

   This would halve the number of struct evtchn per page, as the lock
   takes it over 32 bytes in size.

2. Check for old_q changing after locking old_q->lock, and use
   test_and_set_bit(LINKED) to bail early if another CPU has already
   linked the event (see the patch below).
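
For comparison, option 1 would look something like this (hypothetical
sketch only; field placement and naming are not final):

    struct evtchn {
        spinlock_t lock;   /* serializes set_pending() for this event */
        /* ... existing fields ... */
    };

    static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
    {
        unsigned long flags;

        spin_lock_irqsave(&evtchn->lock, flags);
        /* existing body: test/set LINKED, link port onto the queue, etc. */
        spin_unlock_irqrestore(&evtchn->lock, flags);
    }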

Any opinions on either of these solutions?

--- a/xen/common/event_fifo.c   Tue Nov 19 11:06:54 2013 +0000
+++ b/xen/common/event_fifo.c   Wed Nov 20 16:41:32 2013 +0000
@@ -34,6 +34,30 @@ static inline event_word_t *evtchn_fifo_
     return d->evtchn_fifo->event_array[p] + w;
 }
 
+static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
+                                                struct evtchn *evtchn,
+                                                unsigned long *flags)
+{
+    struct vcpu *v;
+    struct evtchn_fifo_queue *q, *old_q;
+
+    for (;;)
+    {
+        v = d->vcpu[evtchn->last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        spin_lock_irqsave(&old_q->lock, *flags);
+
+        v = d->vcpu[evtchn->last_vcpu_id];
+        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        if ( old_q == q )
+            return old_q;
+
+        spin_unlock_irqrestore(&old_q->lock, *flags);
+    }
+}
+
 static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
 {
     event_word_t new, old;
@@ -127,7 +151,6 @@ static void evtchn_fifo_set_pending(stru
     if ( !test_bit(EVTCHN_FIFO_MASKED, word)
          && !test_bit(EVTCHN_FIFO_LINKED, word) )
     {
-        struct vcpu *old_v;
         struct evtchn_fifo_queue *q, *old_q;
         event_word_t *tail_word;
         bool_t linked = 0;
@@ -139,10 +162,13 @@ static void evtchn_fifo_set_pending(stru
          */
         q = &v->evtchn_fifo->queue[evtchn->priority];
 
-        old_v = d->vcpu[evtchn->last_vcpu_id];
-        old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority];
+        old_q = lock_old_queue(d, evtchn, &flags);
 
-        spin_lock_irqsave(&old_q->lock, flags);
+        if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+        {
+            spin_unlock_irqrestore(&old_q->lock, flags);
+            goto done;
+        }
 
         /*
          * If this event was a tail, the old queue is now empty and
@@ -152,12 +178,9 @@ static void evtchn_fifo_set_pending(stru
         if ( old_q->tail == port )
             old_q->tail = 0;
 
-        set_bit(EVTCHN_FIFO_LINKED, word);
-
         /* Moved to a different queue? */
-        if ( old_q != q)
+        if ( old_q != q )
         {
-
             evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
             evtchn->last_priority = evtchn->priority;
 
@@ -191,7 +214,7 @@ static void evtchn_fifo_set_pending(stru
                                   &v->evtchn_fifo->control_block->ready) )
             vcpu_mark_events_pending(v);
     }
-
+done:
     if ( !was_pending )
         evtchn_check_pollers(d, port);
 }

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel