Xen project Mailing List

Re: [PATCH v8 2/3] xen/events: rework fifo queue locking

Date: Fri, 27 Nov 2020 14:11:54 +0100

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Fri, 27 Nov 2020 13:11:57 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.11.2020 11:51, Juergen Gross wrote: > Two cpus entering evtchn_fifo_set_pending() for the same event channel > can race in case the first one gets interrupted after setting > EVTCHN_FIFO_PENDING and when the other one manages to set > EVTCHN_FIFO_LINKED before the first one is testing that bit. This can > lead to evtchn_check_pollers() being called before the event is put > properly into the queue, resulting eventually in the guest not seeing > the event pending and thus blocking forever afterwards. > > Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel > lock") made the race just more obvious, while the fifo event channel > implementation had this race from the beginning when an unmask operation > was running in parallel with an event channel send operation. I notice you've altered the Fixes: tag, but you still say "from the beginning" here. > Using a spinlock for the per event channel lock is problematic due to > some paths needing to take the lock are called with interrupts off, so > the lock would need to disable interrupts, which in turn breaks some > use cases related to vm events. This reads as if it got put here by mistake. May I suggest to start with "Using ... had turned out problematic ..." and then add something like "Therefore that lock was switched to an rw one"? > For avoiding this race the queue locking in evtchn_fifo_set_pending() > needs to be reworked to cover the test of EVTCHN_FIFO_PENDING, > EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Additionally when an > event channel needs to change queues both queues need to be locked > initially. "... in order to avoid having a window with no lock held at all"? > @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct vcpu *v, > struct evtchn *evtchn) > return; > } > > + /* > + * Lock all queues related to the event channel (in case of a queue > change > + * this might be two). > + * It is mandatory to do that before setting and testing the PENDING bit > + * and to hold the current queue lock until the event has put into the "has been" or "was" I think? With adjustments along these lines (which I guess could again be done while committing) or reasons against supplied Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> One aspect which I wonder whether it wants adding to the description is that this change makes a bad situation worse: Back at the time per-channel locks were added to avoid the bottleneck of the per- domain event lock. While a per-queue lock's scope at least isn't the entire domain, their use on the send path still has been preventing full parallelism here. And this patch widens the lock holding region. If at least there was a fast path not requiring any locking ... Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.