Xen project Mailing List

Re: [Xen-devel] [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Fri, 11 Oct 2013 10:02:50 +0100

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Fri, 11 Oct 2013 09:03:12 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 11/10/2013 09:07, Keir Fraser wrote: > On 11/10/2013 08:12, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: > >>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@xxxxxxxxx> wrote: >>> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote: >>> >>>>> Just taking the lock for the old processor seemed sufficient to me as >>>>> anything seeing the new value would lock and unlock using the same new >>>>> value. But do we need to take the schedule_lock for the new processor >>>>> as well (in the right order of course)? >>>> David and I have been discussing this for a while, involving a >>>> whiteboard, and not come to a firm conclusion either way. >>>> >>>> From my point of view, holding the appropriate vcpu schedule lock >>>> entitles you to play with vcpu scheduling state, which involves >>>> following v->sched_priv which we update outside the critical region later. >>>> >>>> Only taking the one lock still leaves a race condition where another cpu >>>> can follow the new v->processor and obtain the schedule lock, at which >>>> point we have two threads both working on the internals of a vcpu. The >>>> change below certainly will fix the current bug of locking one spinlock >>>> and unlocking another. >>>> >>>> My gut feeling is that we do need to take both locks to be safe in terms >>>> of data access, but we would appreciate advice from someone more >>>> familiar with the scheduler locking. >>> If it's that tricky to work out, why not just take the two locks, >>> appropriately ordered? This isn't a hot path. >> Shouldn't we rather fix the locking mechanism itself, making >> vcpu_schedule_lock...() return the lock, such that the unlock >> will unavoidably use the correct lock? >> >> That would at once allow dropping vcpu_schedule_unlock...() >> altogether, which would be a good thing even if only considering >> the explicit uses of local_irq_disable() there (instead of using the >> right spin lock primitives). And if done that way, replacing the >> explicit uses of local_irq_enable() in the locking paths would also >> seem rather desirable - after all this defeats the spin lock >> primitives wanting to re-enable interrupts while waiting for a >> lock. > It feels to me like this is separate from Andrew's concern. Also I think > that holding the schedule_lock should protect you from changes to > v->processor. But if that's really unreasonable (e.g., inefficient) then > your suggestion here is perfectly sensible. > > Improving the vcpu_schedule_lock_irq implementations to use the providied > underlying spin_lock_irq functions would also be nice, I guess :) This is an orthogonal issue which could do with fixing. Do note that simply making changes to vcpu_schedule_lock() to return the appropriate lock is not sufficient to fix this issue, as the race with changing v->processor can cause two cpus to "successfully" lock the vcpu schedule lock for a particular vcpu. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.