[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()

On 11/10/13 08:12, Jan Beulich wrote:
On 10.10.13 at 20:27, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:

Just taking the lock for the old processor seemed sufficient to me as
anything seeing the new value would lock and unlock using the same new
value.  But do we need to take the schedule_lock for the new processor
as well (in the right order of course)?
David and I have been discussing this for a while, involving a
whiteboard, and not come to a firm conclusion either way.

 From my point of view, holding the appropriate vcpu schedule lock
entitles you to play with vcpu scheduling state, which involves
following v->sched_priv which we update outside the critical region later.

Only taking the one lock still leaves a race condition where another cpu
can follow the new v->processor and obtain the schedule lock, at which
point we have two threads both working on the internals of a vcpu.  The
change below certainly will fix the current bug of locking one spinlock
and unlocking another.

My gut feeling is that we do need to take both locks to be safe in terms
of data access, but we would appreciate advice from someone more
familiar with the scheduler locking.
If it's that tricky to work out, why not just take the two locks,
appropriately ordered? This isn't a hot path.
Shouldn't we rather fix the locking mechanism itself, making
vcpu_schedule_lock...() return the lock, such that the unlock
will unavoidably use the correct lock?

That's an idea; but I half wonder if it wouldn't be better to actually keep vcpu_schedule_unlock(), but pass it the old lock. Then for debug builds we can ASSERT that the lock hasn't changed.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.