[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



On Thu, 2017-11-09 at 03:05 -0700, Jan Beulich wrote:
> > > > On 07.11.17 at 16:52, <igor.druzhinin@xxxxxxxxxx> wrote:
> > 
> > There is one things that I'm worrying about with this approach:
> > 
> > At this place we just sync the idle context because we know that we
> > are
> > going to deal with VMCS later. But what about other potential cases
> > (perhaps some softirqs) in which we are accessing a vCPU data
> > structure
> > that is currently shared between different pCPUs. Maybe we'd better
> > sync
> > the context as soon as possible after we switched to idle from a
> > migrated vCPU.
> 
> Short of feedback from the scheduler maintainers I've looked
> into this some more.
>
Sorry, as you may have seen by the other email, I was looking into this
today.

> Calling sync_vcpu_execstate() out of
> vcpu_move_locked() would seem feasible, but there are a number
> of other places where ->processor of a vCPU is being updated,
> and where the vCPU was not (obviously) put to sleep already:
> 
> - credit1's csched_runq_steal()
> - credit2's migrate()
> - csched2_schedule()
> - null's vcpu_assign() when called out of null_schedule()
> - rt_schedule()
> 
> I don't think it is reasonable to call sync_vcpu_execstate() in all
> of
> those places, 
>
Yes, I agree.

> and hence VMX'es special VMCS management may
> indeed better be taken care of by VMX-specific code (i.e. as
> previously indicated the sync_local_execstate() invocation moved
> from vcpu_destroy() - where my most recent patch draft had put
> it - to vmx_vcpu_destroy()). 
>
I was still trying to form an opinion about the issue, and was leaning
toward suggesting exactly the same.

In fact, the point of lazy context switch is exactly that: trying to
save syncing the state. Of course, that requires that we identify all
the places and occasions where having the state out of sync may be a
problem, and sync it!.

What seems to me to be happening here is as follows:

 a. a pCPU becomes idle
 b. we do the lazy switch, i.e., the context of the previously running 
    vCPU stays on the pCPU
 c. *something* happens which wants to either play with or alter the 
    context currently loaded on the pCPU (in this case it's VMX bits  
    of the context, but it could be other parts of it too) that is 
    loaded on the pCPU

Well, I'm afraid I only see two solutions:
1) we get rid of lazy context switch;
2) whatever it is that is happening at point c above, it needs to be 
   aware that we use lazy context switch, and make sure to sync the 
   context before playing with or altering it;

> And we'd have to live with other
> VMX code paths having similar asynchronous behavior needing to
> similarly take care of the requirement.
> 
Exactly.

And in fact, in this thread, migration of vCPUs between pCPUs was
mentioned, and it was being considered to treat that in a special way. 

But it looks to me that something very similar may, at least in theory,
happen any time a lazy context switch occurs, no matter whether the
pCPU has become idle because the previously running vCPU wants to move,
or because it blocked for whatever other reason.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.