
Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



On 07/11/17 14:55, Jan Beulich wrote:
>>>> On 07.11.17 at 15:24, <igor.druzhinin@xxxxxxxxxx> wrote:
>> On 07/11/17 08:07, Jan Beulich wrote:
>>> --- unstable.orig/xen/arch/x86/domain.c
>>> +++ unstable/xen/arch/x86/domain.c
>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>  
>>>  void vcpu_destroy(struct vcpu *v)
>>>  {
>>> +    /*
>>> +     * Flush all state for this vCPU before fully tearing it down. This is
>>> +     * particularly important for HVM ones on VMX, so that this flushing of
>>> +     * state won't happen from the TLB flush IPI handler behind the back of
>>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>> +     */
>>> +    sync_vcpu_execstate(v);
>>> +
>>>      xfree(v->arch.vm_event);
>>>      v->arch.vm_event = NULL;
>>
>> I don't think this is going to fix the problem, since the vCPU we are
>> currently destroying has nothing to do with the vCPU that actually
>> caused the problem by its migration. We are still going to call
>> vmx_vcpu_disable_pml(), which loads and cleans the VMCS on the current pCPU.
> 
> Oh, right, wrong vCPU. This should be better:
> 
> --- unstable.orig/xen/arch/x86/domain.c
> +++ unstable/xen/arch/x86/domain.c
> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>  
>  void vcpu_destroy(struct vcpu *v)
>  {
> +    /*
> +     * Flush all state for the vCPU previously having run on the current CPU.
> +     * This is in particular relevant for HVM ones on VMX, so that this
> +     * flushing of state won't happen from the TLB flush IPI handler behind
> +     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
> +     */
> +    sync_local_execstate();
> +
>      xfree(v->arch.vm_event);
>      v->arch.vm_event = NULL;
>  
> In that case the question then is whether (rather than generalizing
> it, as mentioned for the earlier version) this wouldn't better go into
> vmx_vcpu_destroy(), assuming anything called earlier from
> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
> play with VMCSes).
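
For illustration, that alternative placement might look roughly like the
sketch below (illustrative only, assuming vmx_vcpu_destroy()'s existing
teardown order and that nothing called earlier from hvm_vcpu_destroy()
touches a VMCS):

static void vmx_vcpu_destroy(struct vcpu *v)
{
    /*
     * Flush the state of whichever vCPU last ran on this pCPU, so that
     * the lazy flush can't be triggered from the TLB flush IPI handler
     * in the middle of the vmx_vmcs_enter() / vmx_vmcs_exit() sections
     * used by the teardown below.
     */
    sync_local_execstate();

    vmx_vcpu_disable_pml(v);   /* enters/exits the vCPU's VMCS on this pCPU */
    vmx_destroy_vmcs(v);
    /* ... remaining teardown unchanged ... */
}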

Ah, ok. Does this also apply to the previous issue? May I revert that
change to test it?

There is one thing that worries me about this approach:

At this point we just sync the idle context because we know that we are
going to deal with the VMCS later. But what about other potential cases
(perhaps some softirqs) in which we access a vCPU data structure that is
currently shared between different pCPUs? Maybe we'd better sync the
context as soon as possible after switching to idle from a migrated
vCPU, along the lines of the sketch below.

Igor

> 
> Jan
> 
