[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V2] x86/altp2m: Fixed crash with INVALID_ALTP2M EPTP index

On 06/26/2018 02:56 PM, Jan Beulich wrote:
>>>> On 26.06.18 at 12:55, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>> On 06/26/2018 01:26 PM, Jan Beulich wrote:
>>>>>> On 25.06.18 at 16:10, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>> --- a/xen/arch/x86/mm/altp2m.c
>>>> +++ b/xen/arch/x86/mm/altp2m.c
>>>> @@ -58,8 +58,8 @@ altp2m_vcpu_destroy(struct vcpu *v)
>>>>      altp2m_vcpu_reset(v);
>>>> -    altp2m_vcpu_update_p2m(v);
>>>>      altp2m_vcpu_update_vmfunc_ve(v);
>>>> +    altp2m_vcpu_update_p2m(v);
>>> I agree this addresses this specific incarnation of the problem. However,
>>> if the vCPU indeed runs while being manipulated, I don't think you get
>>> rid of the race this way. For one, there is e.g. a solitary call to
>>> altp2m_vcpu_update_vmfunc_ve() in the handling of
>>> HVMOP_altp2m_vcpu_enable_notify. That'll lead to
>>> EPTP_INDEX won't be written. Whatever previous value is in place
>>> would then be read back in VM exit handling.
>>> With that it also looks to me as if the two step (and hence non-atomic
>>> from the perspective of the guest) update is a problem. Even with the
>>> change above, the VM exit may now happen exactly between the two
>>> function calls.
>>> It seems to me that pausing the vCPU is almost unavoidable (and then
>>> the ordering of the two calls is relevant only because
>>> vmx_vcpu_update_eptp() would better respect the intended new
>> I see. Would not then your scenario be covered by
>> altp2m_vcpu_update_vmfunc_ve() always calling altp2m_vcpu_update_p2m()
>> at the very end (and removing duplicate calls to altp2m_vcpu_update_p2m())?
> No - there would then still be a window of time where the bit is set but
> the index was not (yet) written.
>> Unless something is very wrong, all calls to
>> altp2m_vcpu_update_vmfunc_ve() _should_ happen within a pause, so no
>> calls to the VM exit handler should occur between them if they become a
>> single block of logic within altp2m_vcpu_update_vmfunc_ve().
> Did I misunderstand your analysis mails then? It looked to me as if
> you were observing exactly such races, because of the vCPU not
> being paused.

No, you've understood my initial email perfectly, but I had
unfortunately misread the log. I've attempted to clarify things with the
follow-up (which clearly has not been eloquent) (the "apologies, I seem
to have misread the crash timing" email).

I've misread my log because altp2m_vcpu_update_vmfunc_ve() was called
the first time on handling HVMOP_altp2m_vcpu_enable_notify, and a second
time on altp2m_vcpu_destroy() - and looking at it I've first thought
that the call leading to the crash happened in altp2m_vcpu_destroy()
(which would have indeed implied a nasty race with the VM exit handler).

But then I've looked closer and found that the crash was happening soon
after the HVMOP_altp2m_vcpu_enable_notify handling, in the second run of
the test application (the first run was writing INVALID_ALTP2M in
EPTP_INDEX on altp2m_vcpu_destroy()).

The VCPU pausing seem to work as expected. Sorry for the misunderstanding.

>>> I also think that you'd better Cc the VMX maintainers here, even if
>>> the fix itself is outside the code their maintainership covers.
>> Sorry, I don't follow. The MAINTAINTERS file lists only Jun Nakajima and
>> Kevin Tian under "INTEL(R) VT FOR X86 (VT-X)" (where
>> xen/arch/x86/hvm/vmx/ and xen/include/asm-x86/hvm/vmx/ live), and they
>> are both CCd here. What am I missing?
> Your patch submission didn't have them on Cc, I've added them when
> replying.

Understood, thanks.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.