Re: [Xen-devel] [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
On 25/05/18 12:36, Jan Beulich wrote:
>>>> On 25.05.18 at 10:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 25/05/2018 08:49, Jan Beulich wrote:
>>>>>> On 22.05.18 at 13:20, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>>>>
>>>> static void vmx_update_guest_efer(struct vcpu *v)
>>>> {
>>>> -    unsigned long vm_entry_value;
>>>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>>>> +        xen_efer = read_efer();
>>>> +
>>>> +    if ( paging_mode_shadow(v->domain) )
>>>> +    {
>>>> +        /*
>>>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>>>> +         * under guest control.
>>>> +         */
>>>> +        guest_efer &= ~EFER_NX;
>>>> +        guest_efer |= xen_efer & EFER_NX;
>>>> +
>>>> +        /*
>>>> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
>>>> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
>>>> +         *
>>>> +         *   If the "Load IA32_EFER" VM-entry control is 1, the following
>>>> +         *   checks are performed on the field for the IA32_MSR:
>>>> +         *    - Bits reserved in the IA32_EFER MSR must be 0.
>>>> +         *    - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
>>>> +         *      the "IA-32e mode guest" VM-entry control. It must also be
>>>> +         *      identical to bit 8 (LME) if bit 31 in the CR0 field
>>>> +         *      (corresponding to CR0.PG) is 1.
>>>> +         *
>>>> +         * Experimentally what actually happens is:
>>>> +         *  - Checks for EFER.{LME,LMA} apply uniformly whether using the
>>>> +         *    GUEST_EFER VMCS controls, or MSR load/save lists.
>>>> +         *  - Without EPT, LME being different to LMA isn't tolerated by
>>>> +         *    hardware. As writes to CR0 are intercepted, it is safe to
>>>> +         *    leave LME clear at this point, and fix up both LME and LMA when
>>>> +         *    CR0.PG is set.
>>>> +         */
>>>> +        if ( !(guest_efer & EFER_LMA) )
>>>> +            guest_efer &= ~EFER_LME;
>>>> +    }
>>> Why is this latter adjustment done only for shadow mode?
>> How should I go about making the comment clearer?
>>
>> When EPT is active, hardware is happy with LMA != LME. When EPT is
>> disabled, hardware strictly requires LME == LMA.
> Part of my problem may be that "Without EPT" can have two meanings:
> Hardware without EPT, or EPT disabled on otherwise capable hardware.
Ah, OK. Yes, I see the confusion. I'll see about rewording it.
>
>> This particular condition occurs architecturally on the transition into
>> long mode, between setting LME and setting CR0.PG, and updating EFER
>> controls in the naive way results in a vmentry failure.
>>
>> I have spoken to Intel, and they agree with my assessment that the docs
>> appear to be correct for Gen1 hardware but stale for Gen2 hardware,
>> where fixing this was one of many parts of making Unrestricted Guest work.
> This suggests you mean the former, in which case the check really
> doesn't belong inside a paging_mode_shadow() conditional.
Whereas I meant the latter. It depends on the EPT setting in the VMCS,
rather than on whether the hardware is EPT-capable. This is presumably
for backwards compatibility.
>
>>> After the above adjustments, when guest_efer still matches
>>> v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
>>> intercept?
>> In principle, yes. We could use load/save lists, as long as we remembered to
>> recalculate EFER every time CR0 gets modified in the shadow path.
>>
>> However, that would be a net performance penalty rather than a benefit
>> (which is why I've gone to the effort of creating load-only lists).
>>
>> In practice, EFER is written at boot and not touched again. Having
>> load/save logic might avoid these vmexits, but at the cost of almost
>> every other vmexit needing to keep guest_efer in sync with the
>> load/save list or VMCS field.
> I can't seem to connect this to my question about MSR _read_ intercept.
Oh, so it doesn't. I misread that as the read/write intercept.
Yes, probably, although I'd have to double-check how it interacts with
the introspection interception settings (almost certainly badly). I've
got a plan to fix this by maintaining separate "who wants which MSR
intercepted" state, and having a single recalc_msr_intercept_bitmap()
which runs on the hvm_resume() path after any changes.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel