Re: [Xen-devel] [PATCH] x86/Intel: virtualize support for cpuid faulting

On 12/10/16 05:07, Kyle Huey wrote:
> rr (http://rr-project.org/), a Linux userspace record-and-replay reverse-
> execution debugger, would like to trap and emulate the CPUID instruction.
> This would allow us to a) mask away certain hardware features that rr does
> not support (e.g. RDRAND) and b) enable trace portability across machines
> by providing constant results. Patches for support in the Linux kernel are in
> flight, and we'd like to be able to use this feature on virtualized Linux
> instances as well.
>
> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
> hardware support for faulting on cpuid is necessary to emulate support with an
> HVM guest.
>
> On PV guests, hardware support is required so that userspace cpuid will trap
> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests
> (that aren't the control domain, see the comment in
> intel_ctxt_switch_levelling).
> Every guest cpuid will trap via a GP(0) to emulate_privileged_op (via
> do_general_protection). Once there we can simply decline to emulate the cpuid,
> if it comes from the guest's userspace, and instead pass the GP(0) along
> to the guest kernel, thus emulating the cpuid faulting behavior.
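
For anyone following along, the architectural interface being
virtualized is, in summary (values from the Intel SDM; the names mirror
the ones the patch uses):

    /* Read-only; bit 31 advertises that CPUID faulting is available. */
    #define MSR_INTEL_PLATFORM_INFO          0x000000ce
    #define MSR_PLATFORM_INFO_CPUID_FAULTING (1u << 31)

    /* Read-write; setting bit 0 makes CPUID executed at CPL > 0 raise
     * #GP(0) instead of executing normally. */
    #define MSR_INTEL_MISC_FEATURES_ENABLES  0x00000140
    #define MSR_MISC_FEATURES_CPUID_FAULTING (1u << 0)

A guest kernel therefore enables faulting for its userspace by setting
bit 0, and clears it again to restore normal CPUID behaviour.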
>
> Signed-off-by: Kyle Huey <khuey@xxxxxxxxxxxx>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c   | 21 ++++++++++++++++++---
>  xen/arch/x86/traps.c         | 31 ++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/domain.h |  3 +++
>  3 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 50cbfed..be45f74 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2434,6 +2434,14 @@ static int vmx_do_cpuid(struct cpu_user_regs *regs)
>  {
>      unsigned int eax, ebx, ecx, edx;
>      unsigned int leaf, subleaf;
> +    struct segment_register sreg;
> +    struct vcpu *v = current;
> +
> +    hvm_get_segment_register(v, x86_seg_ss, &sreg);
> +    if ( v->arch.cpuid_fault && sreg.attr.fields.dpl > 0 ) {

Newline for brace.  This file is sadly inconsistent with its style, but
we insist that all new code is correct.
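
i.e.

    if ( v->arch.cpuid_fault && sreg.attr.fields.dpl > 0 )
    {
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        return 0;
    }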

> +        hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +        return 0;
> +    }
>  
>      eax = regs->eax;
>      ebx = regs->ebx;
> @@ -2701,9 +2709,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>          break;
>  
>      case MSR_INTEL_PLATFORM_INFO:
> -        if ( rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *msr_content) )
> -            goto gp_fault;
> -        *msr_content = 0;
> +        *msr_content = MSR_PLATFORM_INFO_CPUID_FAULTING;
> +        break;
> +
> +    case MSR_INTEL_MISC_FEATURES_ENABLES:
> +        *msr_content = current->arch.cpuid_fault ? MSR_MISC_FEATURES_CPUID_FAULTING : 0;
>          break;
>  
>      default:
> @@ -2931,6 +2941,11 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>               rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msr_content) )
>              goto gp_fault;
>          break;
> +    case MSR_INTEL_MISC_FEATURES_ENABLES:
> +        if ( msr_content & ~MSR_MISC_FEATURES_CPUID_FAULTING )
> +            goto gp_fault;
> +        v->arch.cpuid_fault = msr_content;

= !!msr_content;

Some compilers complain when assigning a non-boolean value to a bool.

> +        break;
>  
>      default:
>          if ( passive_domain_do_wrmsr(msr, msr_content) )
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 24d173f..2783822 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2945,6 +2945,16 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                   rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msr_content) )
>                  goto fail;
>              break;
> +        case MSR_INTEL_MISC_FEATURES_ENABLES:
> +            if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> +                 rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, val) ||
> +                 msr_content & ~MSR_MISC_FEATURES_CPUID_FAULTING )
> +                goto fail;
> +            if ( msr_content == MSR_MISC_FEATURES_CPUID_FAULTING &&
> +                 (!cpu_has_cpuid_faulting || is_control_domain(v->domain)) )
> +                goto fail;
> +            v->arch.cpuid_fault = msr_content;

!!msr_content again.

> +            break;
>  
>          case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
>          case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
> @@ -3079,7 +3089,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>              if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
>                   rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
>                  goto fail;
> -            regs->eax = regs->edx = 0;
> +            /*
> +             * See the comment in intel_ctxt_switch_levelling about cpuid
> +             * faulting in the control domain.
> +             */

All of these routines run in the correct domain context.  Instead of
leaving logic around like this, it might be better to make the "static
DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);" non-static and use
this_cpu(cpuid_faulting_enabled) instead.

Then, once I fix cpuid faulting for the control domain, I won't need to
alter this code for it to start working.
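
Roughly the following (a sketch only; where the declaration should live
is a detail for the real patch):

    /* xen/arch/x86/cpu/intel.c: drop the static */
    DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);

    /* in a suitable header, e.g. asm-x86/processor.h */
    DECLARE_PER_CPU(bool_t, cpuid_faulting_enabled);

    /* traps.c, MSR_INTEL_PLATFORM_INFO read side */
    regs->eax = regs->edx = 0;
    if ( this_cpu(cpuid_faulting_enabled) )
        regs->eax = MSR_PLATFORM_INFO_CPUID_FAULTING;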

> +            if ( cpu_has_cpuid_faulting && !is_control_domain(v->domain) )
> +                regs->eax = MSR_PLATFORM_INFO_CPUID_FAULTING;
> +            else
> +                regs->eax = 0;
> +            regs->edx = 0;
> +            break;
> +        case MSR_INTEL_MISC_FEATURES_ENABLES:
> +            if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> +                 rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, val))
> +                goto fail;
> +            regs->eax = v->arch.cpuid_fault ? MSR_MISC_FEATURES_CPUID_FAULTING : 0;
> +            regs->edx = 0;
>              break;
>  
>          case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
> @@ -3137,6 +3162,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          break;
>  
>      case 0xa2: /* CPUID */
> +        /* Let the guest have this one */
> +        if ( v->arch.cpuid_fault && !ring_0(regs) )

This should really be v->arch.cpuid_fault && !guest_kernel_mode(v, regs)

32bit PV guest kernels run in ring 1, but we don't want kernel cpuid
invocations to cause a #GP fault.
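
i.e.:

    case 0xa2: /* CPUID */
        /* Let the guest kernel have this one; fault userspace only. */
        if ( v->arch.cpuid_fault && !guest_kernel_mode(v, regs) )
            goto fail;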

> +            goto fail;
> +
>          pv_cpuid(regs);
>          break;

Additionally, you must put a similar check in
emulate_forced_invalid_op().  Otherwise, PV guest userspace could still
bypass the kernel's setting and get CPUID data not controlled by the rr
sandbox.
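
Something along these lines, ahead of the pv_cpuid() call there (a
sketch; whether declining to emulate or injecting the #GP directly is
the right delivery mechanism is a detail to settle):

    /* emulate_forced_invalid_op(): honour CPUID faulting too. */
    if ( current->arch.cpuid_fault && !guest_kernel_mode(current, regs) )
        return 0; /* decline; the guest kernel sees the fault instead */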

~Andrew

>  
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 5807a1f..8ebc03b 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -573,6 +573,9 @@ struct arch_vcpu
>      XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
>  
>      struct arch_vm_event *vm_event;
> +
> +    /* Has the guest enabled CPUID faulting? */
> +    bool cpuid_fault;
>  };
>  
>  smap_check_policy_t smap_policy_change(struct vcpu *v,


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel