
Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 14 Jan 2022 13:50:48 +0100
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Fri, 14 Jan 2022 12:51:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jan 13, 2022 at 04:38:33PM +0000, Andrew Cooper wrote:
> The logic was based on a mistaken understanding of how NMI blocking on vmexit
> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
> and the guest's value will be clobbered before it is saved.
> 
> Switch to using MSR load/save lists.  This causes the guest value to be saved
> atomically with respect to NMIs/MCEs/etc.
> 
> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
> the same time as configuring the intercepts.  This function is always used in
> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
> function, rather than having multiple remote acquisitions of the same VMCS.
> 
> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
> -ESRCH, but this doesn't matter, and this path will be taken during
> domain create on a system lacking IBRSB.
> 
> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
> rather than vcpu_msrs, and update the comment to describe the new state
> location.
> 
> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
> entirely, and use a shorter code sequence to simply reload Xen's setting from
> the top-of-stack block.
> 
> Because the guest values are loaded/saved atomically, we do not need to use
> the shadowing logic to cope with late NMIs/etc, so we can omit
> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
> no need to switch back to Xen's context on an early failure.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
> CC: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> CC: Kevin Tian <kevin.tian@xxxxxxxxx>
> 
> Needs backporting as far as people can tolerate.
> 
> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
> but this is awkward to arrange in asm.
> ---
>  xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
>  xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
>  xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
>  xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
>  4 files changed, 56 insertions(+), 41 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
> index 30139ae58e9d..297ed8685126 100644
> --- a/xen/arch/x86/hvm/vmx/entry.S
> +++ b/xen/arch/x86/hvm/vmx/entry.S
> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
>  
>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
> +
> +        .macro restore_spec_ctrl
> +            mov    $MSR_SPEC_CTRL, %ecx
> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
> +            xor    %edx, %edx
> +            wrmsr
> +        .endm
> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM

I assume loading the host value into MSR_SPEC_CTRL strictly needs to
happen after the RSB overwrite, or else you could use a VM-exit host
load MSR list entry to handle MSR_SPEC_CTRL as well.

Also, I don't understand why SPEC_CTRL shadowing is no longer cleared
before loading Xen's value.
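
For reference, the MSR load/save lists the patch switches to are
arrays of 16-byte entries that the processor walks as part of VM
entry/exit itself, which is what makes the guest save atomic with
respect to NMIs/MCEs.  A minimal sketch of one entry as described by
the SDM (field names here are illustrative, not Xen's actual
definition):

    #include <stdint.h>

    struct vmx_msr_area_entry {
        uint32_t index;    /* MSR index, e.g. MSR_SPEC_CTRL (0x48). */
        uint32_t reserved; /* Must be zero. */
        uint64_t data;     /* MSR value loaded/saved by the processor. */
    };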

>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
> @@ -83,7 +90,6 @@ UNLIKELY_END(realmode)
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>          /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
> -        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
>          ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM
>  
>          mov  VCPU_hvm_guest_cr2(%rbx),%rax
> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>          SAVE_ALL
>  
>          /*
> -         * PV variant needed here as no guest code has executed (so
> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
> -         * to hit (in which case the HVM variant might corrupt things).
> +         * SPEC_CTRL_ENTRY notes
> +         *
> +         * If we end up here, no guest code has executed.  We still have Xen's
> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>           */
> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          call vmx_vmentry_failure
>          jmp  .Lvmx_process_softirqs
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 28ee6393f11e..d7feb5f9c455 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -592,6 +592,7 @@ void vmx_update_exception_bitmap(struct vcpu *v)
>  static void vmx_cpuid_policy_changed(struct vcpu *v)
>  {
>      const struct cpuid_policy *cp = v->domain->arch.cpuid;
> +    int rc = 0;
>  
>      if ( opt_hvm_fep ||
>           (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
> @@ -601,17 +602,27 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>  
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> -    vmx_vmcs_exit(v);
>  
>      /*
>       * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
>       * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
>       */
>      if ( cp->feat.ibrsb )
> +    {
>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
> +
> +        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> +        if ( rc )
> +            goto err;
> +    }
>      else
> +    {
>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>  
> +        /* Can only fail with -ESRCH, and we don't care. */
> +        vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> +    }
> +
>      /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
>      if ( cp->feat.ibrsb || cp->extd.ibpb )
>          vmx_clear_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
> @@ -623,6 +634,15 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>          vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
>      else
>          vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
> +
> + err:

Nit: I would name this label "out", since it's used by both the error
and the normal exit paths, but that's just my taste.

> +    vmx_vmcs_exit(v);
> +
> +    if ( rc )
> +    {
> +        printk(XENLOG_G_ERR "MSR load/save list error: %d", rc);
> +        domain_crash(v->domain);
> +    }
>  }
>  
>  int vmx_guest_x86_mode(struct vcpu *v)
> @@ -3128,7 +3148,6 @@ static int is_last_branch_msr(u32 ecx)
>  static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>  {
>      struct vcpu *curr = current;
> -    const struct vcpu_msrs *msrs = curr->arch.msrs;
>      uint64_t tmp;
>  
>      HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
> @@ -3136,7 +3155,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>      switch ( msr )
>      {
>      case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
> -        *msr_content = msrs->spec_ctrl.raw;
> +        if ( vmx_read_guest_msr(curr, msr, msr_content) )
> +        {
> +            ASSERT_UNREACHABLE();
> +            goto gp_fault;
> +        }

AFAICT this is required just for Xen-internal callers, as a guest
attempt to access MSR_SPEC_CTRL will never be trapped, or, if it is
trapped, that would be because !cp->feat.ibrsb, in which case we
wouldn't get here anyway.
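
Purely for illustration, reading a guest MSR back out of the list
boils down to a search over the entries sketched above; the helper
below is hypothetical and merely stands in for what
vmx_read_guest_msr() does:

    #include <errno.h>   /* ESRCH (xen/errno.h in-tree). */

    static int example_read_guest_msr(
        const struct vmx_msr_area_entry *area, unsigned int nr,
        uint32_t msr, uint64_t *val)
    {
        unsigned int i;

        /* Scan the guest load/save area for a matching MSR index. */
        for ( i = 0; i < nr; i++ )
        {
            if ( area[i].index == msr )
            {
                *val = area[i].data;
                return 0;
            }
        }

        return -ESRCH; /* MSR not on the list. */
    }

With the intercept only left in place when !cp->feat.ibrsb (and hence
no list entry), a failure here can only indicate a Xen-internal error,
which is consistent with the failure path being unreachable from guest
accesses.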

Thanks, Roger.
