[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit


  • To: Pau Monné, Roger <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Thu, 9 Jun 2022 07:04:12 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1PXseCSgxzJ24PN17SlXkaebzEixJqISFenoo1/Y6l8=; b=YrQ5plWAn+OWhOfXvR7CShWCei05L6B00JWxPmlUlIBEkL+2J3LJJ79KJtWu0DSpGy22wSF1YOmxejcatEwvKlXQD8MxC5C51Vht09DqW2heUxiAqyYAl4WcXsPYBegoLcIOR4ka4CYuVFPv3BjquurDzUP5yX+HFDo+vB+giwa+hZ43fQUkZLrnjJc0fRLfnL2HJzH+gEuXeI5vKxgbX1xJTm5JhpGceuQgxhwLke+5hJwlAHjIxlH/gIM9qkAqzk7W77HH20WvoZafJiW+6Eg3ZXq4TKAUkKBK8+Dm6oUAuBBIkxWWA8YD0/7RIZTx6XOmZay6U2SofzTvAORhcQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VndCciwLgJiELCMUq/BfZ2MGPuBbKeF9EdxbmQg4zPR1bWHqOjDfQkgdWSwW4HSud4yYn1PtnE6MeIQEGU+7kFLc/5BLvE+K7e3R8pYCmd7ait+7b7xc3zNRPd+4ipO6D05dlemmIiAMoVgzEtUMur5/l4s0n5wRUj6aLFE0CGT1qCn0IAE80+iQxRB+bJLExk+KKdvxw/ZRjAb3GA3cENBgPziRtukVnMOP2OSqqxJYxH/Z+4hnCDDytdeZEfY2BUF8J9oh5YMMQZbFZDQtEqT0EfzCwU1VgEgvRyLMc57LHPzjsKCsFS6nu2cHmSdP/iUyrYwIVNiXv4Xz3nTRsQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com;
  • Cc: Pau Monné, Roger <roger.pau@xxxxxxxxxx>, "Cooper, Andrew" <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, "Beulich, Jan" <JBeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, "Qiang, Chenyi" <chenyi.qiang@xxxxxxxxx>, "Li, Xiaoyao" <xiaoyao.li@xxxxxxxxx>
  • Delivery-date: Thu, 09 Jun 2022 07:04:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYcPGLnpUujBwR5EWVeB9tTYBUAq1Gu3mA
  • Thread-topic: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit

+Chenyi/Xiaoyao who worked on the KVM support. Presumably
similar opens have been discussed in KVM hence they have the
right background to comment here.

> From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> Sent: Thursday, May 26, 2022 7:12 PM
> 
> Under certain conditions guests can get the CPU stuck in an unbounded
> loop without the possibility of an interrupt window to occur on
> instruction boundary.  This was the case with the scenarios described
> in XSA-156.
> 
> Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> if no interrupt window occurs for a specified amount of time.  Note
> that using the Notify VM Exit avoids having to trap #AC and #DB
> exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> puts the CPU in a loop without an interrupt window, as such disable
> the intercepts if the feature is available and enabled.
> 
> Setting the notify VM exit window to 0 is safe because there's a
> threshold added by the hardware in order to have a sane window value.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Changes since v1:
>  - Properly update debug state when using notify VM exit.
>  - Reword commit message.
> ---
> This change enables the notify VM exit by default, KVM however doesn't
> seem to enable it by default, and there's the following note in the
> commit message:
> 
> "- There's a possibility, however small, that a notify VM exit happens
>    with VM_CONTEXT_INVALID set in exit qualification. In this case, the
>    vcpu can no longer run. To avoid killing a well-behaved guest, set
>    notify window as -1 to disable this feature by default."
> 
> It's not obviously clear to me whether the comment was meant to be:
> "There's a possibility, however small, that a notify VM exit _wrongly_
> happens with VM_CONTEXT_INVALID".
> 
> It's also not clear whether such wrong hardware behavior only affects
> a specific set of hardware, in a way that we could avoid enabling
> notify VM exit there.
> 
> There's a discussion in one of the Linux patches that 128K might be
> the safer value in order to prevent false positives, but I have no
> formal confirmation about this.  Maybe our Intel maintainers can
> provide some more feedback on a suitable notify VM exit window
> value.
> 
> I've tested with 0 (the proposed default in the patch) and I don't
> seem to be able to trigger notify VM exits under normal guest
> operation.  Note that even in that case the guest won't be destroyed
> unless the context is corrupt.
> ---
>  docs/misc/xen-command-line.pandoc       | 11 +++++++++
>  xen/arch/x86/hvm/vmx/vmcs.c             | 19 +++++++++++++++
>  xen/arch/x86/hvm/vmx/vmx.c              | 32 +++++++++++++++++++++++--
>  xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  4 ++++
>  xen/arch/x86/include/asm/hvm/vmx/vmx.h  |  6 +++++
>  xen/arch/x86/include/asm/perfc_defn.h   |  3 ++-
>  6 files changed, 72 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-
> command-line.pandoc
> index 1dc7e1ca07..ccf8bf5806 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2544,6 +2544,17 @@ guest will notify Xen that it has failed to acquire a
> spinlock.
>  <major>, <minor> and <build> must be integers. The values will be
>  encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
> 
> +### vm-notify-window (Intel)
> +> `= <integer>`
> +
> +> Default: `0`
> +
> +Specify the value of the VM Notify window used to detect locked VMs. Set
> to -1
> +to disable the feature.  Value is in units of crystal clock cycles.
> +
> +Note the hardware might add a threshold to the provided value in order to
> make
> +it safe, and hence using 0 is fine.
> +
>  ### vpid (Intel)
>  > `= <boolean>`
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index d388e6729c..6cb2c6c6b7 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
>  static unsigned int __read_mostly ple_window = 4096;
>  integer_param("ple_window", ple_window);
> 
> +static unsigned int __ro_after_init vm_notify_window;
> +integer_param("vm-notify-window", vm_notify_window);
> +
>  static bool __read_mostly opt_ept_pml = true;
>  static s8 __read_mostly opt_ept_ad = -1;
>  int8_t __read_mostly opt_ept_exec_sp = -1;
> @@ -210,6 +213,7 @@ static void __init vmx_display_features(void)
>      P(cpu_has_vmx_pml, "Page Modification Logging");
>      P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
>      P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
> +    P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit");
>  #undef P
> 
>      if ( !printed )
> @@ -329,6 +333,8 @@ static int vmx_init_vmcs_config(bool bsp)
>              opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
>          if ( opt_ept_pml )
>              opt |= SECONDARY_EXEC_ENABLE_PML;
> +        if ( vm_notify_window != ~0u )
> +            opt |= SECONDARY_EXEC_NOTIFY_VM_EXITING;
> 
>          /*
>           * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
> @@ -1333,6 +1339,19 @@ static int construct_vmcs(struct vcpu *v)
>          rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
>                           VMX_MSR_GUEST_LOADONLY);
> 
> +    if ( cpu_has_vmx_notify_vm_exiting )
> +    {
> +        __vmwrite(NOTIFY_WINDOW, vm_notify_window);
> +        /*
> +         * Disable #AC and #DB interception: by using VM Notify Xen is
> +         * guaranteed to get a VM exit even if the guest manages to lock the
> +         * CPU.
> +         */
> +        v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
> +                                              (1U << TRAP_alignment_check));
> +        vmx_update_exception_bitmap(v);
> +    }
> +
>   out:
>      vmx_vmcs_exit(v);
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 69980c8e31..d3c1597b3e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct
> vcpu *v)
> 
>  void vmx_update_debug_state(struct vcpu *v)
>  {
> +    unsigned int mask = 1u << TRAP_int3;
> +
> +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> +        /*
> +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> +         */
> +        mask |= 1u << TRAP_debug;
> +
>      if ( v->arch.hvm.debug_state_latch )
> -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> +        v->arch.hvm.vmx.exception_bitmap |= mask;
>      else
> -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
> 
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs
> *regs)
>          switch ( vector )
>          {
>          case TRAP_debug:
> +            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> +                goto exit_and_crash;
> +
>              /*
>               * Updates DR6 where debugger can peek (See 3B 23.2.1,
>               * Table 23-1, "Exit Qualification for Debug Exceptions").
> @@ -4593,6 +4605,22 @@ void vmx_vmexit_handler(struct cpu_user_regs
> *regs)
>           */
>          break;
> 
> +    case EXIT_REASON_NOTIFY:
> +        __vmread(EXIT_QUALIFICATION, &exit_qualification);
> +
> +        if ( exit_qualification & NOTIFY_VM_CONTEXT_INVALID )
> +        {
> +            perfc_incr(vmnotify_crash);
> +            gprintk(XENLOG_ERR, "invalid VM context after notify vmexit\n");
> +            domain_crash(v->domain);
> +            break;
> +        }
> +
> +        if ( unlikely(exit_qualification &
> INTR_INFO_NMI_UNBLOCKED_BY_IRET) )
> +            undo_nmis_unblocked_by_iret();
> +
> +        break;
> +
>      case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
>      case EXIT_REASON_INVPCID:
>      /* fall through */
> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> index 5d3edc1642..0961eabf3f 100644
> --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> @@ -267,6 +267,7 @@ extern u32 vmx_vmentry_control;
>  #define SECONDARY_EXEC_XSAVES                   0x00100000
>  #define SECONDARY_EXEC_TSC_SCALING              0x02000000
>  #define SECONDARY_EXEC_BUS_LOCK_DETECTION       0x40000000
> +#define SECONDARY_EXEC_NOTIFY_VM_EXITING        0x80000000
>  extern u32 vmx_secondary_exec_control;
> 
>  #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
> @@ -348,6 +349,8 @@ extern u64 vmx_ept_vpid_cap;
>      (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
>  #define cpu_has_vmx_bus_lock_detection \
>      (vmx_secondary_exec_control &
> SECONDARY_EXEC_BUS_LOCK_DETECTION)
> +#define cpu_has_vmx_notify_vm_exiting \
> +    (vmx_secondary_exec_control &
> SECONDARY_EXEC_NOTIFY_VM_EXITING)
> 
>  #define VMCS_RID_TYPE_MASK              0x80000000
> 
> @@ -455,6 +458,7 @@ enum vmcs_field {
>      SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
>      PLE_GAP                         = 0x00004020,
>      PLE_WINDOW                      = 0x00004022,
> +    NOTIFY_WINDOW                   = 0x00004024,
>      VM_INSTRUCTION_ERROR            = 0x00004400,
>      VM_EXIT_REASON                  = 0x00004402,
>      VM_EXIT_INTR_INFO               = 0x00004404,
> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> index bc0caad6fb..e429de8541 100644
> --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> @@ -221,6 +221,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
>  #define EXIT_REASON_XSAVES              63
>  #define EXIT_REASON_XRSTORS             64
>  #define EXIT_REASON_BUS_LOCK            74
> +#define EXIT_REASON_NOTIFY              75
>  /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
> 
>  /*
> @@ -236,6 +237,11 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
>  #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
>  #define INTR_INFO_RESVD_BITS_MASK       0x7ffff000
> 
> +/*
> + * Exit Qualifications for NOTIFY VM EXIT
> + */
> +#define NOTIFY_VM_CONTEXT_INVALID       1u
> +
>  /*
>   * Exit Qualifications for MOV for Control Register Access
>   */
> diff --git a/xen/arch/x86/include/asm/perfc_defn.h
> b/xen/arch/x86/include/asm/perfc_defn.h
> index d6eb661940..c6b601b729 100644
> --- a/xen/arch/x86/include/asm/perfc_defn.h
> +++ b/xen/arch/x86/include/asm/perfc_defn.h
> @@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions,           "exceptions", 32)
> 
>  #ifdef CONFIG_HVM
> 
> -#define VMX_PERF_EXIT_REASON_SIZE 75
> +#define VMX_PERF_EXIT_REASON_SIZE 76
>  #define VMEXIT_NPF_PERFC 143
>  #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
>  PERFCOUNTER_ARRAY(vmexits,              "vmexits",
> @@ -126,5 +126,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from
> realmode")
>  PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
> 
>  PERFCOUNTER(buslock, "Bus Locks Detected")
> +PERFCOUNTER(vmnotify_crash, "domains crashed by Notify VM Exit")
> 
>  /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
> --
> 2.36.0


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.