[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 3 Jun 2022 16:46:16 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=K2rdynz9N7r99RF4nuN3SzI28v2vuLNHK2WstN8gwRY=; b=iQAoqb5QGBl11R9oPsfthhv7MfQCRZjtwdAKjLACmS1OvD+DAOI+1mXQzfuaKRVB9f1HWGu9CK7g/56a+yIBrOW2yFuwBD7zJNGdzcEWZLztnl54p8sQhQZUGZRbTwqV+B44yLbzWkiLBbohk7ZXHhjrqw0I+9GbObCIvjQNMwuCgWiBFlIPgfDyCksohOGy9UG1xfu8YJBSPHIL2UeOkACoy7JXQo9ocrJ2zIM/b0Ql9AEbs4koXVIMZiTEWunHRFpvxOk8dlIysKirIdLzWMwYwP11yrJu+T/iqXmbzKI994UJr7uuohAdr9FObSlNY99R3CK6DMPKYKYTsl4SHw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=R35LYOY3EV6uFymOmqCyCMB6/u0BAJOHr6VHLuZA5Vad+BBhkibn5DshnJjrBbDLM0GXp/lUalpPMzuCAD+xXr3ydIwvThsB8rPz0VfBF18ZKcNa/qNBBXmD9AiOor2sSqvaDaeYs+zucZEn79RkjUtIXoiXaejUwnGggTNsmtq3uGypJmeXP6kyTcMbiHygSQ1V8cnoZWhGG59PyDKuckBpkwe4Cmiy0LuuP46P6xXH/HNz6zHGEF3NWNpHBmFG4jiSbOsfEBjHg0vLVcoBgL6PHopYAGh9OZUpNFaXwNoVjTb4DWLmq3dvNR6kiHJj39hXiIdZbIsVdq0/tnuZTw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 03 Jun 2022 14:46:31 +0000
  • Ironport-data: A9a23:M6nZu6AkrPXjOBVW/+riw5YqxClBgxIJ4kV8jS/XYbTApDsj0TYHz DcXWmuPbquLMGWmft9+bIq2/BgEvMTQndVgQQY4rX1jcSlH+JHPbTi7wuYcHM8wwunrFh8PA xA2M4GYRCwMZiaA4E/raNANlFEkvU2ybuOU5NXsZ2YgHGeIdA970Ug5w7Bh2dYy6TSEK1jlV e3a8pW31GCNg1aYAkpMg05UgEoy1BhakGpwUm0WPZinjneH/5UmJMt3yZWKB2n5WuFp8tuSH I4v+l0bElTxpH/BAvv9+lryn9ZjrrT6ZWBigVIOM0Sub4QrSoXfHc/XOdJFAXq7hQllkPhM4 89/nqa0VzwEL7/hhtQgA0FfHAthaPguFL/veRBTsOS15mifKT7J/K8rC0s7e4oF5uxwHGdCs +QCLywAZQyCgOTwx6+nTu5rhYIoK8yD0IE34yk8i22GS6h4B8yeK0nJzYYwMDMYnMdBEOyYf 8MEQTFucA7Bc1tEPVJ/5JcWw7zy3yOlKWcwRFS9+Yom3TjU8A9K0KG9O9P5VfGJHeF2gRPNz o7B1yGjav0AD/SdwzeY9nOnhsfUgDj2HokVEdWQ9fN0gVvV2m0aDjUXU0e2pb+yjUvWc9BCL QoS8yknr6k3/WSqSMXwW1uzp3vslhwBX9tdFcUq5QfLzbDbiy6CHXQNRDNFbN0gtec1SCYs2 1vPmMnmbRRwtJWFRHTb8a2bxRuiNC5QIWIcaCssSQoe/8KlsIw1lgjITNtoDOiylNKdJN3r6 zWDrSx7gqpJi8cOjv+/5Qqf32/qoYXVRAko4AmRRnii8g5yeI+iYcqv9ETf6vFDao2eSzFto UQ5piRX18hWZbnlqcBHaLxl8G2BjxpdDADhvA==
  • Ironport-hdrordr: A9a23:2l9iSKGxazE7x+yqpLqFYZHXdLJyesId70hD6qkvc3Fom52j/f xGws5x6faVslkssb8b6LW90Y27MAvhHPlOkPIs1NaZLXDbUQ6TQL2KgrGD/9SNIVycygcZ79 YbT0EcMqyOMbEZt7ec3ODQKb9Jrri6GeKT9IHjJh9WPH1XgspbnmNE42igYy9LrF4sP+tFKH PQ3LsOm9LmEk5nHfiTNz0gZazuttfLnJXpbVovAAMm0hCHiXeN5KThGxaV8x8CW3cXqI1Su1 Ttokjc3OGOovu7whjT2yv66IlXosLozp9mCNaXgsYYBz3wgkKDZZhnWZeFoDcpydvfo2oCoZ 3pmVMNLs5z43TeciWcpgbs4RDp1HIU53rr2Taj8AzeiP28YAh/J9tKhIpffBecwVEnpstA3K VC2H/cn4ZLDDvb9R6NqeTgZlVPrA6ZsHAimekcgzh0So0FcoJcqoQZ4Qd8DIoAJiTn84oqed MeQ/003MwmMW9yUkqp/VWGmLeXLzYO91a9MwQ/U/WuonlrdCsT9Tpc+CQd9k1wgK7VBaM0o9 gsCZ4Y5Y2mfvVmE56VO91xMfdfKla9Ny4kY1jiaGgOKsk8SgDwgq+yxokJz8eXX7FN5KcOuf 36ISZlXCgJCg/TNfE=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jun 03, 2022 at 02:49:54PM +0200, Jan Beulich wrote:
> On 26.05.2022 13:11, Roger Pau Monne wrote:
> > Under certain conditions guests can get the CPU stuck in an unbounded
> > loop without the possibility of an interrupt window to occur on
> > instruction boundary.  This was the case with the scenarios described
> > in XSA-156.
> > 
> > Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> > if no interrupt window occurs for a specified amount of time.  Note
> > that using the Notify VM Exit avoids having to trap #AC and #DB
> > exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> > puts the CPU in a loop without an interrupt window, as such disable
> > the intercepts if the feature is available and enabled.
> > 
> > Setting the notify VM exit window to 0 is safe because there's a
> > threshold added by the hardware in order to have a sane window value.
> > 
> > Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v1:
> >  - Properly update debug state when using notify VM exit.
> >  - Reword commit message.
> > ---
> > This change enables the notify VM exit by default, KVM however doesn't
> > seem to enable it by default, and there's the following note in the
> > commit message:
> > 
> > "- There's a possibility, however small, that a notify VM exit happens
> >    with VM_CONTEXT_INVALID set in exit qualification. In this case, the
> >    vcpu can no longer run. To avoid killing a well-behaved guest, set
> >    notify window as -1 to disable this feature by default."
> > 
> > It's not obviously clear to me whether the comment was meant to be:
> > "There's a possibility, however small, that a notify VM exit _wrongly_
> > happens with VM_CONTEXT_INVALID".
> > 
> > It's also not clear whether such wrong hardware behavior only affects
> > a specific set of hardware, in a way that we could avoid enabling
> > notify VM exit there.
> > 
> > There's a discussion in one of the Linux patches that 128K might be
> > the safer value in order to prevent false positives, but I have no
> > formal confirmation about this.  Maybe our Intel maintainers can
> > provide some more feedback on a suitable notify VM exit window
> > value.
> 
> This certainly wants sorting one way or another before I, for one,
> would consider giving an R-b here.

I was hoping for either Kevin or Jun (now moved from Cc to To) to
provide some guidance here on what a suitable default value would be.

> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
> >  static unsigned int __read_mostly ple_window = 4096;
> >  integer_param("ple_window", ple_window);
> >  
> > +static unsigned int __ro_after_init vm_notify_window;
> > +integer_param("vm-notify-window", vm_notify_window);
> 
> Could even be a runtime param, I guess. Albeit I realize this would
> complicate things further down.
> 
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct 
> > vcpu *v)
> >  
> >  void vmx_update_debug_state(struct vcpu *v)
> >  {
> > +    unsigned int mask = 1u << TRAP_int3;
> > +
> > +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> 
> I'm puzzled by the lack of symmetry between this and ...
> 
> > +        /*
> > +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> > +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> > +         */
> > +        mask |= 1u << TRAP_debug;
> > +
> >      if ( v->arch.hvm.debug_state_latch )
> > -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> > +        v->arch.hvm.vmx.exception_bitmap |= mask;
> >      else
> > -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> > +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
> >  
> >      vmx_vmcs_enter(v);
> >      vmx_update_exception_bitmap(v);
> > @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >          switch ( vector )
> >          {
> >          case TRAP_debug:
> > +            if ( cpu_has_monitor_trap_flag && 
> > cpu_has_vmx_notify_vm_exiting )
> > +                goto exit_and_crash;
> 
> ... this condition. Shouldn't one be the inverse of the other (and
> then it's the one down here which wants adjusting)?

The condition in vmx_update_debug_state() sets the mask so that
TRAP_debug will only be added or removed from the bitmap if
!cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting (note that
otherwise TRAP_debug is unconditionally set if
!cpu_has_vmx_notify_vm_exiting).

Hence it's impossible to get a VMExit TRAP_debug with
cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting because
TRAP_debug will never be set by vmx_update_debug_state() in that
case.

> 
> > @@ -126,5 +126,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from 
> > realmode")
> >  PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
> >  
> >  PERFCOUNTER(buslock, "Bus Locks Detected")
> > +PERFCOUNTER(vmnotify_crash, "domains crashed by Notify VM Exit")
> 
> I think the text is not entirely correct and would want to be
> "domain crashes by ...". Multiple vCPU-s of a single domain can
> experience this in parallel (granted this would require "good"
> timing).

Sure, thanks for the review.

Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.