
Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit


  • To: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 9 Jun 2022 12:09:18 +0200
  • Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Cooper, Andrew" <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, "Beulich, Jan" <JBeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, "Qiang, Chenyi" <chenyi.qiang@xxxxxxxxx>
  • Delivery-date: Thu, 09 Jun 2022 10:09:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jun 09, 2022 at 03:39:33PM +0800, Xiaoyao Li wrote:
> On 6/9/2022 3:04 PM, Tian, Kevin wrote:
> > +Chenyi/Xiaoyao who worked on the KVM support. Presumably
> > similar opens have been discussed in KVM hence they have the
> > right background to comment here.
> > 
> > > From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> > > Sent: Thursday, May 26, 2022 7:12 PM
> > > 
> > > Under certain conditions guests can get the CPU stuck in an unbounded
> > > loop without the possibility of an interrupt window occurring on an
> > > instruction boundary.  This was the case with the scenarios described
> > > in XSA-156.
> > > 
> > > Make use of the Notify VM Exit mechanism, which will trigger a VM Exit
> > > if no interrupt window occurs for a specified amount of time.  Note
> > > that using the Notify VM Exit avoids having to trap #AC and #DB
> > > exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> > > puts the CPU in a loop without an interrupt window; as such, disable
> > > those intercepts if the feature is available and enabled.
> > > 
> > > Setting the notify VM exit window to 0 is safe because there's a
> > > threshold added by the hardware in order to have a sane window value.
> > > 
> > > Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > ---
> > > Changes since v1:
> > >   - Properly update debug state when using notify VM exit.
> > >   - Reword commit message.
> > > ---
> > > This change enables the notify VM exit by default; KVM, however,
> > > doesn't seem to enable it by default, and there's the following note
> > > in its commit message:
> > > 
> > > "- There's a possibility, however small, that a notify VM exit happens
> > >     with VM_CONTEXT_INVALID set in exit qualification. In this case, the
> > >     vcpu can no longer run. To avoid killing a well-behaved guest, set
> > >     notify window as -1 to disable this feature by default."
> > > 
> > > It's not entirely clear to me whether the comment was meant to read:
> > > "There's a possibility, however small, that a notify VM exit _wrongly_
> > > happens with VM_CONTEXT_INVALID".
> > > 
> > > It's also not clear whether such wrong hardware behavior only affects
> > > a specific set of hardware,
> 
> I'm not sure what you mean by a specific set of hardware.
> 
> We make it default off in KVM just in case future silicon wrongly sets
> the VM_CONTEXT_INVALID bit, because our policy is that the VM cannot
> continue running in that case.
> 
> In the worst case, if some future silicon does have that kind of silly
> bug, every existing product kernel would risk having its VMs killed
> because the feature is on by default.

That's IMO a weird policy.  If such behavior showed up on any hardware
platform I would assume Intel would issue an erratum, and then we would
just avoid using the feature on the affected hardware (as we do with
other hardware features that have errata).

If we applied the same logic to all new Intel features we wouldn't use
any of them.  At least in Xen there are already combinations of vmexit
conditions that will lead to the guest being killed.
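
FWIW, the handling in this series boils down to something like the
sketch below.  This is illustrative only: vmx_handle_notify_exit is a
made-up helper name, and NOTIFY_VM_CONTEXT_INVALID follows the naming
in the patch, assumed to be bit 0 of the exit qualification per the
ISE description of notify VM exits.

    /*
     * Sketch, not the literal patch: kill the domain if the notify
     * VM exit reports an invalid VM context, otherwise just resume.
     */
    static void vmx_handle_notify_exit(struct vcpu *v, unsigned long qual)
    {
        if ( unlikely(qual & NOTIFY_VM_CONTEXT_INVALID) )
        {
            /* The vCPU context is corrupt and can no longer run. */
            gprintk(XENLOG_ERR, "invalid VM context on notify vmexit\n");
            domain_crash(v->domain);
            return;
        }

        /* Context still valid: nothing else to do. */
    }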

> > > in a way that we could avoid enabling
> > > notify VM exit there.
> > > 
> > > There's a discussion in one of the Linux patches suggesting that 128K
> > > might be the safer value in order to prevent false positives, but I
> > > have no formal confirmation of this.  Maybe our Intel maintainers can
> > > provide some more feedback on a suitable notify VM exit window
> > > value.
> 
> The 128K is the internal threshold for SPR silicon. The internal threshold
> is tuned by Intel for each silicon, to make sure it's big enough to avoid
> false positives even when the user sets vmcs.notify_window to 0.
> 
> However, it varies across processor generations.
> 
> What the suitable value is is hard to say; it depends on how soon the VMM
> wants to intercept the VM. Anyway, Intel ensures that even a value of 0 is
> safe.

Ideally we need a fixed default value that's guaranteed to work on all
possible hardware that supports the feature, or alternatively a way to
calculate a sane default window based on the hardware platform.

Could we get some wording added to the ISE regarding 0 being a
suitable default value to use because hardware will add a threshold
internally to make the value safe?
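
To be concrete, what I'd like to be able to rely on is something along
these lines.  This is just a sketch under the assumption that a 0 window
is safe on all hardware; cpu_has_vmx_notify_vm_exiting,
SECONDARY_EXEC_NOTIFY_VM_EXITING and NOTIFY_WINDOW are illustrative
identifiers based on the ISE naming, not necessarily what we'd commit:

    /*
     * Sketch: enable notify VM exits with a 0 window, relying on the
     * hardware's internal threshold to turn that into a sane value.
     */
    if ( cpu_has_vmx_notify_vm_exiting )
    {
        v->arch.hvm.vmx.secondary_exec_control |=
            SECONDARY_EXEC_NOTIFY_VM_EXITING;
        __vmwrite(NOTIFY_WINDOW, 0);
    }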

Thanks, Roger.
