[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 12 Mar 2021 10:08:52 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=QDW0PCDjN+tn7CYPrCPUjvIRmk30V9xvRTya61+pBzM=; b=hXLmCMokRyyaXnWtxsrAEGQSryZ4RaRSVlo91THYrA1f0BRAexSVEMrguVAXEJnOz1Mm11wug7pfyz4TuXSGlVL12mWY2h9ICiG9RC27XZeriXWop/lvl6beEbjTF+CNIVihJnC53+0K5GGX8FVHV3GHB5Pl0CMGH/Y/vEsJ+PLtlnkxpvi2V7UP3O8+HairSAWqfgMlUWhPSj+LE7hwN8GHT37j8d1BNZrGg8QgYgYHnfz4kjTIthRP6pQ4bMHhf/iWu7FCyKDlCjqbpMCbphU3uZNPOgEu+QnUDEWfUdmz6jaV/as1otY5mRxFsyh0SwYaRaU2+bGuBJtP4Ry73w==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fFwILX97UcbEOKdBMCdzDBsNl7mMvRtI9vkrv+SCvVp+ml9cfATbcuW/WEGums8gFcLn1RLDm185q5A24x9/YSifWJDa5qd8emuYufwJc/kXviZIVKhnF0xHrsPapspJXlWzWgI2nm67dowHH+Z3BSlILcZgkf8ZKdq5BFpYpOxxb5FZrV5Dls8/o+uzVmZSDBSWLTWT4kSW8XvTpXbq6H0RoJw1UcUJA9dYM+gtUOZHlPL0xhPcVTdVuCHDV8LwP+Nhlf3SL5vfegut2iBZv6SZZ+w7Ayv93VvOpa3dwF66VCtLSbyZrMBQNDv5mlL6tz96Wku6FNnYWYm4mhy5Xg==
  • Authentication-results: esa1.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Fri, 12 Mar 2021 09:09:14 +0000
  • Ironport-hdrordr: A9a23:c6nJdag2CVO13SlUUTX3BhDZQnBQX3tw3DAbvn1ZSRFFG/Gwv/ uF2NwGyB75jysQUnk8mdaGfJKNW2/Y6IQd2+csFJ+Ydk3DtHGzJI9vqbHjzTrpBjHk+odmup tIW5NVTOf9BV0St6rHySGzGdo43Z2j+Kenme/Rwx5WPH1XQotLhj0JbjqzOEtwWQVAGN4dHJ 2T+sJIq1ObCAsqR+68AWQIWPWGms3TmPvdEFA7LjMEyC3LtzOn77bmDwOVty1+bxpjyaovmF K16zDRyb6kt5iAu33h/k/Vq69bgd7wjuZEbfb89/Q9DhXJpkKWaJ96W7uE1QpF4d2HzFoxit HDr1MBEq1ImgjsV1q4qxfsxAXsuQxGgxSJpD/o4kfLmsD3SCk3DMBMn+tiA2HkwnEtoc1m1+ Zz13+Z3qAnVi/opjj35NTDSnhR5y2JiEciiuIagjh+VoYTedZq3PUi1X5VC5sJEWbG7pkmGo BVfafhzctRGGnqCkzxjy1K+piBT34zFhCJTgwpocqOyQVbm3h/0g8x2NEftm1ozuN8d7B0o8 D/doh4nrBHScEbKYhnAv0afMexAmvRBTrRLWOpJ0j9Hq1vAQOPl7fHpJEOoM26cp0By5U/3L 7bVklDiGI0c0XyTeqDwYNM6RKIZGmmRzzixoV/6vFCy//BbYuuFRfGZEElksOmrflaKNbcQe yPNJVfBOKmInDpHYpPwg3iS5hfIXQTS6Quy5IGcmPLhviOBpzht+TdfvqWDqHqCywYVmT2BW ZGXDWbHrQG0mmbHlvDxDTBUXLkfULyubhqFrLBwuQVwI8RcolFsg0fj0Wl9tiGQAcy9JAeTQ 9bGvfKg6m7rW658SLj9GNyICdQCU5T/fHnSHNFpQgDNkvuarYds9CDeWRftUH3YyNXfofzKk pytl538aW4I9i73iY5Ee+qNWqckj8Ovn6QVowdnaeC/M/hfZs9Av8dKfVMPDSOMyYwtRdhqW 9FZgNBe1TWESn2j765yLYOAvvEStV6iAC3AMJdpH7Fr3+ArcU3SnZzZU/3beenxSIVAxtdnB lY7rIWirvoo0fQFUIPxMADdGBqREvSKrRcFwiBbJhTgdnQCXBNZFbPoyebhRE1cnft7GMIiA XaXHepUPnWH1tQvW1Z2K728FVyMn6QZV50d2oSi/wPKU3DoHZr5+qCbaa3yQKqGys/6/BYPz ffbTQIJARyg9ixyR6OgT6HUW4r35M0I4XmffgeWqCW3nOmM4uTk64aW/dS4ZZ+Ldjr29V7G9 63akuQLDniDfku1BHQrnE5ODNsoH1hlf/zwhXq4Cy523E4aMCibWhOVvUeI9uG6XLjSOvN2J Jljcgtte/1K37vcLe9uOjqRi8GLgmWrX+9Tukup5wRtaUutKFrF52eVTfTznlI0Bg3Mc+crj JQfI1rpLTafoN/dc0bfCxUukAkk9mCN0MnuA37CO1WRyBns1bLe9eSp7bYo7smBUOM4BbqMV 6E6itH4rPLWTCA2bNyMdNHHU1GLEwnrHJs8+OJe9eOVEGkd+Rf8EG7NXH4erlHU6SBEagRqB E/49zgpZ7iSwPonATL+T18KeZS9mziR8W4CgeFA/RJ/Ny3Ik7kuNrj3OejyDPsDSKmYEEZj5 BffUMebs5fmiAv5bdHpxSaW+jyuAY5iFNQ7jFsi0711oWn6GndG1taMQexuOQgYRBDdn6Sjc rE9uCE1HPypDhdsKOzZnttQg==
  • Ironport-sdr: swwvuahnNB5VfZDgcfvlqvSCCpLJuuNWK+GzZhDBfOmdApXllrMh9wVTZ/CIibnXOsqAqnspP2 1BXRNZA1caGgnRIhrUDO7Z3wC6q+uedpJBY4K/ZPeOm+zTpkT0YaI90WSuQiTbt65uaRQYktPP 4DXbF0PGCcmd5xEeFLc4imlnl6x7OS7rKDLnFdJbQb/LxUiXEEHcBbvRUPhf5QTLtwvLNNgBRV Fn9cHVgvSpodDKSkAaOk5chq1YjWcogrEVOKVvxNuigntePEEaObXdh4aIfRJLXk9t1ugBCikv /SQ=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Mar 12, 2021 at 08:54:46AM +0100, Jan Beulich wrote:
> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> handler early enough to cover for example the rdmsrl_safe() of
> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> of MSR_K7_HWCR later in the same function). The respective change
> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> backported to 4.14, but no further - presumably since it wasn't really
> easy because of other dependencies.
> 
> Therefore, to prevent our change in the handling of guest MSR accesses
> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> the raising of #GP on this paths conditional upon the guest having
> installed a handler, provided of course the MSR can be read in the first
> place (we would have raised #GP in that case even before). Producing
> zero for reads isn't necessarily correct and may trip code trying to
> detect presence of MSRs early, but since such detection logic won't work
> without a #GP handler anyway, this ought to be a fair workaround.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Release-Acked-by: Ian Jackson <iwj@xxxxxxxxxxxxxx>

I think the approach is fine:

Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Some comments below.

> ---
> (projected v4: re-base over Roger's change)
> v3: Use temporary variable for probing. Document the behavior (in a
>     public header, for the lack of a better place).
> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>     messages (in debug builds). Don't alter WRMSR behavior.
> ---
> While I didn't myself observe or find similar WRMSR side issues, I'm
> nevertheless not convinced we can get away without also making the WRMSR
> path somewhat more permissive again, e.g. tolerating attempts to set
> bits which are already set. But of course this would require keeping in
> sync for which MSRs we "fake" reads, as then a kernel attempt to set a
> bit may also appear as an attempt to clear others (because of the zero
> value that we gave it for the read). Roger validly points out that
> making behavior dependent upon MSR values has its own downsides, so
> simply depending on MSR readability is another option (with, in turn,
> its own undesirable effects, e.g. for write-only MSRs).
> 
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -874,7 +874,8 @@ static int read_msr(unsigned int reg, ui
>      struct vcpu *curr = current;
>      const struct domain *currd = curr->domain;
>      const struct cpuid_policy *cp = currd->arch.cpuid;
> -    bool vpmu_msr = false;
> +    bool vpmu_msr = false, warn = false;
> +    uint64_t tmp;
>      int ret;
>  
>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> @@ -882,7 +883,7 @@ static int read_msr(unsigned int reg, ui
>          if ( ret == X86EMUL_EXCEPTION )
>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);

You might want to move the injection of the exception to the done
label?

So that we can avoid the call to x86_emul_reset_event.

>  
> -        return ret;
> +        goto done;
>      }
>  
>      switch ( reg )
> @@ -986,7 +987,7 @@ static int read_msr(unsigned int reg, ui
>          }
>          /* fall through */
>      default:
> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> +        warn = true;
>          break;
>  
>      normal:
> @@ -995,7 +996,19 @@ static int read_msr(unsigned int reg, ui
>          return X86EMUL_OKAY;
>      }
>  
> -    return X86EMUL_UNHANDLEABLE;
> + done:
> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address 
> &&
> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, tmp) )
> +    {
> +        gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
> +        *val = 0;
> +        x86_emul_reset_event(ctxt);
> +        ret = X86EMUL_OKAY;
> +    }
> +    else if ( warn )
> +        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);

I think you could add:

if ( rc == X86EMUL_EXCEPTION )
    x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);

> +
> +    return ret;
>  }
>  
>  static int write_msr(unsigned int reg, uint64_t val,
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -143,6 +143,12 @@ typedef unsigned long xen_ulong_t;
>   *  Level == 1: Kernel may enter
>   *  Level == 2: Kernel may enter
>   *  Level == 3: Everyone may enter
> + *
> + * Note: For compatibility with kernels not setting up exception handlers
> + *       early enough, Xen will avoid trying to inject #GP (and hence crash
> + *       the domain) when an RDMSR would require this, but no handler was
> + *       set yet. The precise conditions are implementation specific, and

You can drop the 'yet' here I think? As even if a handler has been set
and then removed we would still prevent injecting a #GP AFAICT. Not a
native speaker anyway, so I might be wrong on that one.

> + *       new code shouldn't rely on such behavior anyway.

I would use a stronger mustn't here instead of shouldn't.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.