
Re: x86/CET: Fix S3 resume with shadow stacks active


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 25 Feb 2022 09:38:04 +0100
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Thiner Logoer <logoerthiner1@xxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 25 Feb 2022 08:38:13 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.02.2022 20:48, Andrew Cooper wrote:
> The original shadow stack support has an error on S3 resume with very bizarre
> fallout.  The BSP comes back up, but APs fail with:
> 
>   (XEN) Enabling non-boot CPUs ...
>   (XEN) Stuck ??
>   (XEN) Error bringing CPU1 up: -5
> 
> and then later (on at least two Intel TigerLake platforms), the next HVM vCPU
> to be scheduled on the BSP dies with:
> 
>   (XEN) d1v0 Unexpected vmexit: reason 3
>   (XEN) domain_crash called from vmx.c:4304
>   (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
> 
> The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
> scheduled vCPU, and will be addressed in a subsequent patch.  It is a
> consequence of the APs triple faulting.
> 
> The APs triple fault because we don't tear down the stacks on
> suspend.  The idle/play_dead loop is killed in the middle of running, meaning
> that the supervisor token is left busy.
> 
> On resume, SETSSBSY finds the token already busy, suffers #CP and triple
> faults because the IDT isn't configured this early.
> 
> Rework the AP bringup path to (re)create the supervisor token.  This ensures
> the primary stack is non-busy before use.
> 
> Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
> Link: https://github.com/QubesOS/qubes-issues/issues/7283
> Reported-by: Thiner Logoer <logoerthiner1@xxxxxxx>
> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Tested-by: Thiner Logoer <logoerthiner1@xxxxxxx>
> Tested-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

> Slightly RFC.  This does fix the crash encountered, but it occurs to me that
> there's a race condition when S3 platform powerdown is coincident with an
> NMI/#MC, where more than just the primary shadow stack can end up busy on
> resume.
> 
> A larger fix would be to change how we allocate tokens, and always have each
> CPU set up its own tokens.  I didn't do this originally in the hopes of having
> WRSSQ generally disabled, but that plan failed when encountering reality...

While I think this wants fixing one way or another, I also think this
shouldn't block the immediate fix here (which addresses an unconditional
crash rather than a pretty unlikely one).

Jan




 

