
Re: [PATCH 3/3] x86/amd: Fix race editing DE_CFG


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 27 Nov 2025 17:42:41 +0000
  • Cc: andrew.cooper3@xxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 27 Nov 2025 17:42:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27/11/2025 7:58 am, Jan Beulich wrote:
> On 26.11.2025 18:56, Andrew Cooper wrote:
>> On 26/11/2025 4:55 pm, Andrew Cooper wrote:
>>> On 26/11/2025 3:07 pm, Jan Beulich wrote:
>>>> On 26.11.2025 14:22, Andrew Cooper wrote:
>>>>> @@ -1075,6 +966,112 @@ static void cf_check fam17_disable_c6(void *arg)
>>>>>   wrmsrl(MSR_AMD_CSTATE_CFG, val & mask);
>>>>>  }
>>>>>  
>>>>> +static bool zenbleed_use_chickenbit(void)
>>>>> +{
>>>>> +    unsigned int curr_rev;
>>>>> +    uint8_t fixed_rev;
>>>>> +
>>>>> +    /*
>>>>> +     * If we're virtualised, we can't do family/model checks safely, and
>>>>> +     * we likely wouldn't have access to DE_CFG even if we could see a
>>>>> +     * microcode revision.
>>>>> +     *
>>>>> +     * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
>>>>> +     * position to care either way.  An admin doesn't want to be disabling
>>>>> +     * AVX as a mitigation on any build of Xen with this logic present.
>>>>> +     */
>>>>> +    if ( cpu_has_hypervisor || boot_cpu_data.family != 0x17 )
>>>>> +        return false;
>>>>> +
>>>>> +    curr_rev = this_cpu(cpu_sig).rev;
>>>>> +    switch ( curr_rev >> 8 )
>>>>> +    {
>>>>> +    case 0x083010: fixed_rev = 0x7a; break;
>>>>> +    case 0x086001: fixed_rev = 0x0b; break;
>>>>> +    case 0x086081: fixed_rev = 0x05; break;
>>>>> +    case 0x087010: fixed_rev = 0x32; break;
>>>>> +    case 0x08a000: fixed_rev = 0x08; break;
>>>>> +    default:
>>>>> +        /*
>>>>> +         * With the Fam17h check above, most parts getting here are Zen1.
>>>>> +         * They're not affected.  Assume Zen2 ones making it here are affected
>>>>> +         * regardless of microcode version.
>>>>> +         */
>>>>> +        return is_zen2_uarch();
>>>>> +    }
>>>>> +
>>>>> +    return (uint8_t)curr_rev >= fixed_rev;
>>>>> +}
>>>>> +
>>>>> +void amd_init_de_cfg(const struct cpuinfo_x86 *c)
>>>>> +{
>>>>> +    uint64_t val, new = 0;
>>>>> +
>>>>> +    /* The MSR doesn't exist on Fam 0xf/0x11. */
>>>>> +    if ( c->family != 0xf && c->family != 0x11 )
>>>>> +        return;
>>>> Comment and code don't match. Did you mean
>>>>
>>>>     if ( c->family == 0xf || c->family == 0x11 )
>>>>         return;
>>>>
>>>> (along the lines of what you have in amd_init_lfence_dispatch())?
>>> Oh - that was a last minute refactor which I didn't do quite correctly. 
>>> Yes, it should match amd_init_lfence_dispatch().
>>>
>>>>> +    /*
>>>>> +     * On Zen3 (Fam 0x19) and later CPUs, LFENCE is unconditionally dispatch
>>>>> +     * serialising, and is enumerated in CPUID.  Hypervisors may also
>>>>> +     * enumerate it when the setting is in place and MSR_AMD64_DE_CFG isn't
>>>>> +     * available.
>>>>> +     */
>>>>> +    if ( !test_bit(X86_FEATURE_LFENCE_DISPATCH, c->x86_capability) )
>>>>> +        new |= AMD64_DE_CFG_LFENCE_SERIALISE;
>>>>> +
>>>>> +    /*
>>>>> +     * If vulnerable to Zenbleed and not mitigated in microcode, use the
>>>>> +     * bigger hammer.
>>>>> +     */
>>>>> +    if ( zenbleed_use_chickenbit() )
>>>>> +        new |= (1 << 9);
>>>>> +
>>>>> +    if ( !new )
>>>>> +        return;
>>>>> +
>>>>> +    if ( rdmsr_safe(MSR_AMD64_DE_CFG, &val) ||
>>>>> +         (val & new) == new )
>>>>> +        return;
>>>>> +
>>>>> +    /*
>>>>> +     * DE_CFG is a Core-scoped MSR, and this write is racy.  However, both
>>>>> +     * threads calculate the new value from state which is expected to be
>>>>> +     * consistent across CPUs and unrelated to the old value, so the result
>>>>> +     * should be consistent.
>>>>> +     */
>>>>> +    wrmsr_safe(MSR_AMD64_DE_CFG, val | new);
>>>> Either of the bits may be the cause of #GP. In that case we wouldn't set the
>>>> other bit, even if it may be possible to set it.
>>> This MSR does not #GP on real hardware.
> I consider this unexpected / inconsistent, at least as long as some of the
> bits would be documented as reserved. "Would be" because the particular
> Fam17 and Fam19 PPRs I'm looking at don't even mention DE_CFG (or BP_CFG,
> for that matter).

You need the even-more-NDA manual to find those details.

Reserved doesn't mean #GP. It means "don't rely on the behaviour".
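To make the #GP question concrete, here is a minimal C model of the guarded read-modify-write at the end of amd_init_de_cfg(). It is a sketch, not Xen code: struct mock_msr, mock_rdmsr_safe() and mock_wrmsr_safe() are hypothetical stand-ins for the real MSR accessors, and faulting_bits is an assumed knob used only to simulate a write that would #GP. It illustrates the concern raised above: because both bits go out in a single write, a fault caused by either one also drops the other.

```c
#include <stdbool.h>
#include <stdint.h>

/* DE_CFG bits discussed in this thread (local names, not Xen's). */
#define EX_LFENCE_SERIALISE    (UINT64_C(1) << 1)
#define EX_ZENBLEED_CHICKENBIT (UINT64_C(1) << 9)

/* Hypothetical model of one MSR; not a Xen interface. */
struct mock_msr {
    uint64_t value;
    uint64_t faulting_bits; /* Assumed: a write with any of these set #GPs. */
    bool exists;            /* Assumed: false models a missing MSR. */
};

/* Both accessors return 0 on success and nonzero on a simulated #GP. */
static int mock_rdmsr_safe(const struct mock_msr *m, uint64_t *val)
{
    if ( !m->exists )
        return -1;

    *val = m->value;
    return 0;
}

static int mock_wrmsr_safe(struct mock_msr *m, uint64_t val)
{
    if ( !m->exists || (val & m->faulting_bits) )
        return -1; /* The whole write is rejected, not just the bad bit. */

    m->value = val;
    return 0;
}

/* Same shape as the tail of amd_init_de_cfg() in the patch. */
static void apply_de_cfg(struct mock_msr *m, uint64_t new)
{
    uint64_t val;

    if ( !new )
        return;

    /* Give up if the MSR can't be read, or all wanted bits are already set. */
    if ( mock_rdmsr_safe(m, &val) || (val & new) == new )
        return;

    /* One combined write; a simulated fault leaves the MSR untouched. */
    mock_wrmsr_safe(m, val | new);
}
```

With faulting_bits left at zero, which corresponds to the statement that this MSR does not #GP on real hardware, the single combined write sets both bits. Only on a model that does fault does the second bit get lost along with the first.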

>>> Also, both of these bits come from instructions AMD have provided,
>>> saying "set $X in case $Y", which we have honoured as part of the
>>> conditions for setting up new, which I consider to be a reasonable
>>> guarantee that no #GP will ensue.
> The AMD instructions are for particular models, aren't they? While that
> may mean the bits are fine to blindly (try to) set on other models, pretty
> likely this can't be extended to other families. (While
> zenbleed_use_chickenbit() is family-specific, the LFENCE bit is tried
> without regard to family.)

The Managing Speculation whitepaper says "set bit 1 on Fam 10, 12, 14,
15-17".

It also says that AMD will treat the MSR and bit 1 as architectural
moving forwards.  In reality, on Zen3 (post-dating the whitepaper) and
later, it's write-discard, read-as-1, and this is the behaviour we
provide to all VMs.
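The bit 1 guidance above can be written down as a small lookup. This is a sketch under stated assumptions: the enum and function names are hypothetical, family numbers are hexadecimal, Fam 0xf/0x11 are treated as lacking the MSR (as in the patch's comment), and any family not covered by the whitepaper text or the Zen3 behaviour described here is reported as unlisted rather than guessed.

```c
/* Hypothetical classification of DE_CFG bit 1 (LFENCE serialising). */
enum lfence_bit_policy {
    LFENCE_BIT_NO_MSR,     /* DE_CFG doesn't exist (Fam 0xf, 0x11). */
    LFENCE_BIT_SET,        /* Whitepaper: set bit 1. */
    LFENCE_BIT_ALREADY_ON, /* Zen3 and later: write-discard, read-as-1. */
    LFENCE_BIT_UNLISTED,   /* Not covered by the guidance above. */
};

static enum lfence_bit_policy lfence_bit_policy(unsigned int family)
{
    switch ( family )
    {
    case 0xf: case 0x11:
        return LFENCE_BIT_NO_MSR;

    case 0x10: case 0x12: case 0x14:
    case 0x15: case 0x16: case 0x17:
        return LFENCE_BIT_SET;

    default:
        /* Fam 0x19 (Zen3) onwards: LFENCE is always dispatch serialising. */
        return family >= 0x19 ? LFENCE_BIT_ALREADY_ON : LFENCE_BIT_UNLISTED;
    }
}
```

The patch itself keys off the LFENCE_DISPATCH CPUID bit rather than a family table, so on Zen3 and later it never asks for bit 1 in the first place; the table only shows which families the written guidance covers.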

The Zenbleed instructions say "set bit 9 on Zen2".
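The Zenbleed side can be exercised on its own. A minimal sketch, reusing the revision table from zenbleed_use_chickenbit() in the patch: the upper 24 bits of the microcode revision select a table entry and the low byte is compared against the fixed revision. Assumptions: is_zen2 is a plain parameter standing in for is_zen2_uarch(), the cpu_has_hypervisor check is left out, and the function is written to return true only when the loaded microcode predates the fixed revision, i.e. when the part is not mitigated in microcode and the chicken bit is wanted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does this part want DE_CFG bit 9 set? */
static bool zenbleed_needs_chickenbit(unsigned int family,
                                      unsigned int ucode_rev, bool is_zen2)
{
    uint8_t fixed_rev;

    /* Zenbleed only concerns Zen2, which is Fam 0x17. */
    if ( family != 0x17 )
        return false;

    /* Table copied from the patch: upper 24 bits pick the entry. */
    switch ( ucode_rev >> 8 )
    {
    case 0x083010: fixed_rev = 0x7a; break;
    case 0x086001: fixed_rev = 0x0b; break;
    case 0x086081: fixed_rev = 0x05; break;
    case 0x087010: fixed_rev = 0x32; break;
    case 0x08a000: fixed_rev = 0x08; break;
    default:
        /* Unlisted: Zen1 is unaffected; assume any unlisted Zen2 is. */
        return is_zen2;
    }

    /* Listed parts: only microcode older than the fix needs the bit. */
    return (uint8_t)ucode_rev < fixed_rev;
}
```

For example, revision 0x0830107a is at the fixed level for its entry and needs nothing, while 0x08301079 is one patch level short and would get the chicken bit.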

So, the logic in this patch follows AMD's written instructions.
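The race comment in the patch (DE_CFG is Core-scoped, so SMT siblings can interleave their read-modify-write) can be checked with a sequential model of the worst interleaving, where both reads happen before either write. A sketch only: interleaved_rmw() is a hypothetical helper and the shared MSR is just a local variable.

```c
#include <stdint.h>

/*
 * Two siblings share one Core-scoped MSR.  Each reads, ORs in its own
 * new bits, and writes back; here both reads precede both writes.
 */
static uint64_t interleaved_rmw(uint64_t initial, uint64_t new_a,
                                uint64_t new_b)
{
    uint64_t msr = initial;

    uint64_t val_a = msr;   /* Thread A: rdmsr */
    uint64_t val_b = msr;   /* Thread B: rdmsr */

    msr = val_a | new_a;    /* Thread A: wrmsr */
    msr = val_b | new_b;    /* Thread B: wrmsr, overwrites A's value */

    return msr;
}
```

When both siblings compute the same new bits, the second write reproduces the first and the final value is initial | new. If they disagreed, the earlier write would be lost, which is why the argument rests on new being derived from state that is consistent across CPUs rather than on any locking.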

~Andrew



 

