[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Regression with CET: [PATCH v1] x86/mm: avoid inadvertently degrading a TLB flush to local only


  • To: David Vrabel <dvrabel@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 28 Apr 2022 09:39:57 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=aooAZuzOsOb9Y9yf3pU6UWpZIsTsKdMGT5uy357NJyA=; b=l2k81vm7qnoKk7lp22bW0Q/BJIYwCOhRbffxivtsrvBGaE5qHQfRCodry/WBe7ptGzADPr/zU6V2wTdk1qIDHvFVSyJX7t2y6Ek9hCrnEtZ31MTJpBoyYYPo3BDP5XmIWCooyOmjRlsfT+zVKRGe0CM6pS+478jTdfDdxn2tjKOIEOLFzMWSVs69JEVjvHXu3u9zy8Kr6hwTwtYO5ichNf9TUqGN/4wA6D1F60jXpq72wHgERZ1z2GLGugg5JUdw6jSwIKrh+itb+oXXjVyT7KW8byST2MzsUpMYaSpvF9PgdxjLrDenLzYLHvwzUTA0uezP5YdnMElSaY5jOnJDsA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HCqOu/kEb/lhnHRNfDdicFYe/vlwC2dXuoiu8+NtkhOrzrgFsNOYdu+TyMpjR1RYPj1dAvbu9GRcd888VLHEg26v3m/DVXNv5nMbDAUp1CeiOfHu5qssWffyocg3pP0yiK+Cz5s2x4vt1Z+TMRlu/Mwh89MO3LBFTJe//C85PFa/yGuecXP96z5QzaptUDA2h1URGVJ9Ogn3rBF5yAoSSjxWcAtnLxBVBIKKqZ9DPjVbhfu5UlaUTYtGgCLjl5eNrEdPByD2Ras49o5TUixI5vA/PnyQpsZamfItqi5wYRK86+NwfptFnF4VOfDlBYR3CgUmzeZJ429OCkQwRk81JA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, David Vrabel <dvrabel@xxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Delivery-date: Thu, 28 Apr 2022 07:40:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.04.2022 09:37, Jan Beulich wrote:
> On 27.04.2022 20:44, David Vrabel wrote:
>>
>>
>> On 27/04/2022 19:03, Andrew Cooper wrote:
>>> On 19/04/2022 16:03, David Vrabel wrote:
>>>> From: David Vrabel <dvrabel@xxxxxxxxxxxx>
>>>>
>>>> If the direct map is incorrectly modified with interrupts disabled,
>>>> the required TLB flushes are degraded to flushing the local CPU only.
>>>>
>>>> This could lead to very hard to diagnose problems as different CPUs will
>>>> end up with different views of memory. Although, no such issues have yet
>>>> been identified.
>>>>
>>>> Change the check in the flush_area() macro to look at system_state
>>>> instead. This defers the switch from local to all later in the boot
>>>> (see xen/arch/x86/setup.c:__start_xen()). This is fine because
>>>> additional PCPUs are not brought up until after the system state is
>>>> SYS_STATE_smp_boot.
>>>>
>>>> Signed-off-by: David Vrabel <dvrabel@xxxxxxxxxxxx>
>>>
>>> This explodes on CET systems:
>>>
>>> (XEN) Assertion 'local_irq_is_enabled()' failed at arch/x86/smp.c:265
>>> (XEN) ----[ Xen-4.17.0-10.24-d  x86_64  debug=y  Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d040345300>] flush_area_mask+0x40/0x13e
>>> <snip>
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d040345300>] R flush_area_mask+0x40/0x13e
>>> (XEN)    [<ffff82d040338a40>] F modify_xen_mappings+0xc5/0x958
>>> (XEN)    [<ffff82d0404474f9>] F
>>> arch/x86/alternative.c#_alternative_instructions+0xb7/0xb9
>>> (XEN)    [<ffff82d0404476cc>] F alternative_branches+0xf/0x12
>>> (XEN)    [<ffff82d04044e37d>] F __start_xen+0x1ef4/0x2776
>>> (XEN)    [<ffff82d040203344>] F __high_start+0x94/0xa0
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion 'local_irq_is_enabled()' failed at arch/x86/smp.c:265
>>> (XEN) ****************************************
>>> (XEN)
>>>
>>> We really did want a local-only flush here, because we specifically
>>> intended to make self-modifying changes before bringing secondary CPUs up.
>>
>> I think the transition to SYS_STATE_smp_boot system state should be 
>> later. i.e., the last point were only 1 PCPU is guaranteed.
> 
> I'm not sure there isn't code which assumes pre-SMP initcalls to happen
> in this state already. So it may take addition of yet another state if
> no other solution can be found.

Or maybe this again shouldn't be using system_state but num_online_cpus()?

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.