[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/hvm: Fix boot on systems where HVM isn't available


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 7 Feb 2022 09:29:20 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=zY8wF/cgDacGMiqAmzd3t1NrwEoSzJYZ79svQg7JFr0=; b=XeEEKEJtCpvmWuJiLunqWLlO4XKLlnARdCffEF6mDC5KkFc5+z3kDGZJBNdqVTQdyZJa+Ftrz7xi2h4e5MBsp5T+0bsEwKl1/+14rdOQRY2IhKMaHWmbgvPxRk6TI44K6MAEA1GA6HZSsXbzZNNOnr41GpnxcVMUwvxLqzpz7gnwFvLFt2M59UxFGlkfU7FSyiBzm2JNErLMlQ/u1pMsfKF0dlce4iyN1YrLKhN07GES8dJPKqzilxW8DzH6sFSmbVYZBMxTJKXrTAOsUVcrVF9aC4mhtWTswKyupWnKASPpmqRGQHaJ8Ezjudkfgx3+dsWfXOSgQyV8cZZCky5JhA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SFbS9wHbGfWOLM2t0DSvyPI7ZR36IiZXybARD+6P01aoHPRleym/yE5T0JQuRbL86QLXX0GkGKv7OFzNPsk6wHpm5Xr+JIolKC8rkL4q58puQ1v9dGm6MiUYKPvyGqXSkdUqulUBCmSTXxqRMWmghWwEXGhaRv7zsVNoA3N9wtbgDdyaHy/QGy+z/Y+pOcv0c5E0JAlKe89XUizZCWuth4WRQJfUT1LIHcimMRsb0Grnup/7UpEzvgRfoqJBSHlez//dRTFn9+XzS2zTEK1/tqRos5JDwEoH0fvHcfVfhPkBW5YAQd95w1xHabbv8Da1omA2BTyfbuifM5bLsuE2gg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 07 Feb 2022 08:29:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.02.2022 10:47, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote:
>> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
>> alt-call") went too far with dropping NULL function pointer checks.

Oh, I'm sorry, I should have noticed this.

>> smp_callin() calls hvm_cpu_up() unconditionally.  When the platform doesn't
>> support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
>> altcall pass nukes the (now unconditional) indirect call, causing:
>>
>>   (XEN) ----[ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]----
>>   (XEN) CPU:    1
>>   (XEN) RIP:    e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
>>   (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
>>   ...
>>   (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
>>   (XEN)  ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe 
>> ff ff
>>   ...
>>   (XEN) Xen call trace:
>>   (XEN)    [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
>>   (XEN)    [<ffff82d0402000e2>] F __high_start+0x42/0x60
>>
>> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
>> too, so what happen next is:
>>
>>   (XEN) ----[ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]----
>>   (XEN) CPU:    0
>>   (XEN) RIP:    e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
>>   (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>   ...
>>   (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
>>   (XEN)  48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 
>> 0d ff
>>   ...
>>   (XEN) Xen call trace:
>>   (XEN)    [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
>>   (XEN)    [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
>>   (XEN)    [<ffff82d04034a229>] F machine_restart+0xa2/0x298
>>   (XEN)    [<ffff82d04034a42a>] F 
>> arch/x86/shutdown.c#__machine_restart+0xb/0x11
>>   (XEN)    [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
>>   (XEN)    [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
>>   (XEN)    [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
>>   (XEN)    [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
>>   (XEN)    [<ffff82d04031f649>] F __udelay+0x3a/0x51
>>   (XEN)    [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
>>   (XEN)    [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
>>   (XEN)    [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
>>   (XEN)    [<ffff82d0402000ef>] F __high_start+0x4f/0x60
>>
>> which recurses until hitting a stack overflow.  The #DF handler, which resets
>> its stack on each invocation, loops indefinitely.
>>
>> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
>>
>> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations 
>> to alt-call")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

>> ---
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> CC: Wei Liu <wl@xxxxxxx>
>>
>> RFC.  Not tested yet on the imacted hardware.  It's a Xeon PHI with another
>> werid thing in need of debugging.  First boot is fine, while second
>> boot (loading microcode this time) has a problem with vmx.

Sounds not unfamiliar: My meanwhile oldish Romley needs to be cold-
booted for VMX to actually be usable (not locked) on APs.

>> I wonder if we want to modify the callers to check for HVM being enabled,
>> rather than leaving the NULL pointer checks in a position where they're 
>> liable
>> to be reaped again.
> 
> What about adding a couple of comments to hvm_cpu_{up,down} to note
> they are called unconditionally regardless of whether HVM is present
> or not?

I second this as the perhaps better alternative: The S3 path is
similarly affected (and you may want to mention this in the
description), so this would mean up to 5 conditionals (at the
source level) instead of the just two you get away with here.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.