[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/pv: Rewrite segment context switching from scratch


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 4 Sep 2020 10:02:40 +0100
  • Authentication-results: esa1.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andy Lutomirski <luto@xxxxxxxxxx>, Sarah Newman <srn@xxxxxxxxx>
  • Delivery-date: Fri, 04 Sep 2020 09:02:57 +0000
  • Ironport-sdr: xIFUAb5ifxSZ/QWfUC7NJ1vaXKok9/ztQdKlB72I0Ea4RjBKEjYj49B4qs55hUGp2nHbdhAwoe 7F/Mvk+h1OWnUPXJw0G30HmDbt1rjJQlkBwnu60cG5hiY6rRoiv4svWdZ/Kr9Pn350xySyhr0A sWxwYlLPNKfTbMxkjxIj/Qa+nRro9YQk+twljd6b+YhtiVQEjkVTCYVdnRNbo6R53I+hcj84VR nxvLSZsiH196qtRJ6Zzd0rZtlWsWhqwa5UNBlwRujKW7Gz5DR1VteIEDUJLY0WwTbDiokbLzPA K9g=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04/09/2020 07:55, Jan Beulich wrote:
> On 03.09.2020 23:36, Andrew Cooper wrote:
>> There are multiple bugs with the existing implementation, including incorrect
>> comments.
>>
>> On AMD CPUs prior to Zen2, loading a NUL segment selector doesn't clear the
>> segment base, which is a problem for 64bit code which typically expects to 
>> use
>> a NUL %fs/%gs selector.
>>
>> On a context switch from any PV vcpu, to a 64bit PV vcpu with an %fs/%gs
>> selector which faults, the fixup logic loads NUL, and the guest is entered at
>> the failsafe callback with the stale base.
>>
>> Alternatively, a PV context switch sequence of 64 (NUL, non-zero base) =>
>> 32 (NUL) => 64 (NUL, zero base) will similarly cause Xen to enter the guest
>> with a stale base.
>>
>> Both of these corner cases manifest as state corruption in the final vcpu.
>> However, damage is limited to to 64bit code expecting to use Thread Local
>> Storage with a base pointer of 0, which doesn't occur by default.
>>
>> The context switch logic is extremely complicated, and is attempting to
>> optimise away loading a NUL selector (which is fast), or writing a 64bit base
>> of 0 (which is rare).  Furthermore, it fails to respect Linux's ABI with
>> userspace, which manifests as userspace state corruption as far as Linux is
>> concerned.
>>
>> Always save and restore all selector and base state, in all cases.
>>
>> Leave a large comment explaining hardware behaviour, and the new ABI
>> expectations.  Update the comments in the public headers.
>>
>> Drop all "segment preloading" to handle the AMD corner case.  It was never
>> anything but a waste of time for %ds/%es, and isn't needed now that %fs/%gs
>> bases are unconditionally written for 64bit PV guests.  In load_segments(),
>> store the result of is_pv_32bit_vcpu() as it is an expensive predicate now,
>> and not used in a way which impacts speculative safety.
>>
>> Reported-by: Andy Lutomirski <luto@xxxxxxxxxx>
>> Reported-by: Sarah Newman <srn@xxxxxxxxx>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

I'm afraid I've found further bugs an ABI work to do.  v2 coming shortly.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.