
Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

On 22/01/18 16:51, Jan Beulich wrote:
>>>> On 22.01.18 at 16:00, <jgross@xxxxxxxx> wrote:
>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>> On 22.01.18 at 15:38, <jgross@xxxxxxxx> wrote:
>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>> On 22.01.18 at 15:18, <jgross@xxxxxxxx> wrote:
>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>> On 22.01.18 at 13:32, <jgross@xxxxxxxx> wrote:
>>>>>>>> As a preparation for doing page table isolation in the Xen hypervisor
>>>>>>>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>>>>>>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>> saving the domain's registers stacks are switched back to the normal
>>>>>>>> per physical cpu ones in order to be able to address on-stack data
>>>>>>>> from other cpus e.g. while handling IPIs.
>>>>>>>> Adding %cr3 switching between saving of the registers and switching
>>>>>>>> the stacks will enable the possibility to run guest code without any
>>>>>>>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>> able to access other domains data.
>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>> domain, but this can be easily avoided by a little PV-ABI modification
>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>> approach provides over the band-aid. For context see also
>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>>>> My approach supports mapping only the following data while the guest is
>>>>>> running (apart from the guest's own data, of course):
>>>>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>>>>   guest's registers saved when an interrupt occurs
>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>> - the IDT
>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>> guests.
>>>>> I understand that. What I'm not clear about is: Which parts of
>>>>> the additionally hidden data are actually necessary (or at least
>>>>> very desirable) to hide?
>>>> Necessary:
>>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>>>   code emulator buffers
>>>> - other guests' register values e.g. in vcpu structure
>>> All of this is already being made invisible by the band-aid (with the
>>> exception of leftovers on the hypervisor stacks across context
>>> switches, which we've already said could be taken care of by
>>> memset()ing that area). I'm asking about the _additional_ benefits
>>> of your approach.
>> I'm quite sure the performance will be much better as it doesn't require
>> per physical cpu L4 page tables, but just a shadow L4 table for each
>> guest L4 table, similar to the Linux kernel KPTI approach.
> But isn't that model having the same synchronization issues upon
> guest L4 updates which Andrew was fighting with?

(Condensing a lot of threads down into one)

All the methods have L4 synchronisation update issues, until we have a
PV ABI which guarantees that L4s don't get reused.  Any improvements to
the shadowing/synchronisation algorithm will benefit all approaches.

Juergen: you're now adding an LTR into the context switch path, which
tends to be very slow.  I.e. as currently presented, this series
necessarily has a higher runtime overhead than Jan's XPTI.

One of my concerns is that this patch series moves further away from the
secondary goal of my KAISER series, which was to have the IDT and GDT
mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
leak which CPU you're currently scheduled on into PV guests and b) the
context switch code can drop a load of its slow instructions like LGDT
and the VMWRITEs to update the VMCS.

Jan: As to the things not covered by the current XPTI, hiding most of
the .text section is important to prevent fingerprinting or ROP
scanning.  This is a defence-in-depth argument, but a guest being easily
able to identify whether certain XSAs are fixed or not is quite bad.
Also, a load of CPU 0's data structures, including its stack, are
visible in .data.

