
Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

On 22/01/18 18:48, George Dunlap wrote:
> On 01/22/2018 06:39 PM, Andrew Cooper wrote:
>> On 22/01/18 16:51, Jan Beulich wrote:
>>>>>> On 22.01.18 at 16:00, <jgross@xxxxxxxx> wrote:
>>>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>>>> On 22.01.18 at 15:38, <jgross@xxxxxxxx> wrote:
>>>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>>>> On 22.01.18 at 15:18, <jgross@xxxxxxxx> wrote:
>>>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>>>> On 22.01.18 at 13:32, <jgross@xxxxxxxx> wrote:
>>>>>>>>>> As a preparation for doing page table isolation in the Xen hypervisor
>>>>>>>>>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>>>>>>>>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>>>> saving the domain's registers, the stacks are switched back to the
>>>>>>>>>> normal per-physical-cpu ones in order to be able to address on-stack
>>>>>>>>>> data from other cpus, e.g. while handling IPIs.
>>>>>>>>>> Adding %cr3 switching between saving of the registers and switching
>>>>>>>>>> the stacks will make it possible to run guest code without any
>>>>>>>>>> per-physical-cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>>>> able to access other domains' data.
>>>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>>>> domain, but this can be easily avoided by a little PV-ABI 
>>>>>>>>>> modification
>>>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>>>> approach provides over the band-aid. For context see also
>>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>>>>>> My approach supports mapping only the following data while the guest is
>>>>>>>> running (apart from the guest's own data, of course):
>>>>>>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>>>>>>   guest's registers saved when an interrupt occurs
>>>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>>>> - the IDT
>>>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>>>> guests.
>>>>>>> I understand that. What I'm not clear about is: Which parts of
>>>>>>> the additionally hidden data are actually necessary (or at least
>>>>>>> very desirable) to hide?
>>>>>> Necessary:
>>>>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>>>>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>>>>>   code emulator buffers
>>>>>> - other guests' register values e.g. in vcpu structure
>>>>> All of this is already being made invisible by the band-aid (with the
>>>>> exception of leftovers on the hypervisor stacks across context
>>>>> switches, which we've already said could be taken care of by
>>>>> memset()ing that area). I'm asking about the _additional_ benefits
>>>>> of your approach.
>>>> I'm quite sure the performance will be much better as it doesn't require
>>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>>> guest L4 table, similar to the Linux kernel KPTI approach.
>>> But isn't that model having the same synchronization issues upon
>>> guest L4 updates which Andrew was fighting with?
>> (Condensing a lot of threads down into one)
>> All the methods have L4 synchronisation update issues, until we have a
>> PV ABI which guarantees that L4's don't get reused.  Any improvements to
>> the shadowing/synchronisation algorithm will benefit all approaches.
>> Juergen: you're now adding an LTR into the context switch path, which
>> tends to be very slow.  I.e., as currently presented, this series
>> necessarily has a higher runtime overhead than Jan's XPTI.
>> One of my concerns is that this patch series moves further away from the
>> secondary goal of my KAISER series, which was to have the IDT and GDT
>> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
>> leak which CPU you're currently scheduled on into PV guests and b) the
>> context switch code can drop a load of its slow instructions like LGDT
>> and the VMWRITEs to update the VMCS.
>> Jan: As to the things not covered by the current XPTI, hiding most of
>> the .text section is important to prevent fingerprinting or ROP
>> scanning.  This is a defence-in-depth argument, but a guest being easily
>> able to identify whether certain XSAs are fixed or not is quite bad. 
> I'm afraid we have a fairly different opinion of what is "quite bad".

I suggest you try talking to some real users then.

> Suppose we handed users a knob and said, "If you flip this switch,
> attackers won't be able to tell if you've fixed XSAs or not without
> trying them; but it will slow down your guests 20%."  How many do you
> think would flip it, and how many would reckon that an attacker could
> probably find out that information anyway?

Nonsense.  The performance hit is already taken.  The argument is "do
you want an attacker able to trivially evaluate security weaknesses in
your hypervisor", a process which usually has to be done by guesswork
and knowing the exact binary under attack.  Having .text fully readable
lowers the barrier to entry substantially.

