
Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

On 22/01/18 18:48, George Dunlap wrote:
> On 01/22/2018 06:39 PM, Andrew Cooper wrote:
>> On 22/01/18 16:51, Jan Beulich wrote:
>>>>>> On 22.01.18 at 16:00, <jgross@xxxxxxxx> wrote:
>>>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>>>> On 22.01.18 at 15:38, <jgross@xxxxxxxx> wrote:
>>>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>>>> On 22.01.18 at 15:18, <jgross@xxxxxxxx> wrote:
>>>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>>>> On 22.01.18 at 13:32, <jgross@xxxxxxxx> wrote:
>>>>>>>>>> As a preparation for doing page table isolation in the Xen hypervisor
>>>>>>>>>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>>>>>>>>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>>>> saving the domain's registers, the stacks are switched back to the
>>>>>>>>>> normal per-physical-cpu ones in order to be able to address on-stack
>>>>>>>>>> data from other cpus, e.g. while handling IPIs.
>>>>>>>>>> Adding %cr3 switching between saving of the registers and switching
>>>>>>>>>> the stacks will make it possible to run guest code without any
>>>>>>>>>> per-physical-cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>>>> able to access other domains' data.
>>>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>>>> domain, but this can be easily avoided by a little PV-ABI 
>>>>>>>>>> modification
>>>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>>>> approach provides over the band-aid. For context see also
>>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>>>>>> My approach supports mapping only the following data while the guest is
>>>>>>>> running (apart from the guest's own data, of course):
>>>>>>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>>>>>>   guest's registers saved when an interrupt occurs
>>>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>>>> - the IDT
>>>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>>>> guests.
>>>>>>> I understand that. What I'm not clear about is: Which parts of
>>>>>>> the additionally hidden data are actually necessary (or at least
>>>>>>> very desirable) to hide?
>>>>>> Necessary:
>>>>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>>>>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>>>>>   code emulator buffers
>>>>>> - other guests' register values e.g. in vcpu structure
>>>>> All of this is already being made invisible by the band-aid (with the
>>>>> exception of leftovers on the hypervisor stacks across context
>>>>> switches, which we've already said could be taken care of by
>>>>> memset()ing that area). I'm asking about the _additional_ benefits
>>>>> of your approach.
>>>> I'm quite sure the performance will be much better as it doesn't require
>>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>>> guest L4 table, similar to the Linux kernel KPTI approach.
>>> But isn't that model having the same synchronization issues upon
>>> guest L4 updates which Andrew was fighting with?
>> (Condensing a lot of threads down into one)
>> All the methods have L4 synchronisation update issues, until we have a
>> PV ABI which guarantees that L4's don't get reused.  Any improvements to
>> the shadowing/synchronisation algorithm will benefit all approaches.
>> Juergen: you're now adding an LTR into the context switch path, which
>> tends to be very slow.  I.e., as currently presented, this series
>> necessarily has a higher runtime overhead than Jan's XPTI.
>> One of my concerns is that this patch series moves further away from the
>> secondary goal of my KAISER series, which was to have the IDT and GDT
>> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
>> leak which CPU you're currently scheduled on into PV guests and b) the
>> context switch code can drop a load of its slow instructions like LGDT
>> and the VMWRITEs to update the VMCS.
>> Jan: As to the things not covered by the current XPTI, hiding most of
>> the .text section is important to prevent fingerprinting or ROP
>> scanning.  This is a defence-in-depth argument, but a guest being easily
>> able to identify whether certain XSAs are fixed or not is quite bad. 
> I'm afraid we have a fairly different opinion of what is "quite bad".

I suggest you try talking to some real users then.

> Suppose we handed users a knob and said, "If you flip this switch,
> attackers won't be able to tell if you've fixed XSAs or not without
> trying them; but it will slow down your guests 20%."  How many do you
> think would flip it, and how many would reckon that an attacker could
> probably find out that information anyway?

Nonsense.  The performance hit is already taken.  The argument is "do
you want an attacker able to trivially evaluate security weaknesses in
your hypervisor", a process which usually has to be done by guesswork
and knowing the exact binary under attack.  Having .text fully readable
lowers the barrier to entry substantially.

