Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains
On 01/22/2018 06:39 PM, Andrew Cooper wrote:
> On 22/01/18 16:51, Jan Beulich wrote:
>>>>> On 22.01.18 at 16:00, <jgross@xxxxxxxx> wrote:
>>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>>> On 22.01.18 at 15:38, <jgross@xxxxxxxx> wrote:
>>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>>> On 22.01.18 at 15:18, <jgross@xxxxxxxx> wrote:
>>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>>> On 22.01.18 at 13:32, <jgross@xxxxxxxx> wrote:
>>>>>>>>> As a preparation for doing page table isolation in the Xen hypervisor
>>>>>>>>> in order to mitigate "Meltdown", use dedicated stacks, GDT and TSS for
>>>>>>>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>>>>>>>
>>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>>> saving the domain's registers, stacks are switched back to the normal
>>>>>>>>> per physical cpu ones in order to be able to address on-stack data
>>>>>>>>> from other cpus, e.g. while handling IPIs.
>>>>>>>>>
>>>>>>>>> Adding %cr3 switching between saving of the registers and switching
>>>>>>>>> the stacks will enable the possibility to run guest code without any
>>>>>>>>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>>> able to access other domains' data.
>>>>>>>>>
>>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>>> domain, but this can be easily avoided by a little PV-ABI modification
>>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>>>
>>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>>> approach provides over the band-aid. For context see also
>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>>>>>>
>>>>>>> My approach supports mapping only the following data while the guest is
>>>>>>> running (apart from the guest's own data, of course):
>>>>>>>
>>>>>>> - the per-vcpu entry stacks of the domain, which will contain only the
>>>>>>>   guest's registers saved when an interrupt occurs
>>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>>> - the IDT
>>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>>>
>>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>>> guests.
>>>>>> I understand that. What I'm not clear about is: which parts of
>>>>>> the additionally hidden data are actually necessary (or at least
>>>>>> very desirable) to hide?
>>>>> Necessary:
>>>>> - other guests' memory (e.g. the physical memory 1:1 mapping)
>>>>> - data from other guests, e.g. in stack pages, debug buffers, I/O buffers,
>>>>>   code emulator buffers
>>>>> - other guests' register values, e.g. in the vcpu structure
>>>> All of this is already being made invisible by the band-aid (with the
>>>> exception of leftovers on the hypervisor stacks across context
>>>> switches, which we've already said could be taken care of by
>>>> memset()ing that area). I'm asking about the _additional_ benefits
>>>> of your approach.
>>> I'm quite sure the performance will be much better as it doesn't require
>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>> guest L4 table, similar to the Linux kernel KPTI approach.
>> But isn't that model having the same synchronization issues upon
>> guest L4 updates which Andrew was fighting with?
>
> (Condensing a lot of threads down into one)
>
> All the methods have L4 synchronisation update issues, until we have a
> PV ABI which guarantees that L4s don't get reused. Any improvements to
> the shadowing/synchronisation algorithm will benefit all approaches.
>
> Juergen: you're now adding an LTR into the context switch path, which
> tends to be very slow. I.e., as currently presented, this series
> necessarily has a higher runtime overhead than Jan's XPTI.
>
> One of my concerns is that this patch series moves further away from the
> secondary goal of my KAISER series, which was to have the IDT and GDT
> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
> leak which CPU you're currently scheduled on into PV guests and b) the
> context switch code can drop a load of its slow instructions like LGDT
> and the VMWRITEs to update the VMCS.
>
> Jan: As to the things not covered by the current XPTI, hiding most of
> the .text section is important to prevent fingerprinting or ROP
> scanning. This is a defence-in-depth argument, but a guest being easily
> able to identify whether certain XSAs are fixed or not is quite bad.

I'm afraid we have a fairly different opinion of what is "quite bad".
Suppose we handed users a knob and said, "If you flip this switch,
attackers won't be able to tell if you've fixed XSAs or not without
trying them; but it will slow down your guests 20%."  How many do you
think would flip it, and how many would reckon that an attacker could
probably find out that information anyway?

 -George
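[Editor's sketch] The shadow-L4 approach Juergen describes above amounts to keeping, for each guest L4 page table, a hypervisor-side copy whose upper slots expose only the per-vcpu entry area (entry stacks, GDT/TSS, IDT, entry code). Below is a minimal C sketch of that idea; the slot numbers, names and entry encodings are hypothetical and this is not code from the series.

/*
 * Minimal sketch of the shadow-L4 idea -- not code from the series.
 * Slot numbers, names and entry encodings are hypothetical.
 */
#include <stdint.h>
#include <string.h>

#define L4_ENTRIES      512
#define GUEST_SLOTS     256     /* lower half: guest-controlled mappings    */
#define ENTRY_AREA_SLOT 260     /* hypothetical slot for the per-vcpu entry
                                   stacks, GDT/TSS, IDT and entry code      */

typedef uint64_t l4_pgentry_t;

/* Hypothetical pre-built L4 entry covering only the per-vcpu entry area. */
static l4_pgentry_t entry_area_l4e;

/* Rebuild the shadow after the guest has updated its own L4. */
static void sync_shadow_l4(const l4_pgentry_t *guest_l4,
                           l4_pgentry_t *shadow_l4)
{
    /* The guest half is copied verbatim so guest mappings keep working. */
    memcpy(shadow_l4, guest_l4, GUEST_SLOTS * sizeof(*shadow_l4));

    /* The hypervisor half starts out empty ... */
    memset(shadow_l4 + GUEST_SLOTS, 0,
           (L4_ENTRIES - GUEST_SLOTS) * sizeof(*shadow_l4));

    /* ... and gets back only the entry-area mapping, so reads from guest
     * context find no other hypervisor data. */
    shadow_l4[ENTRY_AREA_SLOT] = entry_area_l4e;
}

int main(void)
{
    static l4_pgentry_t guest_l4[L4_ENTRIES], shadow_l4[L4_ENTRIES];

    entry_area_l4e = 0x1000 | 0x63;     /* dummy present entry             */
    guest_l4[0]    = 0x2000 | 0x67;     /* pretend guest mapping in slot 0 */

    sync_shadow_l4(guest_l4, shadow_l4);
    return shadow_l4[ENTRY_AREA_SLOT] == entry_area_l4e ? 0 : 1;
}

The synchronisation problem debated in the thread is exactly when, and how often, such a sync has to run once the guest modifies an L4 that may be in use on another CPU.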
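[Editor's sketch] Andrew's SIDT/SGDT point rests on the fact that those instructions are not privileged unless UMIP is enabled, so guest code can read the descriptor-table base addresses directly; if each physical CPU uses a different base, the value reveals which CPU the vcpu is currently scheduled on. A rough illustration of such a read (GCC inline assembly; nothing here is code from the series):

/* Rough illustration of an unprivileged descriptor-table base read.
 * SIDT/SGDT can be executed outside ring 0 unless UMIP is enabled, so a
 * per-CPU IDT/GDT base is directly observable from PV guest context. */
#include <stdint.h>
#include <stdio.h>

struct __attribute__((packed)) desc_ptr {
    uint16_t limit;
    uint64_t base;
};

int main(void)
{
    struct desc_ptr idtr, gdtr;

    __asm__ volatile ("sidt %0" : "=m" (idtr));
    __asm__ volatile ("sgdt %0" : "=m" (gdtr));

    printf("IDT base %#llx, GDT base %#llx\n",
           (unsigned long long)idtr.base, (unsigned long long)gdtr.base);
    return 0;
}

Mapping the tables at the same linear address on every CPU makes the reported values identical everywhere, which is the property Andrew's KAISER series aimed to preserve.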