Xen project Mailing List

Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

On 01/22/2018 06:39 PM, Andrew Cooper wrote:
> On 22/01/18 16:51, Jan Beulich wrote:
>>>>> On 22.01.18 at 16:00, <jgross@xxxxxxxx> wrote:
>>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>>> On 22.01.18 at 15:38, <jgross@xxxxxxxx> wrote:
>>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>>> On 22.01.18 at 15:18, <jgross@xxxxxxxx> wrote:
>>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>>> On 22.01.18 at 13:32, <jgross@xxxxxxxx> wrote:
>>>>>>>>> As a preparation for doing page table isolation in the Xen hypervisor
>>>>>>>>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>>>>>>>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>>>>>>>
>>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>>> saving the domain's registers stacks are switched back to the normal
>>>>>>>>> per physical cpu ones in order to be able to address on-stack data
>>>>>>>>> from other cpus e.g. while handling IPIs.
>>>>>>>>>
>>>>>>>>> Adding %cr3 switching between saving of the registers and switching
>>>>>>>>> the stacks will enable the possibility to run guest code without any
>>>>>>>>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>>> able to access other domains' data.
>>>>>>>>>
>>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>>> domain, but this can be easily avoided by a little PV-ABI modification
>>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>>>
>>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>>> approach provides over the band-aid. For context see also
>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>>>>>>
>>>>>>> My approach supports mapping only the following data while the guest is
>>>>>>> running (apart from the guest's own data, of course):
>>>>>>>
>>>>>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>>>>>   guest's registers saved when an interrupt occurs
>>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>>> - the IDT
>>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>>>
>>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>>> guests.
>>>>>> I understand that. What I'm not clear about is: Which parts of
>>>>>> the additionally hidden data are actually necessary (or at least
>>>>>> very desirable) to hide?
>>>>> Necessary:
>>>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>>>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>>>>   code emulator buffers
>>>>> - other guests' register values e.g. in vcpu structure
>>>> All of this is already being made invisible by the band-aid (with the
>>>> exception of leftovers on the hypervisor stacks across context
>>>> switches, which we've already said could be taken care of by
>>>> memset()ing that area). I'm asking about the _additional_ benefits
>>>> of your approach.
>>> I'm quite sure the performance will be much better as it doesn't require
>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>> guest L4 table, similar to the Linux kernel KPTI approach.
>> But isn't that model having the same synchronization issues upon
>> guest L4 updates which Andrew was fighting with?
>
> (Condensing a lot of threads down into one)
>
> All the methods have L4 synchronisation update issues, until we have a
> PV ABI which guarantees that L4's don't get reused. Any improvements to
> the shadowing/synchronisation algorithm will benefit all approaches.
>
> Juergen: you're now adding a LTR into the context switch path which
> tends to be very slow. I.e. As currently presented, this series
> necessarily has a higher runtime overhead than Jan's XPTI.

So here is a repeat of the "hypervisor compile" tests I did, comparing
the different XPTI-like series so far.

# Experimental setup:

Host:
 - Intel(R) Xeon(R) CPU E5630 @ 2.53GHz
 - 4 pcpus
 - Memory: 4GiB

Guest:
 - 4vcpus, 512MiB, blkback to raw file
 - CentOS 6 userspace
 - Linux 4.14 kernel with PV / PVH / PVHVM / KVM guest support (along
   with expected drivers) built-in

Test:
 - cd xen-4.10.0
 - make -C xen clean
 - time make -j 4 xen

# Results

- In all cases, running a "default" build with CONFIG_DEBUG=n

* Staging, xpti=off

real    1m2.995s
user    2m52.527s
sys     0m40.276s

Result: 63s

* Staging [xpti default]

real    1m27.190s
user    3m3.900s
sys     1m42.686s

Result: 87s (38% overhead)

Note also that the "system time" here is about 2.5x of "xpti=off"; so
total wasted cpu time is significantly higher.

* Staging + "x86: slightly reduce Meltdown band-aid overhead"

real    1m21.661s
user    3m3.809s
sys     1m25.344s

Result: 81s (28% overhead)

NB that the "system time" here is significantly reduced from above, but
still nearly double of the "system time" for plain PV

* Above + "x86: reduce Meltdown band-aid overhead a little further"

real    1m21.357s
user    3m3.284s
sys     1m25.379s

Result: 81s (28% overhead)

No real change

* Staging + Juergen's v2 series

real    1m3.018s
user    2m52.217s
sys     0m40.357s

Result: 63s (0% overhead)

Unfortunately, I can't really verify that Juergen's patches are having
any effect; there's no printk indicating whether it's enabling the
mitigation or not. I have verified that the changeset reported in `xl
dmesg` corresponds to the branch I have with the patches applied.

So it's *possible* something has gotten mixed up, and the mitigation
isn't being applied; but if it *is* applied, the performance is
significantly better than the "band-aid" XPTI.
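For anyone who wants to recompute the overhead figures from the raw
`time` output, here is a minimal Python sketch. It is not part of the
test setup above. The dictionary labels and helper names (`parse_time`,
`overhead_percent`, `summarise`) are made up for illustration; only the
"real" and "sys" values are copied from the results. Working from the
unrounded "real" times gives roughly 38% for default XPTI and between
29% and 30% for the two band-aid reduction patches, which is in line
with the figures derived from whole seconds.

```python
import re

def parse_time(value: str) -> float:
    """Convert a shell `time` field such as '1m27.190s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# (real, sys) pairs copied from the results; the labels are illustrative.
BASELINE = "staging, xpti=off"
RUNS = {
    "staging, xpti=off": ("1m2.995s", "0m40.276s"),
    "staging, xpti default": ("1m27.190s", "1m42.686s"),
    "staging + slightly reduce overhead": ("1m21.661s", "1m25.344s"),
    "above + reduce overhead a little further": ("1m21.357s", "1m25.379s"),
    "staging + Juergen's v2 series": ("1m3.018s", "0m40.357s"),
}

def overhead_percent(baseline: float, measured: float) -> float:
    """Relative slowdown of `measured` against `baseline`, in percent."""
    return (measured / baseline - 1.0) * 100.0

def summarise(runs: dict, baseline_name: str) -> dict:
    """Map each run to (wall-clock overhead in %, sys time ratio vs baseline)."""
    base_real, base_sys = (parse_time(v) for v in runs[baseline_name])
    summary = {}
    for name, (real, sys_time) in runs.items():
        summary[name] = (
            overhead_percent(base_real, parse_time(real)),
            parse_time(sys_time) / base_sys,
        )
    return summary

if __name__ == "__main__":
    for name, (overhead, sys_ratio) in summarise(RUNS, BASELINE).items():
        print(f"{name:42s} real +{overhead:5.1f}%   sys x{sys_ratio:.2f}")
```

The sys ratio is reported next to the wall-clock overhead because the
extra system time is where the mitigation cost shows up most clearly.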
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
