
Re: [Xen-devel] [PATCH FAIRLY-RFC 00/44] x86: Prerequisite work for a Xen KAISER solution



On 05/01/2018 07:48, Juergen Gross wrote:
> On 04/01/18 21:21, Andrew Cooper wrote:
>> This work was developed as an SP3 mitigation, but shelved when it became
>> clear that it wasn't viable to get done in the timeframe.
>>
>> To protect against SP3 attacks, most mappings need to be flushed while in
>> user context.  However, to protect against all cross-VM attacks, it is
>> necessary to ensure that the Xen stacks are not mapped in any other cpu's
>> address space, or an attacker can still recover at least the GPR state of
>> separate VMs.
> The above statement is too strict: it would be sufficient if no stacks of
> other domains were mapped.

Sadly not.  Having stacks shared within a domain means one vcpu can still
steal at least the GPR state of other vcpus belonging to the same domain.

Whether or not a specific kernel cares, some definitely will.
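
A minimal sketch of why that is (illustrative layout only, not Xen's
actual cpu_user_regs definition): on every entry to Xen, the guest's
GPRs are saved in a frame at the top of the per-cpu Xen stack, so any
address space which still maps that stack exposes those registers to a
speculative (SP3) read.

    #include <stdint.h>

    /* Stand-in for the saved-register frame at the top of the stack. */
    struct saved_gpr_frame {
        uint64_t r15, r14, r13, r12, rbp, rbx, r11, r10;
        uint64_t r9, r8, rax, rcx, rdx, rsi, rdi;
        uint64_t rip, cs, rflags, rsp, ss;
    };

    #define PAGE_SIZE  4096UL
    #define STACK_SIZE (PAGE_SIZE << 3)   /* assumed 8-page stack */

    /* The frame sits just below the stack top, so knowing another cpu's
     * stack address is enough to locate every saved guest register. */
    static inline struct saved_gpr_frame *gpr_frame(void *stack_base)
    {
        return (struct saved_gpr_frame *)((char *)stack_base + STACK_SIZE) - 1;
    }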

> I'm just working on a proof of concept using dedicated per-vcpu stacks
> for 64bit PV domains. Those stacks would be mapped in the per-domain
> region of the address space. I hope to have an RFC version of the patches
> ready next week.
>
> This would allow removing the per-physical-cpu mappings in the
> guest-visible address space when doing page table isolation.
>
> In order to avoid SP3 attacks on other vcpus' stacks of the same guest,
> we could extend the PV ABI to mark a guest's user L4 page table as
> "single use", i.e. not allowed to be active on multiple vcpus at the
> same time (introducing that ABI modification in the Linux kernel would
> be simple, as the Linux kernel currently has no protection against
> cross-cpu stack exploits, and when that protection is added via per-cpu
> user L4 page tables we could just chime in). An L4 page table marked as
> "single use" would map the local vcpu's stacks only.

For PV guests, it is the Xen stacks which matter, not the guest kernel's
per-vcpu ones.

64bit PV guest kernels are already mitigated better than KPTI can ever
manage, because there are no entry stacks or entry stubs required to be
mapped into guest userspace at all.

>> To have isolated stacks, Xen needs a per-pcpu isolated region, which
>> requires that two pCPUs never share the same %cr3.  This is trivial for
>> 32bit PV guests and HVM guests due to the existing per-vcpu Monitor
>> Tables, but is problematic for 64bit PV guests, which will run on the
>> same %cr3 when scheduling different threads from the same process.
>>
>> To avoid breaking the PV ABI, Xen needs to shadow the guest L4 pagetables if
>> it wants to maintain the unique %cr3 property it needs.
>>
>> tl;dr The shadowing algorithm in pt-shadow.c is too much of a performance
>> overhead to be viable, and very high risk to productise in an embargo window.
>> If we want to continue down this route, we either need someone to have a
>> clever alternative to the shadowing algorithm I came up with, or change
>> the PV ABI to require VMs not to share L4 pagetables.
>>
>> Either way, these patches are presented to start a discussion of the issues.
>> The series as a whole is not in a suitable state for committing.
> I think patch 1 should be excluded from that statement, as it is not
> directly related to the series.

There are bits of the series I do intend to take in, largely in this
form.  Patch 1 is one such; another is "x86/pv: Drop support for paging
out the LDT", because it's long since time for that to disappear.

I should also say that the net changes to context switch and
critical-structure handling across this series are a performance and
security benefit, irrespective of the KAISER/KPTI side of things.
They'd qualify for inclusion on their own merits alone (if it weren't
for the dependent L4 shadowing issues).
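
To make those L4 shadowing issues concrete, here is a minimal sketch
(made-up names and slot numbers, not the algorithm in pt-shadow.c) of
what a per-pcpu shadow of a guest L4 has to do so that each pCPU can run
on its own %cr3 while keeping a private per-pcpu region mapped:

    #include <stdint.h>

    #define L4_ENTRIES     512
    #define PERCPU_L4_SLOT 260   /* assumed free slot for the per-pcpu region */

    typedef uint64_t l4e_t;

    struct pcpu_shadow {
        l4e_t shadow_l4[L4_ENTRIES];   /* what this pCPU actually runs on */
        l4e_t percpu_region;           /* maps this pCPU's isolated stacks */
    };

    /* On context switch, copy the guest's L4 into this pCPU's private
     * shadow and re-insert the per-pcpu mapping; %cr3 then points at the
     * shadow rather than at the (possibly shared) guest L4. */
    static void sync_shadow_l4(struct pcpu_shadow *s, const l4e_t *guest_l4)
    {
        for ( unsigned int i = 0; i < L4_ENTRIES; i++ )
            s->shadow_l4[i] = guest_l4[i];

        s->shadow_l4[PERCPU_L4_SLOT] = s->percpu_region;
    }

The hard part, as the cover letter notes, is doing this without
unacceptable overhead.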

If you're interested, I stumbled onto patch one after introducing the
per-pcpu stack mapping, as virt_to_maddr() came out spectacularly
wrong.  Very observant readers might also notice the bit of misc
debugging which caused me to blindly stumble into XSA-243, which was an
interesting diversion from Xen crashing because of my own pagetable
mistakes.
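
For the virt_to_maddr() point, a simplified sketch (illustrative
constants, not Xen's actual implementation) of why an arithmetic
virtual-to-machine translation goes wrong once a stack is used through a
per-pcpu mapping instead of its directmap address:

    #include <stdint.h>

    #define DIRECTMAP_VIRT_START 0xffff830000000000UL  /* illustrative */

    /* Only valid for addresses which actually lie inside the direct map;
     * feed it a per-pcpu stack alias mapped elsewhere and the subtraction
     * yields a bogus machine address. */
    static inline uint64_t naive_virt_to_maddr(const void *va)
    {
        return (uint64_t)(uintptr_t)va - DIRECTMAP_VIRT_START;
    }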

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
