Xen project Mailing List

[Xen-devel] Ongoing/future speculative mitigation work

To: Xen-devel List <xen-devel@xxxxxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 18 Oct 2018 18:46:22 +0100

Cc: Martin Pohlack <mpohlack@xxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Joao Martins <joao.m.martins@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Daniel Kiper <daniel.kiper@xxxxxxxxxx>, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Anthony Liguori <aliguori@xxxxxxxxxx>, "Dannowski, Uwe" <uwed@xxxxxxxxx>, Lars Kurth <lars.kurth@xxxxxxxxxx>, Konrad Wilk <konrad.wilk@xxxxxxxxxx>, Ross Philipson <ross.philipson@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Juergen Gross <JGross@xxxxxxxx>, Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>, "Woodhouse, David" <dwmw@xxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Thu, 18 Oct 2018 17:46:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hello,

This is an accumulation and summary of various tasks which have been discussed since the revelation of the speculative security issues in January, and also an invitation to discuss alternative ideas. They are x86 specific, but a lot of the principles are architecture-agnostic.

1) A secrets-free hypervisor.

Basically every hypercall can be (ab)used by a guest as an arbitrary cache-load gadget. Logically, this is the first half of a Spectre SP1 gadget, and is usually the first stepping stone to exploiting one of the speculative sidechannels.

Short of compiling Xen with LLVM's Speculative Load Hardening (which is still experimental, and comes with a ~30% perf hit in the common case), this is unavoidable. Furthermore, throwing a few array_index_nospec() calls into the code isn't a viable solution to the problem.

An alternative option is to have less data mapped into Xen's virtual address space - if a piece of memory isn't mapped, it can't be loaded into the cache.

An easy first step here is to remove Xen's directmap, which will mean that guests' general RAM isn't mapped by default into Xen's address space. This will come with some performance hit, as the map_domain_page() infrastructure will now have to actually create/destroy mappings, but removing the directmap will also improve non-speculative security (no possibility of ret2dir as an exploit technique).

Beyond the directmap, there are plenty of other interesting secrets in the Xen heap and other mappings, such as the stacks of the other pcpus. Fixing this requires moving Xen to a non-uniform memory layout, and this is much harder to change. I already experimented with this as a Meltdown mitigation about a year ago, and posted the resulting series on Jan 4th, https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html, some trivial bits of which have already found their way upstream.

To have a non-uniform memory layout, Xen may not share L4 pagetables, i.e.
Xen must never have two pcpus which reference the same pagetable in %cr3. This property already holds for 32bit PV guests and all HVM guests, but 64bit PV guests are the sticking point. Because Linux has a flat memory layout, when a 64bit PV guest schedules two threads from the same process on separate vcpus, those two vcpus have the same virtual %cr3, and currently, Xen programs the same real %cr3 into hardware.

If we want Xen to have a non-uniform layout, our two options are:

* Fix Linux to have the same non-uniform layout that Xen wants (backwards compatibility for older 64bit PV guests can be achieved with xen-shim).
* Keep using the XPTI algorithm (specifically, the pagetable sync/copy part) indefinitely.

Option 2 isn't great (especially for perf on fixed hardware), but does keep all the necessary changes in Xen. Option 1 looks to be the better option long term.

As an interesting point to note, the 32bit PV ABI prohibits sharing of L3 pagetables, because back in the 32bit hypervisor days, we used to have linear mappings in the Xen virtual range. This check is stale (from a functionality point of view), but still present in Xen. A consequence of this is that 32bit PV guests definitely don't share top-level pagetables across vcpus.

Juergen/Boris: Do you have any idea if/how easily this infrastructure could be implemented for 64bit PV guests as well? If a PV guest can advertise via an ELF note that it won't share top-level pagetables, then we can audit this trivially in Xen.

2) Scheduler improvements.

(I'm afraid this is rather more sparse because I'm less familiar with the scheduler details.)

At the moment, all of Xen's schedulers will happily put two vcpus from different domains on sibling hyperthreads. There has been a lot of sidechannel research over the past decade demonstrating ways for one thread to infer what is going on in the other, but L1TF is the first vulnerability I'm aware of which allows one thread to directly read data out of the other.
Either way, it is now definitely a bad thing to run different guests concurrently on siblings. Fixing this by simply not scheduling vcpus from a different guest on siblings does result in lower resource utilisation, most notably when there is an odd number of runnable vcpus in a domain, as the other thread is forced to idle.

A step beyond this is core-aware scheduling, where we schedule in units of a virtual core rather than a virtual thread. This has much better behaviour from the guest's point of view, as the actually-scheduled topology remains consistent, but does potentially come with even lower utilisation if every other thread in the guest is idle.

A side requirement for core-aware scheduling is for Xen to have an accurate idea of the topology presented to the guest. I need to dust off my Toolstack CPUID/MSR improvement series and get that upstream.

One of the most insidious problems with L1TF is that, with hyperthreading enabled, a malicious guest kernel can engineer arbitrary data leakage by having one thread scan the expected physical address, and the other thread use an arbitrary cache-load gadget in hypervisor context. This occurs because the L1 data cache is shared by threads.

A solution to this issue was proposed, whereby Xen synchronises siblings on vmexit/entry, so that the two threads of a core are never executing code at different privilege levels at the same time. Getting this working would make it safe to continue using hyperthreading even in the presence of L1TF. Obviously, it's going to come with a perf hit, but compared to disabling hyperthreading, all it's got to do is beat a 60% perf hit to make it the preferable option for making your system L1TF-proof.

Anyway - enough of my rambling for now. Thoughts?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.