
Re: [Xen-devel] Ongoing/future speculative mitigation work


  • To: Wei Liu <wei.liu2@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: George Dunlap <george.dunlap@xxxxxxxxxx>
  • Date: Mon, 10 Dec 2018 12:19:18 +0000
  • Cc: Martin Pohlack <mpohlack@xxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Joao Martins <joao.m.martins@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Daniel Kiper <daniel.kiper@xxxxxxxxxx>, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Anthony Liguori <aliguori@xxxxxxxxxx>, "Dannowski, Uwe" <uwed@xxxxxxxxx>, Lars Kurth <lars.kurth@xxxxxxxxxx>, Konrad Wilk <konrad.wilk@xxxxxxxxxx>, Ross Philipson <ross.philipson@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Juergen Gross <JGross@xxxxxxxx>, Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Xen-devel List <xen-devel@xxxxxxxxxxxxx>, Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>, "Woodhouse, David" <dwmw@xxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 10 Dec 2018 12:19:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 12/10/18 12:12 PM, George Dunlap wrote:
> On 12/7/18 6:40 PM, Wei Liu wrote:
>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>> Hello,
>>>
>>> This is an accumulation and summary of various tasks which have been
>>> discussed since the revelation of the speculative security issues in
>>> January, and also an invitation to discuss alternative ideas.  They are
>>> x86 specific, but a lot of the principles are architecture-agnostic.
>>>
>>> 1) A secrets-free hypervisor.
>>>
>>> Basically every hypercall can be (ab)used by a guest, and used as an
>>> arbitrary cache-load gadget.  Logically, this is the first half of a
>>> Spectre SP1 gadget, and is usually the first stepping stone to
>>> exploiting one of the speculative sidechannels.
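>>>
>>> As a concrete illustration (shape only, not real Xen code), the
>>> first half of such a gadget looks like:
>>>
>>>     /* 'idx' is guest-controlled; 'table' is some hypervisor array */
>>>     if ( idx < table_size )         /* check can be speculated past */
>>>         secret = table[idx];        /* arbitrary speculative load   */
>>>     /* a dependent second access would then leak 'secret' via the
>>>      * cache side channel */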
>>>
>>> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
>>> still experimental, and comes with a ~30% perf hit in the common case),
>>> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
>>> into the code isn't a viable solution to the problem.
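>>>
>>> (For reference, the array_index_nospec() pattern clamps an index so
>>> it can't be used speculatively out of bounds:
>>>
>>>     idx = array_index_nospec(idx, ARRAY_SIZE(table));
>>>     val = table[idx];   /* idx forced to 0 under misspeculation */
>>>
>>> but doing this at every site is exactly what doesn't scale.)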
>>>
>>> An alternative option is to have less data mapped into Xen's virtual
>>> address space - if a piece of memory isn't mapped, it can't be loaded
>>> into the cache.
>>>
>>> An easy first step here is to remove Xen's directmap, which will mean
>>> that guests' general RAM isn't mapped by default into Xen's address
>>> space.  This will come with some performance hit, as the
>>> map_domain_page() infrastructure will now have to actually
>>> create/destroy mappings, but removing the directmap will cause an
>>> improvement for non-speculative security as well (No possibility of
>>> ret2dir as an exploit technique).
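>>>
>>> For reference, the access pattern which would now pay for real
>>> map/unmap operations is the usual one (sketch only):
>>>
>>>     void *p = map_domain_page(mfn);   /* now builds a mapping   */
>>>     memcpy(buf, p + off, len);        /* touch the guest page   */
>>>     unmap_domain_page(p);             /* tears it down again    */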
>>>
>>> Beyond the directmap, there are plenty of other interesting secrets in
>>> the Xen heap and other mappings, such as the stacks of the other pcpus. 
>>> Fixing this requires moving Xen to having a non-uniform memory layout,
>>> and this is much harder to change.  I already experimented with this as
>>> a meltdown mitigation around about a year ago, and posted the resulting
>>> series on Jan 4th,
>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
>>> some trivial bits of which have already found their way upstream.
>>>
>>> To have a non-uniform memory layout, Xen may not share L4 pagetables;
>>> i.e., Xen must never have two pcpus which reference the same pagetable
>>> in %cr3.
>>>
>>> This property already holds for 32bit PV guests, and all HVM guests, but
>>> 64bit PV guests are the sticking point.  Because Linux has a flat memory
>>> layout, when a 64bit PV guest schedules two threads from the same
>>> process on separate vcpus, those two vcpus have the same virtual %cr3,
>>> and currently, Xen programs the same real %cr3 into hardware.
>>>
>>> If we want Xen to have a non-uniform layout, our two options are:
>>> * Fix Linux to have the same non-uniform layout that Xen wants
>>> (Backwards compatibility for older 64bit PV guests can be achieved with
>>> xen-shim).
>>> * Make use of the XPTI algorithm (specifically, the pagetable
>>> sync/copy part) indefinitely.
>>>
>>> Option 2 isn't great (especially for perf on fixed hardware), but does
>>> keep all the necessary changes in Xen.  Option 1 looks to be the better
>>> option long-term.
>>>
>>> As an interesting point to note.  The 32bit PV ABI prohibits sharing of
>>> L3 pagetables, because back in the 32bit hypervisor days, we used to
>>> have linear mappings in the Xen virtual range.  This check is stale
>>> (from a functionality point of view), but still present in Xen.  A
>>> consequence of this is that 32bit PV guests definitely don't share
>>> top-level pagetables across vcpus.
>>
>> Correction: the 32bit PV ABI prohibits sharing of L2 pagetables, but
>> L3 pagetables can be shared.  So guests will schedule the same
>> top-level pagetables across vcpus.
>>
>> But 64bit Xen creates a monitor table for the 32bit PAE guest and
>> puts the CR3 provided by the guest into the first slot, so pcpus
>> don't share the same L4 pagetables.  The property we want still
>> holds.
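>>
>> Roughly like this (illustrative names only, not the real mm code):
>>
>>     l4_pgentry_t *ml4 = alloc_xenheap_page();   /* per-vcpu L4 */
>>     clear_page(ml4);
>>     /* Slot 0 points at the guest's 4-entry PAE "L3" from its cr3;
>>      * the real monitor table also carries Xen's own mappings in
>>      * the upper slots.  'guest_cr3_mfn' is illustrative. */
>>     ml4[0] = l4e_from_mfn(guest_cr3_mfn, __PAGE_HYPERVISOR);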
> 
> Ah, right -- but Xen can get away with this because in PAE mode, "L3" is
> just 4 entries that are loaded on CR3-switch and not automatically kept
> in sync by the hardware; i.e., the OS already needs to do its own
> "manual syncing" if it updates any of the L3 entires; so it's the same
> for Xen.
> 
>>> Juergen/Boris: Do you have any idea if/how easy this infrastructure
>>> would be to implement for 64bit PV guests as well?  If a PV guest can
>>> advertise via Elfnote that it won't share top-level pagetables, then we
>>> can audit this trivially in Xen.
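>>>
>>> Something like the following in the guest kernel would do; the note
>>> name is hypothetical, as no such Elfnote exists today:
>>>
>>>     /* hypothetical -- advertise that top-level pagetables are
>>>      * never shared across vcpus */
>>>     ELFNOTE(Xen, XEN_ELFNOTE_NO_SHARED_PGTABLES, .long 1)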
>>>
>>
>> After reading the Linux kernel code, I think it is not going to be
>> trivial, as threads in Linux currently share one pagetable (as they
>> should).
>>
>> In order to give each thread its own pagetable while still
>> maintaining the illusion of one address space, there needs to be
>> synchronisation under the hood.
>>
>> There is code in Linux to synchronise vmalloc mappings, but that's
>> only for the kernel portion.  The infrastructure to synchronise the
>> userspace portion is missing.
>>
>> One idea is to follow the same model as vmalloc -- maintain a reference
>> pagetable in struct mm and a list of pagetables for threads, then
>> synchronise the pagetables in the page fault handler. But this is
>> probably a bit hard to sell to Linux maintainers because it will touch a
>> lot of non-Xen code, increase complexity, and decrease performance.
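>>
>> A sketch of that fault-handler path, modelled on vmalloc_fault()
>> (the mm->pgd_ref and current->thread_pgd fields are hypothetical):
>>
>>     /* On a fault at a user address, try to copy the missing PGD
>>      * entry from the reference pagetable into the faulting
>>      * thread's private one. */
>>     static int sync_user_pgd(struct mm_struct *mm, unsigned long addr)
>>     {
>>         pgd_t *ref = mm->pgd_ref + pgd_index(addr);         /* hypothetical */
>>         pgd_t *pgd = current->thread_pgd + pgd_index(addr); /* hypothetical */
>>
>>         if (pgd_none(*ref))
>>             return -EFAULT;     /* genuine fault, not a sync miss */
>>         if (pgd_none(*pgd))
>>             set_pgd(pgd, *ref); /* propagate the reference entry  */
>>         return 0;
>>     }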
> 
> Sorry -- what do you mean "synchronize vmalloc"?  If every thread has a
> different view of the kernel's vmalloc area, then every thread must have
> a different L4 table, right?  And if every thread has a different L4
> table, then we've already got the main thing we need from Linux, don't we?

Just had an IRL chat with Wei:  The synchronization he was talking about
was a synchronization *of the kernel space* *between processes*.  What
we would need in Linux is a synchronization *of userspace* *between
threads*.  So the same basic idea is there, but it would require a
reasonable amount of extension work.

Since the work that would need to be done in Linux is exactly the same
work that we'd need to do in Xen, I think the Linux maintainers would be
pretty annoyed if we asked them to do it instead of doing it ourselves.

 -George
