
Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

On 01/22/2018 06:39 PM, Andrew Cooper wrote:
> On 22/01/18 16:51, Jan Beulich wrote:
>>>>> On 22.01.18 at 16:00, <jgross@xxxxxxxx> wrote:
>>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>>> On 22.01.18 at 15:38, <jgross@xxxxxxxx> wrote:
>>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>>> On 22.01.18 at 15:18, <jgross@xxxxxxxx> wrote:
>>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>>> On 22.01.18 at 13:32, <jgross@xxxxxxxx> wrote:
>>>>>>>>> As a preparation for doing page table isolation in the Xen hypervisor
>>>>>>>>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>>>>>>>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>>> saving the domain's registers stacks are switched back to the normal
>>>>>>>>> per physical cpu ones in order to be able to address on-stack data
>>>>>>>>> from other cpus e.g. while handling IPIs.
>>>>>>>>> Adding %cr3 switching between saving of the registers and switching
>>>>>>>>> the stacks will enable the possibility to run guest code without any
>>>>>>>>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>>> able to access other domains data.
>>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>>> domain, but this can be easily avoided by a little PV-ABI modification
>>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>>> approach provides over the band-aid. For context see also
>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>>>>> My approach supports mapping only the following data while the guest is
>>>>>>> running (apart from the guest's own data, of course):
>>>>>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>>>>>   guest's registers saved when an interrupt occurs
>>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>>> - the IDT
>>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>>> guests.
>>>>>> I understand that. What I'm not clear about is: Which parts of
>>>>>> the additionally hidden data are actually necessary (or at least
>>>>>> very desirable) to hide?
>>>>> Necessary:
>>>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>>>> - data from other guests, e.g. in stack pages, debug buffers, I/O buffers,
>>>>>   code emulator buffers
>>>>> - other guests' register values e.g. in vcpu structure
>>>> All of this is already being made invisible by the band-aid (with the
>>>> exception of leftovers on the hypervisor stacks across context
>>>> switches, which we've already said could be taken care of by
>>>> memset()ing that area). I'm asking about the _additional_ benefits
>>>> of your approach.
>>> I'm quite sure the performance will be much better as it doesn't require
>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>> guest L4 table, similar to the Linux kernel KPTI approach.
>> But isn't that model having the same synchronization issues upon
>> guest L4 updates which Andrew was fighting with?
> (Condensing a lot of threads down into one)
> All the methods have L4 synchronisation update issues, until we have a
> PV ABI which guarantees that L4's don't get reused.  Any improvements to
> the shadowing/synchronisation algorithm will benefit all approaches.
> Juergen: you're now adding an LTR into the context switch path, which
> tends to be very slow.  I.e., as currently presented, this series
> necessarily has a higher runtime overhead than Jan's XPTI.

So here is a repeat of the "hypervisor compile" tests I did, comparing
the different XPTI-like series so far.

# Experimental setup:
 - Intel(R) Xeon(R) CPU E5630  @ 2.53GHz
 - 4 pcpus
 - Memory: 4GiB
 - 4vcpus, 512MiB, blkback to raw file
 - CentOS 6 userspace
 - Linux 4.14 kernel with PV / PVH / PVHVM / KVM guest support (along
with expected drivers) built-in
 - cd xen-4.10.0
 - make -C xen clean
 - time make -j 4 xen

# Results
- In all cases, running a "default" build with CONFIG_DEBUG=n

* Staging, xpti=off
real    1m2.995s
user    2m52.527s
sys     0m40.276s

Result: 63s

* Staging [xpti default]
real    1m27.190s
user    3m3.900s
sys     1m42.686s

Result: 87s (38% overhead)

Note also that the "system time" here is about 2.5x that of "xpti=off";
so the total wasted cpu time is significantly higher.

* Staging + "x86: slightly reduce Meltdown band-aid overhead"
real    1m21.661s
user    3m3.809s
sys     1m25.344s

Result: 81s (28% overhead)

NB that the "system time" here is significantly reduced from above, but
still more than double the "system time" for plain PV (85s vs 40s).

* Above + "x86: reduce Meltdown band-aid overhead a little further"
real    1m21.357s
user    3m3.284s
sys     1m25.379s

Result: 81s (28% overhead)

No real change

* Staging + Juergen's v2 series
real    1m3.018s
user    2m52.217s
sys     0m40.357s

Result: 63s (0% overhead)
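For reference, the overhead percentages quoted above follow directly from
the rounded wall-clock ("real") times relative to the xpti=off baseline.
A quick sketch of the arithmetic (not part of the original test harness;
the labels are just shorthand for the configurations above):

```python
# Overhead of each configuration relative to the xpti=off baseline,
# using the rounded wall-clock ("real") seconds reported above.
BASELINE = 63  # Staging, xpti=off

def overhead_pct(seconds, base=BASELINE):
    """Percentage overhead versus the baseline, truncated to whole percent."""
    return int((seconds - base) / base * 100)

results = {
    "Staging [xpti default]": 87,
    "Staging + band-aid reduction patches": 81,
    "Staging + Juergen's v2 series": 63,
}

for name, secs in results.items():
    # e.g. 87s -> 38% overhead, 81s -> 28%, 63s -> 0%
    print(f"{name}: {secs}s ({overhead_pct(secs)}% overhead)")
```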

Unfortunately, I can't really verify that Juergen's patches are having
any effect; there's no printk indicating whether it's enabling the
mitigation or not.  I have verified that the changeset reported in `xl
dmesg` corresponds to the branch I have with the patches applied.

So it's *possible* something has gotten mixed up, and the mitigation
isn't being applied; but if it *is* applied, the performance is
significantly better than the "band-aid" XPTI.

