
Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

On 23/01/18 12:45, Andrew Cooper wrote:
> On 23/01/18 10:10, Juergen Gross wrote:
>> On 23/01/18 10:31, Jan Beulich wrote:
>>>>>> On 23.01.18 at 10:24, <jgross@xxxxxxxx> wrote:
>>>> On 23/01/18 09:53, Jan Beulich wrote:
>>>>>>>> On 23.01.18 at 07:34, <jgross@xxxxxxxx> wrote:
>>>>>> On 22/01/18 19:39, Andrew Cooper wrote:
>>>>>>> One of my concerns is that this patch series moves further away from the
>>>>>>> secondary goal of my KAISER series, which was to have the IDT and GDT
>>>>>>> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
>>>>>>> leak which CPU you're currently scheduled on into PV guests and b) the
>>>>>>> context switch code can drop a load of its slow instructions like LGDT
>>>>>>> and the VMWRITEs to update the VMCS.
>>>>>> The GDT address of a PV vcpu depends on vcpu_id only. I don't
>>>>>> see why the IDT can't be mapped to the same address on each cpu with
>>>>>> my approach.
>>>>> You're not introducing a per-CPU range in the page tables afaics
>>>>> (again from overview and titles only), yet with the IDT needing
>>>>> to be per-CPU you'd also need a per-CPU range to map it to if
>>>>> you want to avoid the LIDT as well as exposing what CPU you're
>>>>> on (same goes for the GDT and the respective avoidance of LGDT
>>>>> afaict).
>>>> After a quick look I don't see why a Meltdown mitigation can't use
>>>> the same IDT for all cpus: the only reason I could find for having
>>>> per-cpu IDTs seems to be in SVM code, so it seems to be AMD specific.
>>>> And AMD won't need XPTI at all.
>>> Isn't your RFC series allowing XPTI to be enabled even on AMD?
>> Yes, you are right. This might either want to be revisited or the
>> address space to be activated for SVM domains could map an IDT with
>> IST related traps removed.
> I've experimented quite a lot in this area.  Ideally, we'd vmload/save
> in the SVM critical region (like all other hypervisors) at which point
> we don't need any adjustments to the IDT (as IST references are safe to
> use), and we'd catch stack overflows in the #DF handler rather than
> immediately triple faulting.
> Using LIDT to switch between alternative IDTs, or INVLPG to swap the
> mapping under a fixed linear address are both much slower than the
> current implementation.
>>>> The GDT of pv domains is already in the per-domain region even without
>>>> my patches, so I don't have to change anything regarding usage of LGDT.
>>> Andrew's point was that eliminating the LGDT is a secondary goal.
>> With per-cpu mappings this is surely an obvious optimization. In the
>> end the overall performance should be taken as base for a decision.
>> His main point was avoiding exposing data like the physical cpu number
>> and this doesn't apply here, as the GDT is per vcpu in my case.
> The GDT leaks vcpu_id into guest userspace, which is similarly problematic.

Mind explaining this? Why is leaking the vcpu_id problematic?
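
To make the mechanism (not its severity) concrete: if the GDT's linear
address is a pure function of vcpu_id, anyone who can execute SGDT can invert
that function. The following is only a minimal sketch of that arithmetic. The
base address, the shift values and all EXAMPLE_/example_ names are
hypothetical and chosen for illustration; they are not taken from the patch
series or from the Xen headers.

```c
#include <stdint.h>

/* Hypothetical layout constants, for illustration only. */
#define EXAMPLE_PERDOMAIN_BASE 0xffff820000000000ULL
#define EXAMPLE_PAGE_SHIFT     12
#define EXAMPLE_VCPU_SHIFT     5   /* assume 32 pages of GDT/LDT space per vcpu */
#define EXAMPLE_VCPU_VA_SHIFT  (EXAMPLE_VCPU_SHIFT + EXAMPLE_PAGE_SHIFT)

/* Linear address at which the GDT of a given vcpu would be mapped. */
static inline uint64_t example_gdt_base(unsigned int vcpu_id)
{
    return EXAMPLE_PERDOMAIN_BASE +
           ((uint64_t)vcpu_id << EXAMPLE_VCPU_VA_SHIFT);
}

/* What an observer of the base returned by SGDT could compute from it. */
static inline unsigned int example_vcpu_from_gdt_base(uint64_t base)
{
    return (unsigned int)((base - EXAMPLE_PERDOMAIN_BASE) >>
                          EXAMPLE_VCPU_VA_SHIFT);
}
```

With a layout like this, the value is stable across pcpus (so the physical
cpu number is not exposed), but it differs between vcpus of the same guest.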

> The secondary goals of my KAISER series stand irrespective of the
> Meltdown issues:
> * The stack and mutable critical structures really should be numa-local
> to the CPU using it.
> * The GDT should sit fully fat over zeros.  At the moment in HVM
> context, there are 14 frames of arbitrary directmap living within the
> GDT limit.
> * The IDT/GDT should exist at the same linear address on every pcpu to
> avoid leaking information  (This property is what allows the removal of
> the lgdt from the context switch path).
> * The critical data structures should be mapped read only to make
> exploitation harder for an attacker with a write-primitive.
> * With the stack at the same linear address on each CPU, we don't need
> the syscall stubs, and the TSS is identical on all cpus.
> In some copious free time, it would be nice to fix these issues.

As long as you can't solve the primary performance problem of your
approach for existing pv guests I don't see why the above tuning attempts
would make any sense.

I know for sure there are users out there who are unable to switch to HVM
or PVH guests because they need more than 64 vcpus per guest. So before
tackling the above problems you really have to solve the large HVM guest
problem. And making it impossible for those users to continue using
PV guests by hurting performance so badly won't be an accepted "solution".

