
Re: [Xen-devel] [PATCH v3 7/7] xen/x86: use PCID feature



>>> On 23.03.18 at 12:29, <jgross@xxxxxxxx> wrote:
> On 23/03/18 11:51, Jan Beulich wrote:
>>>>> On 21.03.18 at 13:51, <jgross@xxxxxxxx> wrote:
>>> Avoid flushing the complete TLB when switching %cr3 for mitigation of
>>> Meltdown by using the PCID feature if available.
>>>
>>> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
>>> 2 values for the non-XPTI case:
>>>
>>> - guest active and in kernel mode
>>> - guest active and in user mode
>>> - hypervisor active and guest in user mode (XPTI only)
>>> - hypervisor active and guest in kernel mode (XPTI only)
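(As an illustration only - the four values above could be encoded along
these lines; names and numeric values are my guesses, not necessarily
what the patch uses:)

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustration only: one plausible encoding of the four PCID values
 * listed above.  Bit 0 distinguishes guest user from kernel mode,
 * bit 1 distinguishes hypervisor-active (XPTI) from guest-active.
 */
#define PCID_PV_PRIV  0x0  /* guest active, kernel mode */
#define PCID_PV_USER  0x1  /* guest active, user mode */
#define PCID_PV_XPTI  0x2  /* hypervisor active (XPTI only) */

static unsigned int pcid_bits(bool user_mode, bool is_xen)
{
    unsigned int pcid = user_mode ? PCID_PV_USER : PCID_PV_PRIV;

    if ( is_xen )
        pcid |= PCID_PV_XPTI;

    return pcid;
}
```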
>> 
>> Before committing to this route, Jun, Kevin, can we please get
>> confirmation that PCID isn't (and isn't going to be) subject to the
>> same speculation issues in the pipeline that the U/S bit is (having
>> caused Meltdown in the first place)? To me it seems a pretty
>> likely thing to play all the same games.
> 
> Really? This would assume either that the processor is capable of
> dealing with multiple matching TLB entries when speculating, or that
> there can be only one entry per virtual address present in the TLB at
> the same time in spite of different PCIDs.

Hmm, yes, good point.

> And why aren't you asking about the same problem with VPIDs? That should
> be comparable to the PCID problem you are suspecting.

Since Linux is using the approach, I'm not really suspecting a
problem. I'd just like to be sure there is none / not going to be
one. None of this is spelled out in the doc after all.

>>> ---
>>>  docs/misc/xen-command-line.markdown | 12 +++++++++
>>>  xen/arch/x86/debug.c                |  3 ++-
>>>  xen/arch/x86/domain_page.c          |  2 +-
>>>  xen/arch/x86/domctl.c               |  4 +++
>>>  xen/arch/x86/flushtlb.c             | 49 ++++++++++++++++++++++++++++------
>>>  xen/arch/x86/mm.c                   | 34 +++++++++++++++++++++---
>>>  xen/arch/x86/pv/dom0_build.c        |  1 +
>>>  xen/arch/x86/pv/domain.c            | 52 +++++++++++++++++++++++++++++++++++++
>>>  xen/include/asm-x86/domain.h        | 14 +++++++---
>>>  xen/include/asm-x86/pv/domain.h     |  2 ++
>>>  xen/include/asm-x86/x86-defns.h     |  1 +
>>>  11 files changed, 158 insertions(+), 16 deletions(-)
>> 
>> Having had the discussion previously, I'm missing a change to
>> smp.c:new_tlbflush_clock_period() here.
> 
> I can add that. I did not do this as I haven't treated FLUSH_TLB
> differently from FLUSH_TLB_GLOBAL (trying this even without any other
> change to staging led to segfaults in dom0). So such a change to
> new_tlbflush_clock_period() should be a separate patch, I believe.

Ah yes, you don't have the "flushing too much" issue anymore in
this version, or at least not in as obvious a way. Or wait - it's in
patch 4. You still do an including-global flush there regardless of
whether FLUSH_TLB_GLOBAL was actually requested. Anyway,
we'll save this aspect for later.

>>> --- a/xen/arch/x86/debug.c
>>> +++ b/xen/arch/x86/debug.c
>>> @@ -97,7 +97,8 @@ dbg_pv_va2mfn(dbgva_t vaddr, struct domain *dp, uint64_t pgd3val)
>>>      l3_pgentry_t l3e, *l3t;
>>>      l2_pgentry_t l2e, *l2t;
>>>      l1_pgentry_t l1e, *l1t;
>>> -    unsigned long cr3 = (pgd3val ? pgd3val : dp->vcpu[0]->arch.cr3);
>>> +    unsigned long cr3 = (pgd3val ? pgd3val
>>> +                                 : (dp->vcpu[0]->arch.cr3 & ~X86_CR3_NOFLUSH));
>> 
>> What about the PCID portion? You want the address of the page
>> here, so I think you should use a "white-listing" masking operation
>> instead of a blacklisting one.
> 
> The PCID portion is no problem here as the value is converted into a
> mfn.
> 
> I can do the modification you are asking for, of course.

Well, even if the bits are shifted out in the end, the code would
look more obviously correct. Plus, by masking down to just the
address, any future meaning assigned to currently unused bits would
no longer be a problem for this piece of code.
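A sketch of the difference (the mask names and values are illustrative;
I'm ignoring the MAXPHYADDR-reserved bits for simplicity):

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR3_NOFLUSH    (UINT64_C(1) << 63)
#define X86_CR3_PCID_MASK  UINT64_C(0xfff)
/* Simplified: a real mask would also exclude bits above MAXPHYADDR. */
#define X86_CR3_ADDR_MASK  (~(X86_CR3_NOFLUSH | X86_CR3_PCID_MASK))

/* Black-listing, as in the hunk above: strips only the one bit we
 * know about today; the PCID bits (and any future bits) survive. */
static uint64_t cr3_pa_blacklist(uint64_t cr3)
{
    return cr3 & ~X86_CR3_NOFLUSH;
}

/* White-listing: keeps only the page table base address, so any future
 * meaning assigned to currently unused bits is masked off as well. */
static uint64_t cr3_pa_whitelist(uint64_t cr3)
{
    return cr3 & X86_CR3_ADDR_MASK;
}
```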

>>> +/*
>>> + * Return additional PCID specific cr3 bits.
>>> + *
>>> + * Note that X86_CR3_NOFLUSH will not be readable in cr3. Anyone consuming
>>> + * v->arch.cr3 should mask away X86_CR3_NOFLUSH and X86_CR3_PCIDMASK in case
>> 
>> I stand by my previous comment, which was left unanswered afaics:
> 
> Uuh, sorry for that.
> 
>> "Is it a good idea to suppress the flush this way for every reader
>>  of v->arch.cr3? For example, what about the use in
>>  dbg_pv_va2mfn()? I think it should be the consumers of the field
>>  to decide whether to OR in that flag (which isn't part of the
>>  register value anyway)."
>> To be more precise, I can see the PCID being put here (which will
>> require consumers to be aware anyway), but I don't think the non-
>> register-value no-flush indicator belongs here. IOW I think that
>> after writing the value into %cr3, the value read back should match
>> the stored value.
> 
> This will make restore_all_guest more complicated. v->arch.cr3 is copied
> to cpu_info->xen_cr3 there and this value is then used for %cr3. I
> really don't want to add complex logic there to add the no-flush
> indicator in case PCIDs are active.

Valid point. Looking at all present uses of ->arch.cr3, it's probably
indeed better the way you have it. However, I'm now wondering
about something else: make_cr3() leaves PCID as zero for HVM
and idle domains, but runs Xen with PCIDs 2 and 3 for (some) PV
domains. That looks like an undesirable setup though - it would
seem better to run Xen (with full page tables) with PCID 0 at all
times.

Then we'd have e.g.
PCID 0  Xen (full page tables)
PCID x  PV guest priv
PCID y  PV guest user

Global pages in PCID 0 could then still be permitted, and wouldn't
ever need flushing except when FLUSH_TLB_GLOBAL is requested.
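In sketch form ("x" and "y" are arbitrary non-zero placeholders, and the
helper name is made up):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the layout proposed above, not existing code. */
#define PCID_XEN      0  /* Xen, full page tables, global pages permitted */
#define PCID_PV_PRIV  1  /* "x": PV guest, kernel mode */
#define PCID_PV_USER  2  /* "y": PV guest, user mode */

static unsigned int pcid_for(bool is_xen, bool user_mode)
{
    /* Xen would always run with PCID 0, regardless of guest mode. */
    if ( is_xen )
        return PCID_XEN;

    return user_mode ? PCID_PV_USER : PCID_PV_PRIV;
}
```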

As to the use of two separate PCIDs for PV kernel and user modes
- while this helps isolation, it prevents recovering the non-XPTI
property of user mode TLB entries surviving in-guest mode switches.
I wonder whether this is part of the reason you see PCID have a
negative effect in the non-XPTI case.

So in the end the question is: why not use just two PCIDs, and
allow global pages just like we do now, with the added benefit
that we no longer need to flush Xen's global TLB entries just
because we want to get rid of PV guest user ones?

>>> +        }
>>> +        if ( d->arch.pv_domain.pcid )
>>> +            v->arch.cr3 |= get_pcid_bits(v, d->arch.pv_domain.xpti);
>> 
>> It is certainly at least confusing that you pass "xpti" as argument
>> for a parameter named "is_xen". The question is what mode you
>> want us to be in when running with PCID but no XPTI: Should Xen
>> use its own PCID then? That would seem reasonable to me, but
>> would seem to require passing true here. Yet then this would
>> require switching CR3 on the way out to guests and back in from
>> guests even in that case (just without copying the root page table).
>> 
>> If in that mode Xen isn't meant to use its own PCID, the command
>> line option "pcid=noxpti" would seem at least misleading to me then,
>> as you wouldn't really use different PCIDs in that mode, but only
>> disable global pages (which probably hurts performance) and use
>> INVPCID for flushing (which probably helps performance).
> 
> The idea is to use different PCID values for guest kernel and user
> mode. This removes the need for global guest user pages.

Partly only. Global guest user pages serve two purposes: They
survive a round trip user -> kernel -> user (as long as no Xen
context switch occurs in the middle), and they also allow the
kernel to utilize TLB entries user mode has just created e.g. for
system call input. Your current use of PCIDs retains only the
first property.

> I don't
> want to use global guest user pages together with PCID as flushing
> global pages from the TLB with PCID enabled requires flushing either
> the complete TLB or you'd have to use INVLPG in all possible address
> spaces (so you'd need to have multiple %cr3 switches).

Well, yes, flushing _individual_ pages is a problem in that case.
As to multiple CR3 switches - are these all that bad really with
the no-flush bit set? With the reduced number of PCIDs in actual
use (as discussed above) "all possible address spaces" would
mean just two. And I could imagine that in a number of cases
just one INVLPG (with the right PCID active) might suffice.
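For reference, the INVPCID descriptor involved here (the layout is as
defined by the SDM; the helper name is made up, and the actual
invalidation is of course only possible in ring 0 on INVPCID-capable
hardware):

```c
#include <assert.h>
#include <stdint.h>

/*
 * INVPCID descriptor per the SDM: bits 0-11 of the first 64-bit word
 * hold the PCID, the second word holds the linear address.  Type 0
 * ("individual address") then flushes one address in one PCID without
 * a CR3 switch:
 *
 *   asm volatile ( "invpcid %0, %1"
 *                  :: "m" (desc), "r" (0UL) : "memory" );
 *
 * The helper below only builds the descriptor, so it can be sanity
 * checked in user mode.
 */
struct invpcid_desc {
    uint64_t pcid:12;
    uint64_t reserved:52;
    uint64_t addr;
};

static struct invpcid_desc make_invpcid_desc(unsigned int pcid, uint64_t va)
{
    struct invpcid_desc desc = { .pcid = pcid & 0xfff, .addr = va };

    return desc;
}
```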

One complicating factor is that we don't want to introduce
Xen TLB entries for other than what we map in the minimal page
tables into PV guest PCID space, which would happen if we
simply switched PCID around an INVLPG.

What I don't understand in any event is why you need separate
PCIDs for Xen depending on whether the active PV guest is in
kernel or user mode.

As a side note, I've noticed only now that get_pcid_bits()'s
first parameter wants to be constified.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
