
Re: [Xen-devel] [PATCH v4 7/7] xen/x86: use PCID feature



On 29/03/18 16:19, Jan Beulich wrote:
>>>> On 27.03.18 at 11:07, <jgross@xxxxxxxx> wrote:
>> --- a/xen/arch/x86/domain_page.c
>> +++ b/xen/arch/x86/domain_page.c
>> @@ -51,7 +51,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>>          if ( (v = idle_vcpu[smp_processor_id()]) == current )
>>              sync_local_execstate();
>>          /* We must now be running on the idle page table. */
>> -        ASSERT(read_cr3() == __pa(idle_pg_table));
>> +        ASSERT((read_cr3() & ~X86_CR3_PCID_MASK) == __pa(idle_pg_table));
> 
> This would better use X86_CR3_ADDR_MASK as well.

Right.
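
Something like this then (untested sketch, using the X86_CR3_ADDR_MASK
introduced in the last hunk to strip both the PCID bits and the no-flush
bit):

        /* We must now be running on the idle page table. */
        ASSERT((read_cr3() & X86_CR3_ADDR_MASK) == __pa(idle_pg_table));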

> 
>> @@ -102,7 +103,21 @@ void write_cr3_cr4(unsigned long cr3, unsigned long cr4)
>>      t = pre_flush();
>>  
>>      if ( read_cr4() & X86_CR4_PGE )
>> +        /*
>> +         * X86_CR4_PGE set means PCID being inactive.
>> +         * We have to purge the TLB via flipping cr4.pge.
>> +         */
>>          write_cr4(cr4 & ~X86_CR4_PGE);
>> +    else if ( cpu_has_invpcid )
>> +        /*
>> +         * If we are using PCID purge the TLB via INVPCID as loading cr3
>> +         * will affect the current PCID only.
> 
> s/current/new/ ?

Okay.

> 
>> +         * If INVPCID is not supported we don't use PCIDs so loading cr3
>> +         * will purge the TLB (we are in the "global pages off" branch).
>> +         * invpcid_flush_all_nonglobals() seems to be faster than
>> +         * invpcid_flush_all().
>> +         */
>> +        invpcid_flush_all_nonglobals();
>>  
>>      asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
> 
> What about the TLB entries that have been re-created between
> the INVPCID and the write of CR3? It's not obvious to me that
> leaving them around is not going to be a problem subsequently,
> the more that you write cr3 frequently with the no-flush bit set.

The no-flush bit shouldn't make any difference here: it only controls
whether TLB entries tagged with the new PCID are flushed.

> Don't you need to do a single context invalidation for the prior
> PCID (if different from the new one)?

Hmm, I don't think so, as the mov to %cr3 is a serializing instruction
acting as a speculation barrier. So the only TLB entries that could
survive would be the ones created for the few instruction bytes between
the invpcid instruction and the end of the mov to %cr3. Those are
harmless, as they are associated with the hypervisor's PCID value, so
there is no risk of any data leak to a guest. Maybe a comment explaining
that would be a good idea.
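
Something along these lines, maybe (sketch only, wording of the added
note of course open for discussion):

    else if ( cpu_has_invpcid )
        /*
         * If we are using PCID purge the TLB via INVPCID as loading cr3
         * will affect the new PCID only.
         *
         * If INVPCID is not supported we don't use PCIDs so loading cr3
         * will purge the TLB (we are in the "global pages off" branch).
         * invpcid_flush_all_nonglobals() seems to be faster than
         * invpcid_flush_all().
         *
         * Any TLB entries re-created between the INVPCID and the cr3
         * write below are tagged with the hypervisor's PCID, so they
         * can't leak anything to a guest, and the mov to %cr3 is
         * serializing, acting as a speculation barrier.
         */
        invpcid_flush_all_nonglobals();

    asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );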

> 
>> @@ -499,7 +500,26 @@ void free_shared_domheap_page(struct page_info *page)
>>  
>>  void make_cr3(struct vcpu *v, mfn_t mfn)
>>  {
>> +    struct domain *d = v->domain;
>> +
>>      v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
>> +    if ( is_pv_domain(d) && d->arch.pv_domain.pcid )
>> +        v->arch.cr3 |= get_pcid_bits(v, false);
>> +}
>> +
>> +unsigned long pv_guest_cr4_to_real_cr4(struct vcpu *v)
> 
> const

Okay (for the other cases, too).
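
I.e. something like this (sketch only; the same constification applied to
the other places you point at below, body of pv_guest_cr4_to_real_cr4()
elided here):

    void make_cr3(struct vcpu *v, mfn_t mfn)
    {
        const struct domain *d = v->domain;

        v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
        if ( is_pv_domain(d) && d->arch.pv_domain.pcid )
            v->arch.cr3 |= get_pcid_bits(v, false);
    }

    unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
    {
        const struct domain *d = v->domain;
        ...
    }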

> 
>> +{
>> +    struct domain *d = v->domain;
> 
> again
> 
>> @@ -298,11 +362,21 @@ int pv_domain_initialise(struct domain *d)
>>  
>>  static void _toggle_guest_pt(struct vcpu *v, bool force_cr3)
>>  {
>> +    struct domain *d = v->domain;
> 
> and one more
> 
>> --- a/xen/include/asm-x86/x86-defns.h
>> +++ b/xen/include/asm-x86/x86-defns.h
>> @@ -45,7 +45,9 @@
>>  /*
>>   * Intel CPU flags in CR3
>>   */
>> -#define X86_CR3_NOFLUSH (_AC(1, ULL) << 63)
>> +#define X86_CR3_NOFLUSH    (_AC(1, ULL) << 63)
>> +#define X86_CR3_ADDR_MASK  (PAGE_MASK & ~X86_CR3_NOFLUSH)
> 
> This would better be PAGE_MASK & PADDR_MASK: bits 52...62
> are reserved (the respective figure in chapter 2 of the SDM is not to
> be trusted, the tables in the "4-level paging" section are more likely to
> be correct).

Okay.
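
So the definitions would become (sketch; PADDR_MASK limits the value to
the 52-bit physical address width, leaving the reserved bits 52...62
clear):

    #define X86_CR3_NOFLUSH    (_AC(1, ULL) << 63)
    #define X86_CR3_ADDR_MASK  (PAGE_MASK & PADDR_MASK)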


Juergen


