
Re: [Xen-devel] [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER



On 03/12/15 16:42, David Vrabel wrote:
> If a guest allocates a page and the tlbflush_timestamp on the page
> indicates that a TLB flush of the previous owner is required, only the
> linear and combined mappings are invalidated.  The guest-physical
> mappings are not invalidated.
> 
> This is currently safe because the EPT code ensures that the
> guest-physical and combined mappings are invalidated /before/ the page
> is freed.  However, this prevents us from deferring the EPT invalidate
> until after the page is freed (e.g., to defer the invalidate until the
> p2m locks are released).
> 
> The TLB flush that may be done after allocating a page already causes
> the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
> one is still pending.
> 
> ept_sync_domain() now marks all PCPUs as needing to be invalidated,
> including PCPUs that the domain has not run on.  We still only IPI
> those PCPUs that are active so this does not result in any more[1]
> INVEPT calls.
> 
> We do not attempt to track when PCPUs may have cached translations
> because the only safe point to clear this per-CPU state is immediately
> after an invalidate, and only if the PCPU is not active at that point
> (i.e., the PCPU is not in d->domain_dirty_cpumask).  Since we only
> invalidate on VMENTER or by IPIing active PCPUs, this can never happen.
> 
> [1] There is one unnecessary INVEPT when the domain runs on a PCPU for
>     the very first time.  But this is: a) only 1 additional invalidate
>     per PCPU for the lifetime of the domain; and b) we can avoid it
>     with a subsequent change.
> 
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>

This looks like a definite improvement.

One thing you missed: ept_p2m_init() calls __ept_sync_domain() on all
cpus; but because the "invalidate" mask is clear at that point, nothing
will actually happen.
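To spell that out (my reading of the patch as posted; untested):

    /* ept_p2m_init(): the mask starts out all-clear after allocation... */
    if ( !zalloc_cpumask_var(&ept->invalidate) )
        return -ENOMEM;

    on_each_cpu(__ept_sync_domain, p2m, 1);

    /* ...so in __ept_sync_domain() on every cpu: */
    if ( cpumask_test_and_clear_cpu(cpu, ept->invalidate) )  /* always false here */
        __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);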

Part of this, I think, comes as a result of inverting what the mask
means.  Before this patch, "not synced" is the default, and therefore
you need to invalidate unless someone has set the bit saying you don't
have to.  After this, "don't invalidate" is the default and you do
nothing unless someone has set a bit saying you do have to.
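
In other words, comparing the two versions of the check (taken from the
removed context-switch path and the new VMENTER path):

    /* before: "synced" mask -- invalidate unless the bit is already set */
    if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
         !cpumask_test_and_set_cpu(cpu, ept_get_synced_mask(ept_data)) )
        __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);

    /* after: "invalidate" mask -- invalidate only if the bit has been set */
    if ( cpumask_test_cpu(cpu, ept->invalidate)
         && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
        __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);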

I think I'd prefer it if you left the mask as "synced_mask"; then you can
actually take that on_each_cpu() out of the ept_p2m_init entirely.
(Probably wants a comment pointing that out.)
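
Something like this is what I have in mind for ept_p2m_init() -- just a
sketch, not tested:

    if ( !zalloc_cpumask_var(&ept->synced_mask) )
        return -ENOMEM;

    /*
     * No on_each_cpu(__ept_sync_domain, ...) is needed here: the mask
     * starts out all-clear, i.e. no cpu is marked as synced, so every
     * cpu will do an INVEPT before first entering the guest anyway.
     */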

 -George

> ---
>  xen/arch/x86/hvm/vmx/vmx.c         | 22 ++++++++++------------
>  xen/arch/x86/mm/p2m-ept.c          | 20 ++++++++------------
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +--
>  3 files changed, 19 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 9ad6d82..2f3a9f1 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -738,24 +738,12 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
>  
>  static void vmx_ctxt_switch_to(struct vcpu *v)
>  {
> -    struct domain *d = v->domain;
>      unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
> -    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
>  
>      /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
>      if ( old_cr4 != new_cr4 )
>          write_cr4(new_cr4);
>  
> -    if ( paging_mode_hap(d) )
> -    {
> -        unsigned int cpu = smp_processor_id();
> -        /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
> -        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
> -             !cpumask_test_and_set_cpu(cpu,
> -                                       ept_get_synced_mask(ept_data)) )
> -            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
> -    }
> -
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
>  }
> @@ -3497,6 +3485,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      if ( unlikely(need_flush) )
>          vpid_sync_all();
>  
> +    if ( paging_mode_hap(curr->domain) )
> +    {
> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
> +        unsigned int cpu = smp_processor_id();
> +
> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
> +            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
> +    }
> +
>   out:
>      HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
>  
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index eef0372..014e2b2 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1090,8 +1090,10 @@ static void ept_memory_type_changed(struct p2m_domain *p2m)
>  static void __ept_sync_domain(void *info)
>  {
>      struct ept_data *ept = &((struct p2m_domain *)info)->ept;
> +    unsigned int cpu = smp_processor_id();
>  
> -    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
> +    if ( cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
> +        __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
>  }
>  
>  void ept_sync_domain(struct p2m_domain *p2m)
> @@ -1107,16 +1109,10 @@ void ept_sync_domain(struct p2m_domain *p2m)
>      if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
>          p2m_flush_nestedp2m(d);
>  
> -    /*
> -     * Flush active cpus synchronously. Flush others the next time this domain
> -     * is scheduled onto them. We accept the race of other CPUs adding to
> -     * the ept_synced mask before on_selected_cpus() reads it, resulting in
> -     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
> -     */
> -    cpumask_and(ept_get_synced_mask(ept),
> -                d->domain_dirty_cpumask, &cpu_online_map);
> +    /* May need to invalidate on all PCPUs. */
> +    cpumask_setall(ept->invalidate);
>  
> -    on_selected_cpus(ept_get_synced_mask(ept),
> +    on_selected_cpus(d->domain_dirty_cpumask,
>                       __ept_sync_domain, p2m, 1);
>  }
>  
> @@ -1182,7 +1178,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
>          p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
>      }
>  
> -    if ( !zalloc_cpumask_var(&ept->synced_mask) )
> +    if ( !zalloc_cpumask_var(&ept->invalidate) )
>          return -ENOMEM;
>  
>      on_each_cpu(__ept_sync_domain, p2m, 1);
> @@ -1193,7 +1189,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
>  void ept_p2m_uninit(struct p2m_domain *p2m)
>  {
>      struct ept_data *ept = &p2m->ept;
> -    free_cpumask_var(ept->synced_mask);
> +    free_cpumask_var(ept->invalidate);
>  }
>  
>  static void ept_dump_p2m_table(unsigned char key)
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index b3b0946..fbcb811 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -67,7 +67,7 @@ struct ept_data {
>          };
>          u64 eptp;
>      };
> -    cpumask_var_t synced_mask;
> +    cpumask_var_t invalidate;
>  };
>  
>  #define _VMX_DOMAIN_PML_ENABLED    0
> @@ -97,7 +97,6 @@ struct pi_desc {
>  #define ept_get_wl(ept)   ((ept)->ept_wl)
>  #define ept_get_asr(ept)  ((ept)->asr)
>  #define ept_get_eptp(ept) ((ept)->eptp)
> -#define ept_get_synced_mask(ept) ((ept)->synced_mask)
>  
>  #define NR_PML_ENTRIES   512
>  
> 

