Re: [Xen-devel] [PATCH 1/2] x86/hvm: improve performance of HVMOP_flush_tlbs

On Fri, Dec 27, 2019 at 02:52:17PM +0000, Andrew Cooper wrote:
> On 24/12/2019 13:26, Roger Pau Monne wrote:
> > There's no need to call paging_update_cr3 unless CR3 trapping is
> > enabled, and that's only the case when using shadow paging or when
> > requested for introspection purposes, otherwise there's no need to
> > pause all the vCPUs of the domain in order to perform the flush.
> >
> > Check whether CR3 trapping is currently in use in order to decide
> > whether the vCPUs should be paused, otherwise just perform the flush.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>
> I agree that the existing logic is poor, but this direction looks to be
> even more fragile.
>
> Instead, I think it would be better to follow the EPT invalidation
> example; mark all vcpus as needing a tlb flush, and IPI the domain dirty
> mask, having the return-to-guest path do the flushing.

AFAICT there's no need to call the tlb flush: the vmexit/vmentry itself
will perform the necessary flushes, so the only requirement is to IPI
the pCPUs in order to force a vmexit.

> This avoids all vcpu pausing/unpausing activities, and the cost of the
> flush is incurred by the target vcpu, rather than the vcpu making the
> hypercall accumulate the cost for everything, as well as a large amount
> of remote VMCS accesses.

Hm, then we would need a way to pin the vCPUs to the pCPUs they are
running on, or else in the introspection-enabled case you could end up
calling paging_update_cr3 on vCPUs of other domains (maybe that's fine,
but it could mess with introspection I guess).

AFAICT the call to paging_update_cr3 needs to be done from
hvm_flush_vcpu_tlb, or else we would have to freeze the scheduler so
that vCPUs don't move around pCPUs (or get de-scheduled). I think we
still need the pause in the introspection case, but the open coded
pause loop could be replaced with domain_pause_except_self.
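
[Editor's sketch] The two approaches discussed above can be compared in a small stand-alone C model. This is not Xen code: every toy_* name, the fixed vCPU count, and the counters are hypothetical and exist only for illustration. The first function models the patch (pause the other vCPUs only when CR3 trapping is in use), the other two model the suggested alternative (mark each vCPU, kick the pCPUs it runs on, and flush on the return-to-guest path).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_VCPUS 4          /* hypothetical fixed domain size */

struct toy_vcpu {
    bool paused;
    bool needs_flush;           /* set by the deferred variant */
    int running_on;             /* pCPU index, or -1 if not scheduled */
    unsigned int cr3_updates;   /* stands in for paging_update_cr3 calls */
    unsigned int tlb_flushes;
};

struct toy_domain {
    bool cr3_trapping;          /* shadow paging or introspection */
    unsigned int pause_ops;     /* how many vCPU pauses were issued */
    struct toy_vcpu vcpu[TOY_NR_VCPUS];
};

/* Model of the patch: only pause when CR3 trapping is in use. */
void toy_flush_tlbs(struct toy_domain *d, unsigned int caller)
{
    unsigned int i;

    assert(caller < TOY_NR_VCPUS);

    if (d->cr3_trapping) {
        /* Pause every vCPU except the one making the hypercall. */
        for (i = 0; i < TOY_NR_VCPUS; i++) {
            if (i == caller)
                continue;
            d->vcpu[i].paused = true;
            d->pause_ops++;
        }
        for (i = 0; i < TOY_NR_VCPUS; i++)
            d->vcpu[i].cr3_updates++;
    }

    /* The flush itself happens in both cases. */
    for (i = 0; i < TOY_NR_VCPUS; i++)
        d->vcpu[i].tlb_flushes++;

    if (d->cr3_trapping)
        for (i = 0; i < TOY_NR_VCPUS; i++)
            d->vcpu[i].paused = false;
}

/*
 * Model of the alternative: mark every vCPU and return a bitmask of the
 * pCPUs that would receive an IPI. Assumes pCPU indices below 32.
 */
unsigned int toy_request_deferred_flush(struct toy_domain *d)
{
    unsigned int i, kick_mask = 0;

    for (i = 0; i < TOY_NR_VCPUS; i++) {
        d->vcpu[i].needs_flush = true;
        if (d->vcpu[i].running_on >= 0)
            kick_mask |= 1u << d->vcpu[i].running_on;
    }
    return kick_mask;
}

/* The target vCPU pays for its own flush on the way back to the guest. */
void toy_return_to_guest(struct toy_vcpu *v)
{
    if (v->needs_flush) {
        v->needs_flush = false;
        v->tlb_flushes++;
    }
}
```

In this model the non-trapping path issues no pauses at all, and the deferred variant moves the flush cost from the caller to each target vCPU. It does not model the scheduling concern raised above, namely that vCPUs may move between pCPUs between the mark and the kick.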

> It can probably also remove the need for the flush_vcpu() callback which
> is going to be expensive due to retpoline, and whose contents are trivial.

I was planning to look into this, but wanted to send this version first
since it's already a big improvement in terms of performance.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel