[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/domain_page: implement pure per-vCPU mapping infrastructure



On Tue, 2020-02-04 at 12:08 +0000, Wei Liu wrote:
> Thanks, I welcome effort to make Xen more scalable. :-) 
> 
> On Mon, Feb 03, 2020 at 06:36:53PM +0000, Hongyan Xia wrote:
> > Rewrite the mapcache to be purely per-vCPU instead of partly per-
> > vCPU
> > and partly per-domain.
> > 
> > This patch is needed to address performance issues when we start
> > relying
> > on the mapcache, e.g., when we do not have a direct map. Currently,
> > the
> > per-domain lock on the mapcache is a bottleneck for multicore,
> > causing
> > performance degradation and even functional regressions. This patch
> > makes the mapping structure per-vCPU and completely lockless.
> > 
> 
> When I see "regression", I think of something that was working before
> but not anymore. I don't think that's the case for the following two
> things. 
> 
> I would just classify them as bug and/or improvement.

We probably have different definitions for "regression"... but I can
reword.

> > Functional regression:
> > 
> > When a domain is run on more than 64 cores, FreeBSD 10 panicks
> > frequently
> > due to occasional simultaneous set_singleshot_timer hypercalls from
> > too
> > many cores. Some cores will be blocked waiting on map_domain_page,
> > eventually failing to set a timer in the future. FreeBSD cannot
> > handle
> > this and panicks. This was fixed in later FreeBSD releases by
> > handling
> > -ETIME, but still the degradation in timer performance is a big
> > issue.
> > 
> > Performance regression:
> > 
> > Many benchmarks see a performance drop when having a large core
> > count.
> > I have done a Geekbench on a 32-vCPU guest.
> > 
> > perf drop     old        new
> > single       0.04%      0.18%
> > multi        2.43%      0.08%
> > 
> > Removing the per-domain lock in the mapcache brings the multi-core
> > performance almost identical to using the direct map for mappings.
> > 
> > There should be room for futher optimisations, but this already
> > improves over the old mapcache by a lot.
> > 
> > Note that entries in the maphash will occupy inuse slots. With 16
> > slots
> > per vCPU and a maphash capacity of 8, we only have another 8
> > available,
> > which is not enough for nested page table walks. We need to
> > increase the
> > number of slots in config.h.
> > 
> > Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>
> 
> 
> As far as I can tell all vcpus still share the same per-domain
> mapcache
> region. The difference is now the region is divided into subregion
> for
> each vcpu to use.

You are right. We have a per-domain VA range and we have
create_perdomain_mappings which can be reused nicely. The patch divides
the regions into vCPUs.

> 
> > ---
> >  xen/arch/x86/domain.c        |   5 +-
> >  xen/arch/x86/domain_page.c   | 229 +++++++++++------------------
> > ------
> >  xen/include/asm-x86/config.h |   2 +-
> >  xen/include/asm-x86/domain.h |  30 +----
> >  4 files changed, 80 insertions(+), 186 deletions(-)
> > 
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index f53ae5ff86..a278aa4678 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -445,6 +445,9 @@ void arch_vcpu_destroy(struct vcpu *v)
> >      xfree(v->arch.msrs);
> >      v->arch.msrs = NULL;
> >  
> > +    xfree(v->arch.pv.mapcache);
> > +    v->arch.pv.mapcache = NULL;
> > +
> 
> Keep in mind that accessing the union {pv, hvm} before knowing the
> exact variant is dangerous.
> 
> Because mapcache is initialised for PV only, so it should be freed
> for
> PV only. I think you need to put it to pv_vcpu_destroy.
> 
> If you don't want to do that because of asymmetry with
> mapcache_vcpu_init, you may want to invent mapcache_vcpu_destroy for
> your purpose.

Ah right this is a problem. I was working on a tree where everyone has
a mapcache, which is not true for current upstream when I cherry-
picked. Will fix.

Hongyan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.