
Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction" [and 2 more messages]



Roger Pau Monné writes ("Re: [PATCH for-4.16] Revert "domctl: improve locking 
during domain destruction""):
> On Tue, Nov 09, 2021 at 03:04:56PM +0000, Ian Jackson wrote:
> > So I am going to treat this as an effectively new change.
> > 
> > AIUI it is a proposal to improve performance, not a bugfix.  Was this
> > change posted (or, proposed on-list) before the Xen 4.16 Last Posting
> > Date (24th of September) ?  Even if it was, it would need a freeze
> > exception.
> 
> It was posted here:
> 
> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@xxxxxxxxxx/
> 
> Which was missing a spin_barrier, and in a different form here:
> 
> https://lore.kernel.org/xen-devel/2e7044de3cd8a6768a20250e61fe262f3a018724.1631790362.git.isaikin-dmitry@xxxxxxxxx/

Thanks.

Julien Grall writes ("Re: [PATCH for-4.16] Revert "domctl: improve locking 
during domain destruction""):
> For instance, in the case of Amazon our setup was:
> 
> On a 144-core server with 4TiB of memory, destroying 32 guests (each
> with 4 vcpus and 122GiB memory) simultaneously takes:
> 
> before the revert: 29 minutes
> after the revert: 6 minutes

This is quite substantial!
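(For scale: 32 guests × 122 GiB is roughly 3.8 TiB of guest memory
being torn down, so going from 29 minutes to 6 is about a 4.8x
speedup; very roughly 2.2 GiB/s of teardown throughput before the
revert versus 10.8 GiB/s after, assuming the time is dominated by
freeing the guests' memory.)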

> > Given the current point in the release, revert the commit and
> > reinstate holding the domctl lock during domain destruction. Further
> > work should be done in order to re-add more fine grained locking to
> > the domain destruction path once a proper solution to avoid the
> > heap_lock contention is found.
...
> > Since this is a revert and not new code I think the risk is lower.
> > There's however some risk, as the original commit was from 2017, and
> > hence the surrounding code has changed a bit. It's also a possibility
> > that some other parts of the domain destruction code now rely on this
> > more fine grained locking. Local tests however haven't shown issues.
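
To make the trade-off concrete, the two locking shapes being discussed
look roughly like this.  This is an illustrative user-space sketch
only: the names (domctl_lock, heap_lock, free_domain_pages(), the
struct fields) are hypothetical and pthread mutexes stand in for Xen's
spinlocks; it is not the actual Xen code.

    /* Illustrative sketch, not Xen code: hypothetical names, pthread
     * mutexes standing in for Xen's spinlocks. */
    #include <pthread.h>
    #include <stddef.h>

    struct domain {
        pthread_mutex_t destroy_lock;   /* hypothetical per-domain lock */
        size_t nr_pages;
    };

    static pthread_mutex_t domctl_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t heap_lock   = PTHREAD_MUTEX_INITIALIZER;

    /* Returning pages to the allocator takes the heap lock for each
     * batch; this is where parallel destructions end up contending. */
    static void free_domain_pages(struct domain *d)
    {
        while (d->nr_pages) {
            pthread_mutex_lock(&heap_lock);
            d->nr_pages--;              /* stand-in for freeing one batch */
            pthread_mutex_unlock(&heap_lock);
        }
    }

    /* With the revert applied: destruction runs under the one global
     * domctl lock, so concurrent destroy requests serialise against
     * each other instead of all piling up on heap_lock. */
    void destroy_domain_coarse(struct domain *d)
    {
        pthread_mutex_lock(&domctl_lock);
        free_domain_pages(d);
        pthread_mutex_unlock(&domctl_lock);
    }

    /* The behaviour being reverted: only a per-domain lock is held, so
     * many destructions proceed in parallel and then contend on
     * heap_lock while freeing their memory. */
    void destroy_domain_fine(struct domain *d)
    {
        pthread_mutex_lock(&d->destroy_lock);
        free_domain_pages(d);
        pthread_mutex_unlock(&d->destroy_lock);
    }

The counter-intuitive part, and what the numbers above suggest, is
that the coarse lock looks worse on paper but keeps the expensive
page-freeing loop from thrashing the shared heap lock.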

I am finding this a difficult decision.  It appears from the threads
that a number of people have been running with this revert, which
would serve to mitigate the risk, but it's not clear to me what
version(s) of Xen they had applied it to.

Ultimately, though, I think I need to refer myself to the schedule I set:

    Friday 12th November                  Hard code freeze [*]

      Bugfixes for serious bugs (including regressions), and low-risk
      fixes only.
      (0.5 weeks)

I don't see any way that this change fits within them.  The point of a
freeze is, in part, that we stop trying to *improve* things and start
trying to *unbreak* them :-).

While the performance here is clearly poor, I don't think anything is
actually broken.

So, I'm afraid I'm saying "no".

Ian.