[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"



Hi Roger,

On 09/11/2021 14:31, Roger Pau Monne wrote:
This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Performance analysis has shown that dropping the domctl lock during
domain destruction greatly increases the contention in the heap_lock,
thus making parallel destruction of domains slower.

The following lockperf data shows the difference between the current
code and the reverted one:

lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
lock:  2788704(0.362311723s), block:   222681( 0.091152276s)

Thanks for the numbers, this is already an improvement from the reverted.

Can you also please provide some details on the setup that was used to get the number? (e.g. how many guests, amount of memory...).

For instance, in the case of Amazon our setup was:

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes


Given the current point in the release, revert the commit and
reinstate holding the domctl lock during domain destruction. Further
work should be done in order to re-add more fine grained locking to
the domain destruction path once a proper solution to avoid the
heap_lock contention is found.

Reported-by: Hongyan Xia <hongyxia@xxxxxxxxxx>
Reported-by: Dmitry Isaikin <isaikin-dmitry@xxxxxxxxx>
Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
Cc: Ian Jackson <iwj@xxxxxxxxxxxxxx>

Since this is a revert and not new code I think the risk is lower.
There's however some risk, as the original commit was from 2017, and
hence the surrounding code has changed a bit. It's also a possibility
that some other parts of the domain destruction code now rely on this
more fine grained locking. Local tests however haven't shown issues.
---
  xen/common/domain.c | 12 ++----------
  xen/common/domctl.c |  5 +----
  2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 56d47dd664..093bb4403f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -917,21 +917,13 @@ int domain_kill(struct domain *d)
      if ( d == current->domain )
          return -EINVAL;
- /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
      switch ( d->is_dying )
      {
      case DOMDYING_alive:
-        domain_unlock(d);
          domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
          d->is_dying = DOMDYING_dying;
+        spin_barrier(&d->domain_lock);
          argo_destroy(d);
          vnuma_destroy(d->vnuma);
          domain_set_outstanding_pages(d, 0);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 271862ae58..879a2adcbe 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -497,14 +497,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
          break;
case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
          ret = domain_kill(d);
-        domain_unlock(d);
          if ( ret == -ERESTART )
              ret = hypercall_create_continuation(
                  __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
case XEN_DOMCTL_setnodeaffinity:
      {


--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.