
Re: [PATCH] Revert "domctl: improve locking during domain destruction"



Hi Jim,

On 26/03/2020 16:55, Jim Fehlig wrote:
On 3/25/20 1:11 AM, Jan Beulich wrote:
On 24.03.2020 19:39, Julien Grall wrote:
On 24/03/2020 16:13, Jan Beulich wrote:
On 24.03.2020 16:21, Hongyan Xia wrote:
From: Hongyan Xia <hongyxia@xxxxxxxxxx>
In contrast,
after dropping that commit, parallel domain destructions will just fail
to take the domctl lock, creating a hypercall continuation and backing
off immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing backed-off threads to process
events and irqs.
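
For readers unfamiliar with the pattern, here is a rough user-space sketch of "try the lock, back off on failure, retry later". It is only an analogy, not Xen code: `domctl_lock`, `try_destroy`, and `demo` are hypothetical names, a Python `threading.Lock` stands in for the global domctl lock, and returning `False` stands in for creating a hypercall continuation.

```python
import threading

# Hypothetical stand-in for the global domctl lock (not the Xen lock itself).
domctl_lock = threading.Lock()


def try_destroy(domain_id, destroyed):
    """Attempt one destruction step; return False if the caller should retry."""
    # Non-blocking acquire: fails immediately instead of spinning or sleeping.
    if not domctl_lock.acquire(blocking=False):
        # Assumption: this is where a hypervisor would set up a continuation.
        # Here we only report that the caller has to come back later.
        return False
    try:
        # Placeholder for the real teardown work done under the lock.
        destroyed.append(domain_id)
        return True
    finally:
        domctl_lock.release()


def demo():
    """Show one failed attempt while the lock is held, then a successful retry."""
    destroyed = []
    domctl_lock.acquire()               # simulate another destroyer holding the lock
    first = try_destroy(1, destroyed)   # backs off without blocking
    domctl_lock.release()
    second = try_destroy(1, destroyed)  # retry succeeds once the lock is free
    return first, second, destroyed
```

The point of the sketch is only that the caller which loses the race returns straight away and is free to do other work, which is the behaviour the commit message credits for the faster teardown.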

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes

This wants comparing against numbers demonstrating the bad effects of
the global domctl lock. Iirc they were quite a bit higher than 6 min,
perhaps depending on guest properties.

Your original commit message doesn't contain any clue in which
cases the domctl lock was an issue. So please provide information
on the setups you think it will make it worse.

I never observed the issue myself - let's see whether one of the SUSE
people possibly involved in this back then recalls (or has further
pointers; Jim, Charles?), or whether any of the (partly former) Citrix
folks do. My vague recollection is that the issue was the tool stack as
a whole stalling for far too long in particular when destroying very
large guests.

I too only have a vague memory of the issue but do recall shutting down large guests (e.g. 500GB) taking a long time and blocking other toolstack operations. I haven't checked on the behavior in quite some time though.

It might be worth checking how toolstack operations (such as domain creation) are affected by the revert. @Hongyan, would you be able to test it?


One important aspect not discussed in the commit message
at all is that holding the domctl lock blocks basically _all_ tool stack
operations (including e.g. creation of new guests), whereas the new
issue attempted to be addressed is limited to just domain cleanup.

I more vaguely recall shutting down the host taking a *long* time when dom0 had large amounts of memory, e.g. when it had all host memory (no dom0_mem= setting and autoballooning enabled).

AFAIK, we never relinquish memory from dom0. So I am not sure how a large amount of memory in Dom0 would affect the host shutting down.

Cheers,

--
Julien Grall



 

