Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
On 3/25/20 1:11 AM, Jan Beulich wrote:
> On 24.03.2020 19:39, Julien Grall wrote:
>> On 24/03/2020 16:13, Jan Beulich wrote:
>>> On 24.03.2020 16:21, Hongyan Xia wrote:
>>>> From: Hongyan Xia <hongyxia@xxxxxxxxxx>
>>>>
>>>> In contrast, after dropping that commit, parallel domain
>>>> destructions will just fail to take the domctl lock, creating a
>>>> hypercall continuation and backing off immediately, allowing the
>>>> thread that holds the lock to destroy a domain much more quickly
>>>> and allowing backed-off threads to process events and irqs.
>>>>
>>>> On a 144-core server with 4TiB of memory, destroying 32 guests
>>>> (each with 4 vcpus and 122GiB memory) simultaneously takes:
>>>>
>>>> before the revert: 29 minutes
>>>> after the revert: 6 minutes
>>>
>>> This wants comparing against numbers demonstrating the bad effects
>>> of the global domctl lock. Iirc they were quite a bit higher than
>>> 6 min, perhaps depending on guest properties.
>>
>> Your original commit message doesn't contain any clue in which
>> cases the domctl lock was an issue. So please provide information
>> on the setups you think it will make it worse.
>
> I did never observe the issue myself - let's see whether one of the
> SUSE people possibly involved in this back then recall (or have
> further pointers; Jim, Charles?), or whether any of the (partly
> former) Citrix folks do. My vague recollection is that the issue
> was the tool stack as a whole stalling for far too long in
> particular when destroying very large guests.

I too only have a vague memory of the issue but do recall shutting
down large guests (e.g. 500GB) taking a long time and blocking other
toolstack operations. I haven't checked on the behavior in quite some
time though.

> One important aspect not discussed in the commit message at all is
> that holding the domctl lock block basically _all_ tool stack
> operations (including e.g. creation of new guests), whereas the new
> issue attempted to be addressed is limited to just domain cleanup.

I more vaguely recall shutting down the host taking a *long* time
when dom0 had large amounts of memory, e.g. when it had all host
memory (no dom0_mem= setting and autoballooning enabled).

Regards,
Jim