Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
On Wed, 2020-03-25 at 08:11 +0100, Jan Beulich wrote:
> On 24.03.2020 19:39, Julien Grall wrote:
> > On 24/03/2020 16:13, Jan Beulich wrote:
> > > On 24.03.2020 16:21, Hongyan Xia wrote:
> > > > From: Hongyan Xia <hongyxia@xxxxxxxxxx>
> > > >
> > > > In contrast, after dropping that commit, parallel domain
> > > > destructions will just fail to take the domctl lock, creating a
> > > > hypercall continuation and backing off immediately, allowing the
> > > > thread that holds the lock to destroy a domain much more quickly
> > > > and allowing backed-off threads to process events and irqs.
> > > >
> > > > On a 144-core server with 4TiB of memory, destroying 32 guests
> > > > (each with 4 vcpus and 122GiB memory) simultaneously takes:
> > > >
> > > > before the revert: 29 minutes
> > > > after the revert:   6 minutes
> > >
> > > This wants comparing against numbers demonstrating the bad
> > > effects of the global domctl lock. Iirc they were quite a bit
> > > higher than 6 min, perhaps depending on guest properties.
> >
> > Your original commit message doesn't contain any clue about the
> > cases in which the domctl lock was an issue. So please provide
> > information on the setups where you think this will make things
> > worse.
>
> I never observed the issue myself - let's see whether one of the
> SUSE people possibly involved in this back then recall (or have
> further pointers; Jim, Charles?), or whether any of the (partly
> former) Citrix folks do. My vague recollection is that the issue
> was the tool stack as a whole stalling for far too long, in
> particular when destroying very large guests. One important aspect
> not discussed in the commit message at all is that holding the
> domctl lock blocks basically _all_ tool stack operations (including
> e.g. creation of new guests), whereas the new issue being addressed
> here is limited to just domain cleanup.

The best solution is to make the heap scalable instead of protecting
it with a global lock, but that is not going to be trivial. Of
course, another option is to keep the domctl lock dropped in
domain_kill() but introduce a separate domain_kill lock, so that
competing domain_kill()s try to take that lock and back off with a
hypercall continuation. But this is kind of hacky (we would be
introducing a lock just to reduce spinlock contention elsewhere),
which is probably a workaround rather than a solution. Seeing the
dramatic increase from 6 to 29 minutes in concurrent guest
destruction, though, I wonder whether the benefit of that commit can
outweigh this downside.

Hongyan
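For illustration, below is a minimal sketch of the dedicated-lock
workaround described above. It is not the actual Xen implementation:
the lock name (domain_kill_lock) and the exact placement around
domain_relinquish_resources() are assumptions made for the example;
only the trylock-then-ERESTART pattern mirrors what the global domctl
lock path already does.

#include <xen/errno.h>
#include <xen/sched.h>
#include <xen/spinlock.h>

/* Hypothetical lock serialising only domain destruction, so that the
 * global domctl lock can stay dropped in domain_kill(). */
static DEFINE_SPINLOCK(domain_kill_lock);

int domain_kill(struct domain *d)
{
    int ret;

    /*
     * Fail fast instead of spinning: the caller turns -ERESTART into
     * a hypercall continuation, so a backed-off vCPU can process
     * events and IRQs while the current lock holder makes progress.
     */
    if ( !spin_trylock(&domain_kill_lock) )
        return -ERESTART;

    /* Tear down the domain's resources; this may itself return
     * -ERESTART to continue in a later hypercall invocation. */
    ret = domain_relinquish_resources(d);

    spin_unlock(&domain_kill_lock);

    return ret;
}

The point of the trylock is that destruction stays serialised
(avoiding heap lock contention) without making unrelated toolstack
operations wait on the global domctl lock.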