Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
Hi Jan,

On 17/09/2021 14:47, Jan Beulich wrote:
> On 17.09.2021 11:41, Andrew Cooper wrote:
>> On 17/09/2021 10:27, Julien Grall wrote:
>>> Hi,
>>>
>>> (+ some AWS folks)
>>>
>>> On 17/09/2021 11:17, Jan Beulich wrote:
>>>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>>>> From: Dmitry Isaykin <isaikin-dmitry@xxxxxxxxx>
>>>>>>>
>>>>>>> This significantly speeds up concurrent destruction of multiple
>>>>>>> domains on x86.
>>>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>>>> improve locking during domain destruction"). There it was found
>>>>>> to actually improve things;
>>>>> Was it? I recall that it was simply an expectation that
>>>>> performance would be better...
>>>> My recollection is that it was, for one of our customers.
>>>>> Amazon previously identified 228ab9992ffb as a massive perf hit,
>>>>> too.
>>>> Interesting. I don't recall any mail to that effect.
>>> Here we go:
>>> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/
>>>
>>> We have been using the revert for quite a while in production and
>>> didn't notice any regression.
>>>>> Clearly some of the reasoning behind 228ab9992ffb was flawed
>>>>> and/or incomplete, and it appears as if it wasn't necessarily a
>>>>> wise move in hindsight.
>>>> Possible; I continue to think though that the present observation
>>>> wants properly understanding instead of more or less blindly
>>>> undoing that change.
>>> To be honest, I think this is the other way around. You wrote and
>>> merged a patch with the following justification:
>>>
>>> "There is no need to hold the global domctl lock across
>>> domain_kill() - the domain lock is fully sufficient here, and
>>> parallel cleanup after multiple domains performs quite a bit better
>>> this way."
>>>
>>> Clearly, the original commit message is lacking details on the exact
>>> setups and numbers. But we now have two stakeholders with proof that
>>> your patch is harmful to the setups you claimed perform better with
>>> it.
>>>
>>> To me this is enough justification to revert the original patch.
>>> Anyone against the revert should provide clear details of why the
>>> patch should not be reverted.
>> I second a revert. I was concerned at the time that the claim was
>> unsubstantiated, and now there is plenty of evidence to counter the
>> claim.
> Well, I won't object to a proper revert. I still think we'd better get
> to the bottom of this, not least because I thought there was agreement
> that mid to long term we should get rid of global locking wherever
> possible. Or are both of you saying that using a global lock here is
> obviously fine? And does either of you have at least a theory to
> explain the observation? I can only say that I find it puzzling.

I will quote what Hongyan wrote back on the first report:

"The best solution is to make the heap scalable instead of a global
lock, but that is not going to be trivial.

Of course, another solution is to keep the domctl lock dropped in
domain_kill() but have another domain_kill lock so that competing
domain_kill()s will try to take that lock and back off with hypercall
continuation. But this is kind of hacky (we introduce a lock to reduce
spinlock contention elsewhere), which is probably not a solution but a
workaround."

Cheers,

-- 
Julien Grall
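For readers who want to see the shape of the back-off scheme Hongyan describes, here is a minimal standalone C model. It is not Xen code: domain_kill_lock, do_destroy_domain(), teardown_domain() and DESTROY_AGAIN are illustrative stand-ins, and the caller's retry loop plays the role of a hypercall continuation. The idea is simply that competing destroys try to take one dedicated lock and, if it is busy, return immediately so the caller can retry later instead of spinning.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define DESTROY_AGAIN  (-1)  /* stands in for "retry via continuation" (Xen's -ERESTART) */

static pthread_mutex_t domain_kill_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the expensive per-domain teardown work. */
static void teardown_domain(int domid)
{
    printf("tearing down domain %d\n", domid);
    usleep(1000);
}

/*
 * Analogue of a destroy-domain path: serialise on the dedicated lock,
 * but never block on it - a busy lock means "come back later".
 */
static int do_destroy_domain(int domid)
{
    if (pthread_mutex_trylock(&domain_kill_lock) != 0)
        return DESTROY_AGAIN;

    teardown_domain(domid);
    pthread_mutex_unlock(&domain_kill_lock);
    return 0;
}

/* The retry loop here models re-issuing the hypercall on continuation. */
static void *toolstack_thread(void *arg)
{
    int domid = *(int *)arg;

    while (do_destroy_domain(domid) == DESTROY_AGAIN)
        usleep(100);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    int ids[4] = { 1, 2, 3, 4 };

    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, toolstack_thread, &ids[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}

The trade-off is the one Hongyan already points out: destroys end up serialised on a new lock purely to relieve contention elsewhere (the heap lock inside the teardown path), which is a workaround rather than a fix.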