
Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed

  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 17 Sep 2021 11:47:49 +0200
  • Cc: George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Dmitry Isaikin <isaikin-dmitry@xxxxxxxxx>, rjstone@xxxxxxxxxxxx, raphning@xxxxxxxxxxxx, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Fri, 17 Sep 2021 09:48:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.09.2021 11:41, Andrew Cooper wrote:
> On 17/09/2021 10:27, Julien Grall wrote:
>> Hi,
>> (+ some AWS folks)
>> On 17/09/2021 11:17, Jan Beulich wrote:
>>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>>> From: Dmitry Isaykin <isaikin-dmitry@xxxxxxxxx>
>>>>>> This significantly speeds up concurrent destruction of multiple
>>>>>> domains on x86.
>>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>>> improve locking during domain destruction"). There it was found to
>>>>> actually improve things;
>>>> Was it?  I recall that it was simply an expectation that performance
>>>> would be better...
>>> My recollection is that it was, for one of our customers.
>>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>> Interesting. I don't recall any mail to that effect.
>> Here we go:
>> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/
>> We have been using the revert for quite a while in production and didn't
>> notice any regression.
>>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>>> hindsight.
>>> Possible; I continue to think though that the present observation wants
>>> properly understanding instead of more or less blindly undoing that
>>> change.
>> To be honest, I think this is the other way around. You wrote and merged
>> a patch with the following justification:
>> "
>>     There is no need to hold the global domctl lock across domain_kill() -
>>     the domain lock is fully sufficient here, and parallel cleanup after
>>     multiple domains performs quite a bit better this way.
>> "
>> Clearly, the original commit message is lacking details on the exact
>> setups and numbers. But we now have two stakeholders with proof that
>> your patch is harmful to the setup you claim performs better with your
>> patch.
>> To me this is enough justification to revert the original patch. Anyone
>> against the revert, should provide clear details of why the patch should
>> not be reverted.
> I second a revert.
> I was concerned at the time that the claim was unsubstantiated, and now
> there is plenty of evidence to counter the claim.

Well, I won't object to a proper revert. I still think we'd better get to
the bottom of this, not the least because I thought there was agreement
that mid to long term we should get rid of global locking wherever
possible. Or are both of you saying that using a global lock here is
obviously fine? And does either of you have at least a theory to explain
the observation? I can only say that I find it puzzling.



