[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 14 Jan 2021 15:30:16 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ncNNet4xyHdOK7B8MXAMYE66qwAWGPRpf7xuUEtwNmI=; b=oKiUKsVKHzhAf/5CdGCX72qrDKzBuPb71A43a7DTXkG0TrD9+YQNFQDdLnie45qUrs61tYz61SUbGmlJZ627knrD7KOzGV7kXuaH/8C1lCw7qFcjh/58fjnbhOTzmR4hh5S6dTvtbW7sHPwiJ7tp4Nj/IVG9oT5o7VI9W37jopBXsGtak9UDvH6NMvji4A4ydhivOW+G4YzintaI7K12bbT9wOt3yR9C+9OYcKLCIWzX/+FTmd1LhEjU4HmPMJHUrWfP+ZCRikXimbLEKbXjAMO1p+p+W+Zb53mPE7rVxb1QQ7nbcb3L/RXAi8smtRwOGRYWMVZD9Eeuu46qhZV1JQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=J+KRl8MxHVHM+Z/HgVtAEFCT3H1YmDW6t70v9mqEnjQJT/MQSiALQ6dU4xP6MfyYOEI0wghxJZ8OnYc2xDFOYmpCxtviYy3A2JMPW4BsOBEZ0qbPlVGA0mv6xad7GuAfETtRo6qZ6CcqQUvkq4T+snk9p7zc3bXar5kaZ7IobET6gcy/JJn48pV/P9MjoKGlHQyJesNh3cuunuVHarsx0I2pEGVqCfNkDQzYLbhqSQpxJHJ06CCQb3k5oTKos4p+inlhwZTBLoti0c50D/R35imUDXLM9w3fDX4/THWiCwuPVS5GhBSqOtcGsyrqKURT7mCYAwBaTUsP5tgz7sVxtQ==
  • Authentication-results: esa5.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 14 Jan 2021 15:30:34 +0000
  • Ironport-sdr: GRO0lCI4QVeiOwkaj2fyg/fsVj+vzRsbWBnmB078hJCqe0V3aeZ1/MKQKfU5aN2Obcx1yPprPN VYVXLihJ9ktfGc1sILdLIX0hPVIeqi5+Yh8n+RnUBhJnl88HYl/Kp5X5dZzhAmo1Pvcc8Ersq3 CpZ4/iL547uT+EkFgdGfpb3l+TiNnui5gB5U6o4N019hL5AzevKirUjgmlnIkP1L3sNEtKM0UJ 4+ZCYgZzowfROXYVKS8nrYeVs7dcb2SpPnra+uSr4R9YOj3TPme2QD8no+9+3VE9iUCfEQd6lJ V2Y=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14/01/2021 10:14, Jan Beulich wrote:
> On 14.01.2021 00:16, Andrew Cooper wrote:
>> On 05/01/2021 15:56, Jan Beulich wrote:
>>> On 23.12.2020 17:34, Andrew Cooper wrote:
>>>> RFC:
>>>>  * This probably wants to be less fatal in release builds
>>> I'm not even convinced this wants to be a panic() in debug builds.
>> Any memory leak spotted by this is an XSA, except in the narrow case of
>> being due exclusively to a weird and non-default order of hypercalls.
>>
>> It absolutely needs to be something fatal in debug builds, for avoid
>> going unnoticed by testing.
> This is a perfectly valid position to take, but I'm afraid once
> again isn't the only possible one. Since I do routinely look at
> logs, I'm personally in favor of avoiding crashing the host
> even for debug builds. The more that I've had pretty bad fallout
> from crashes in the past, due to (afaict) bugs in a file system
> driver in Linux that persisted over a longer period of time.

By that logic, we should replace most assertions with printk()s.  The
only reason I didn't use assert in this patch is because the
backtrace/etc is totally useless (we're in an RCU callback).

No-one, not even you, reviews all the logs from OSSTest.  It's sometimes
hard enough to get anyone to look at the logs even when there are emails
saying "$X looks broken".

The point at which we can spot the resource leak is too late to leave a
zombie domain around (we've already removed the domain from the
domlist), and a printk() is not an acceptably robust way of signalling
the problem.  Any change in the wording will render detection useless,
and some testing scenarios (stress in particular) will manage to wrap
the console ring in short order which risks hiding all traces of the
problem.

We're talking about something which will only occur on staging for
ill-reviewed patches, along with a (hopefully) reliable test developers
can use themselves, *and* in due course, this test being run pre-push
when we sort out the Gitlab CI.

The bottom line is that making this fatal is the only way we have of
getting people to pay attention to the severity of the issue, and I
don't think it reasonable to hamper automated testing's ability to spot
these issues, simply because you believe you're better than average at
reading your log files.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.