Xen project Mailing List

Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

To: Elliott Mitchell <ehem+xen@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Wed, 27 Jan 2021 21:03:32 +0000

Cc: George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 27 Jan 2021 21:03:53 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27/01/2021 20:12, Elliott Mitchell wrote:
> On Wed, Jan 27, 2021 at 10:47:19AM +0100, Jan Beulich wrote:
>> On 26.01.2021 18:51, Elliott Mitchell wrote:
>>> Okay, this has been reliably reproducing for a while. I had originally
>>> thought it was a problem of HVM plus memory != maxmem, but the
>>> non-immediate restart disagrees with that assessment.
>> I guess it's not really clear what you mean with this, but anyway:
>> The important aspect here that I'm concerned about is what the
>> manifestations of the issue are. I'm still hoping that you would
>> provide such information, so we can then start thinking about how
>> to solve these. If, of course, there is anything worse than the
>> expected effects which use of PoD can have on the guest itself.
> Manifestation is domain 0 and/or Xen panic a few seconds after the
> domain.cfg file is loaded via `xl`. Everything on the host is lost and
> the host restarts. Any VMs which were present are lost and need to
> restart, similar to power loss without UPS.
>
> Upon pressing return for `xl create domain.cfg` there is a short period
> of apparently normal behavior in domain 0. After this there is a short
> period of very laggy behavior in domain 0. Finally domain 0 goes
> unresponsive and so far by the time I've gotten to the host's console it
> has already started to reboot.
>
> The periods of apparently normal and laggy behavior are perhaps 5-10
> seconds each.
>
> The configurations I've reproduced with have had maxmem substantially
> larger than the total host memory (this is intended as a prototype of a
> future larger VM). The first recorded observation of this was with
> Debian's build of Xen 4.8, though I recall running into it with Xen 4.4
> too.
>
> Part of the problem might also be attributeable to QEMU touching all
> memory on start (thus causing PoD to try to populate *all* memory) or
> OVMF.

So.

What *should* happen is that if QEMU/OVMF dirties more memory than
exists in the PoD cache, the domain gets terminated.

Irrespective, Xen/dom0 dying isn't an expected consequence of any normal
action like this.

Do you have a serial log of the crash? If not, can you set up a crash
kernel environment to capture the logs, or alternatively reproduce the
issue on a different box which does have serial?

Whatever the underlying bug is, avoiding 2M degrading to 4K allocations
isn't a real fix, and is at best, sidestepping the problem.

~Andrew
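
For readers following the thread, the expected behaviour described in the reply can be sketched as a toy model. This is not Xen's PoD implementation: every name below (`toy_domain`, `toy_pod_demand_populate`, `toy_touch_pages`, `TOY_PAGES_PER_2M`) is hypothetical. It only illustrates the stated rule that a guest touching more memory than the PoD cache holds should lose that one domain, with nothing else on the host affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not Xen's PoD code. */

/* One 2 MiB superpage covers this many 4 KiB pages (2 MiB / 4 KiB = 512). */
#define TOY_PAGES_PER_2M ((2UL << 20) / (4UL << 10))

struct toy_domain {
    unsigned long pod_cache_pages; /* 4 KiB pages still held in the PoD cache */
    unsigned long populated_pages; /* guest pages already backed by memory */
    bool crashed;                  /* set when the cache runs dry */
};

/*
 * Model the guest (or QEMU/OVMF on its behalf) touching one page that
 * has not been populated yet. Returns true if the page could be backed
 * from the cache, false if the domain is (or has just been) terminated.
 */
static bool toy_pod_demand_populate(struct toy_domain *d)
{
    assert(d != NULL);

    if (d->crashed)
        return false;

    if (d->pod_cache_pages == 0) {
        /* Expected outcome: only this domain dies, not the host. */
        d->crashed = true;
        return false;
    }

    d->pod_cache_pages--;
    d->populated_pages++;
    return true;
}

/* Touch up to n fresh pages; returns how many were actually populated. */
static unsigned long toy_touch_pages(struct toy_domain *d, unsigned long n)
{
    unsigned long done = 0;

    while (done < n && toy_pod_demand_populate(d))
        done++;

    return done;
}
```

In this model, a domain whose cache holds `TOY_PAGES_PER_2M` pages can have that many pages touched successfully; the next touch marks the toy domain as crashed and populates nothing further.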
