
Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxx>
  • Date: Fri, 29 Jan 2021 10:56:48 +0000
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, "open list:X86" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

> On Jan 28, 2021, at 10:56 PM, George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
>> On Jan 28, 2021, at 10:42 PM, George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
>>> On Jan 28, 2021, at 6:26 PM, Elliott Mitchell <ehem+xen@xxxxxxx> wrote:
>>> On Thu, Jan 28, 2021 at 11:19:41AM +0100, Jan Beulich wrote:
>>>> On 27.01.2021 23:28, Elliott Mitchell wrote:
>>>>> On Wed, Jan 27, 2021 at 09:03:32PM +0000, Andrew Cooper wrote:
>>>>>> So. What *should* happen is that if QEMU/OVMF dirties more memory than
>>>>>> exists in the PoD cache, the domain gets terminated.
>>>>>> Irrespective, Xen/dom0 dying isn't an expected consequence of any normal
>>>>>> action like this.
>>>>>> Do you have a serial log of the crash? If not, can you set up a crash
>>>>>> kernel environment to capture the logs, or alternatively reproduce the
>>>>>> issue on a different box which does have serial?
>>>>> No, I don't.  I'm set up to debug ARM failures, not x86 ones.
>>>> Then alternatively can you at least give conditions that need to
>>>> be met to observe the problem, for someone to repro and then
>>>> debug? (The less complex the better, of course.)
>>> Multiple prior messages have included statements of what I believed to be
>>> the minimal case to reproduce.  Presently I believe the minimal
>>> constraints are, maxmem >= host memory, memory < free Xen memory, type
>>> HVM.  A minimal kr45hme.cfg file:
>>> type = "hvm"
>>> memory = 1024
>>> maxmem = 1073741824
>>> I suspect maxmem > free Xen memory may be sufficient.  The instances I
>>> can be certain of have been maxmem = total host memory *7.
>> Can you include your Xen version and dom0 command-line?
>> For me, domain creation fails with an error like this:
>> root@immortal:/images# xl create c6-01.cfg maxmem=1073741824
>> Parsing config from c6-01.cfg
>> xc: error: panic: xc_dom_boot.c:120: xc_dom_boot_mem_init: can't allocate 
>> low memory for domain: Out of memory
>> libxl: error: libxl_dom.c:593:libxl__build_dom: xc_dom_boot_mem_init failed: 
>> Cannot allocate memory
>> libxl: error: libxl_create.c:1576:domcreate_rebuild_done: Domain 9:cannot 
>> (re-)build domain: -3
>> libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 
>> 9:Non-existant domain
>> libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 9:Unable 
>> to destroy guest
>> libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 9:Destruction of 
>> domain failed
>> This is on staging-4.14 from a month or two ago (i.e., what I happened to 
>> have on a random test box), and `dom0_mem=1024M,max:1024M` in my 
>> command-line.  That rune will give dom0 only 1GiB of RAM, but also prevent 
>> it from auto-ballooning down to free up memory for the guest.
> Hmm, but with that line removed, I get this:
> root@immortal:/images# xl create c6-01.cfg maxmem=1073741824
> Parsing config from c6-01.cfg
> libxl: error: libxl_mem.c:279:libxl_set_memory_target: New target 0 for dom0 
> is below the minimum threshold
> failed to free memory for the domain
> Maybe the issue you’re facing is that the “minimum threshold” safety 
> catch either isn’t triggering, or is set low enough that your dom0 is OOMing 
> trying to make enough memory for your VM?

Looks like LIBXL_MIN_DOM0_MEM is hard-coded to 128MiB, which is not going to be 
enough on a lot of systems.  At the very least, that should be something that can 
be set in a global config somewhere.  Ideally we’d have a more sophisticated way 
of calculating the minimum value that wouldn’t trip so easily.
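For illustration, here is a minimal C sketch of the kind of check described above, with the dom0 floor passed in as a parameter instead of being hard-coded. It is not libxl's code: `plan_dom0_balloon`, `balloon_result`, `DEFAULT_MIN_DOM0_MIB` and `min_dom0_mib` are hypothetical names, all sizes are assumed to be in MiB, and the only value taken from this thread is the 128 MiB default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical default mirroring the 128 MiB LIBXL_MIN_DOM0_MEM value
 * mentioned above; callers may pass a different floor. */
#define DEFAULT_MIN_DOM0_MIB 128ULL

/* Outcome of planning a dom0 balloon-down for a new guest (hypothetical). */
typedef enum {
    BALLOON_NOT_NEEDED, /* Xen already has enough free memory */
    BALLOON_OK,         /* dom0 can shrink to *new_target_mib */
    BALLOON_BELOW_MIN   /* shrinking would push dom0 under the floor */
} balloon_result;

/*
 * Hypothetical helper, not a libxl API. All sizes are in MiB.
 *   dom0_mib     - dom0's current allocation
 *   free_mib     - memory already free in Xen
 *   need_mib     - memory the new guest is assumed to require
 *   min_dom0_mib - configurable floor for dom0
 * Stores the proposed dom0 target in *new_target_mib.
 */
static balloon_result plan_dom0_balloon(uint64_t dom0_mib, uint64_t free_mib,
                                        uint64_t need_mib, uint64_t min_dom0_mib,
                                        uint64_t *new_target_mib)
{
    assert(new_target_mib != NULL);

    if (need_mib <= free_mib) {
        *new_target_mib = dom0_mib; /* nothing to reclaim from dom0 */
        return BALLOON_NOT_NEEDED;
    }

    uint64_t shortfall = need_mib - free_mib;

    /* Saturate at 0 rather than wrapping around when the request is larger
     * than everything dom0 could give up. */
    uint64_t target = (shortfall >= dom0_mib) ? 0 : dom0_mib - shortfall;

    *new_target_mib = target;
    return (target < min_dom0_mib) ? BALLOON_BELOW_MIN : BALLOON_OK;
}
```

With dom0 at 4096 MiB, 512 MiB free and a 1024 MiB request, the sketch returns BALLOON_OK with a target of 3584 MiB. With a request on the order of the repro's maxmem = 1073741824, the target saturates at 0 and the call returns BALLOON_BELOW_MIN, which mirrors the "New target 0 for dom0 is below the minimum threshold" message, assuming the request is driven by maxmem as that message suggests. Passing a larger min_dom0_mib (for example 1024) is the kind of global-config knob suggested above: a dom0 of 1100 MiB asked to give up 100 MiB would then be refused instead of being squeezed to 1000 MiB.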

Elliott, as a short-term fix, I suggest setting aside a fixed amount of memory 
for dom0, as recommended in 



