Xen project Mailing List

Re: [Xen-devel] bad page flags booting 32bit dom0 on 64bit hypervisor using dom0_mem (kernel >=4.2)

To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>

From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>

Date: Wed, 11 May 2016 10:08:58 +0200

Cc: Juergen Gross <jgross@xxxxxxxx>, Nathan Zimmer <nzimmer@xxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, Mel Gorman <mgorman@xxxxxxx>

Delivery-date: Wed, 11 May 2016 08:09:40 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 02.05.2016 16:24, Stefan Bader wrote:
> On 02.05.2016 13:41, Juergen Gross wrote:
>> On 02/05/16 12:47, Stefan Bader wrote:
>>> I recently tried to boot 32bit dom0 on 64bit Xen host which I configured
>>> to run with a limited, fix amount of memory for dom0. It seems that
>>> somewhere between kernel versions 3.19 and 4.2 (sorry that is still a
>>> wide range) the Linux kernel would report bad page flags for a range of
>>> pages (which seem to be around the end of the guest pfn range). For a 4.2
>>> kernel that was easily missed as the boot finished ok and dom0 was
>>> accessible. However starting with 4.4 (tested 4.5 and a 4.6-rc) the
>>> serial console output freezes after some of those bad page flag messages
>>> and then (unfortunately without any further helpful output) the host
>>> reboots (I assume there is a panic that triggers a reset).
>>>
>>> I suspect the problem is more a kernel side one. It is just possible to
>>> influence things by variation of dom0_mem=#,max:#. 512M seems ok, 1024M,
>>> 2048M, and 3072M cause bad page flags starting around kernel 4.2 and
>>> reboots around 4.4. Then 4096M and not clamping dom0 memory seem to be ok
>>> again (though not limiting dom0 memory seems to cause trouble on 32bit
>>> dom0 later when a domU tries to balloon memory, but I think that is a
>>> different problem).
>>>
>>> I have not seen this on a 64bit dom0. Below is an example of those bad
>>> page errors. Somehow it looks to be a page marked as reserved. Initially
>>> I wondered whether this could be a problem of not clearing page flags
>>> when moving mappings to match the e820. But I never looked into i386
>>> memory setup in that detail. So I am posting this, hoping that someone
>>> may have an idea from the detail about where to look next. PAE is enabled
>>> there. Usually its bpf init that gets hit but that likely is just because
>>> that is doing the first vmallocs.
>>
>> Could you please post the kernel config, Xen and dom0 boot parameters?
>> I'm quite sure this is no common problem as there are standard tests
>> running for each kernel version including 32 bit dom0 with limited
>> memory size.
>
> Hi Jürgen,
>
> sure. Though by doing that I realized where I actually messed the whole
> thing up. I got the max limit syntax completely wrong. :( Instead of the
> correct "dom0_mem=1024M,max:1024M" I am using "dom0_mem=1024M:max=1024M"
> which I guess is like not having max set at all. Not sure whether that is
> a valid use case.
>
> When I actually do the dom0_mem argument right, there are no bad page flag
> errors even in 4.4 with 1024M limit. I was at least consistent in my
> mis-configuration, so doing the same stupid thing on 64bit seems to be
> handled more gracefully.
>
> Likely false alarm. But at least cut&pasting the config into mail made me
> spot the problem...
>

Ok, thinking that "dom0_mem=x" (without a max or min) is still a valid case,
I went ahead and did a bisect for when the bad page flag issue started. I
ended up at:

  92923ca "mm: meminit: only set page reserved in the memblock region"

With a few more printks in the new functions I finally realized why this
goes wrong. The new reserve_bootmem_region() uses unsigned long for the
start and end addresses, which does not work well on 32bit. On a Xen dom0
the problem is just more easily triggered.

When dom0 memory is limited to a small size but allowed to balloon up to
more, the additional system memory is put into reserved regions. In my case,
on a host with 8G of memory and, say, 1G of initial dom0 memory, this
created (among others) one reserved region which started at 4G and covered
the remaining 4G of host memory. reserve_bootmem_region() received this as
0-4G due to the unsigned long conversion, which basically marked *all*
memory below 4G as reserved.

The fix is relatively simple: just use phys_addr_t for start and end.
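The truncation described above can be reproduced outside the kernel. The following is only a minimal sketch, not the kernel code: the type names `ulong32` and `paddr64` and both helpers are hypothetical stand-ins for a 32-bit `unsigned long` and a 64-bit `phys_addr_t`, and it assumes the region end is passed as an inclusive address (base + size - 1), which is what makes the truncated range come out as 0-4G.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width stand-ins so the sketch behaves the same on any host.
 * Both type names are hypothetical, not kernel identifiers. */
typedef uint32_t ulong32;   /* unsigned long on a 32-bit kernel */
typedef uint64_t paddr64;   /* phys_addr_t on a 32-bit PAE kernel */

struct range32 { ulong32 start; ulong32 end; };

/* Models passing a physical range through 32-bit parameters:
 * the conversion keeps only the low 32 bits of each address. */
static struct range32 pass_as_ulong(paddr64 start, paddr64 end)
{
    struct range32 r;

    assert(start <= end);
    r.start = (ulong32)start;
    r.end = (ulong32)end;
    return r;
}

/* Number of bytes the callee would treat as reserved (inclusive end,
 * an assumption of this sketch). */
static uint64_t covered_bytes(struct range32 r)
{
    return (uint64_t)r.end - r.start + 1;
}
```

For a region from 4G to 8G - 1, `pass_as_ulong(0x100000000ULL, 0x1FFFFFFFFULL)` yields start 0 and end 0xFFFFFFFF, so the callee sees the whole range below 4G instead of the range above it.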
I tested this on 4.2 and 4.4 kernels. Both now boot without errors, and the
4.4 kernel no longer crashes. This may still not be 100% safe on very large
memory systems (16T, if I did not get the math wrong), but it is at least
some improvement...

-Stefan
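One plausible reading of the 16T figure, stated here as an assumption rather than something the mail spells out: if page frame numbers are still held in a 32-bit unsigned long and pages are 4 KiB, the largest addressable range is 2^32 * 2^12 bytes. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable when a page frame number has pfn_bits bits and
 * pages are (1 << page_shift) bytes. Hypothetical helper for the
 * arithmetic only. */
static uint64_t max_addressable_bytes(unsigned pfn_bits, unsigned page_shift)
{
    assert(pfn_bits + page_shift < 64);
    return (uint64_t)1 << (pfn_bits + page_shift);
}
```

With 32-bit page frame numbers and a page shift of 12, `max_addressable_bytes(32, 12)` equals `16ULL << 40`, i.e. 16 TiB.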

Attachment: signature.asc
Description: OpenPGP digital signature
