Re: [xen-unstable test] 164996: regressions - FAIL
On Mon, 20 Sep 2021, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> > As per
> >
> > Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
> > Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
> > Sep 15 14:44:55.514480 [ 1613.324918] active_file:13286 inactive_file:11182 isolated_file:0
> > Sep 15 14:44:55.514545 [ 1613.324918] unevictable:0 dirty:30 writeback:0 unstable:0
> > Sep 15 14:44:55.526477 [ 1613.324918] slab_reclaimable:10922 slab_unreclaimable:30234
> > Sep 15 14:44:55.526540 [ 1613.324918] mapped:11277 shmem:10975 pagetables:401 bounce:0
> > Sep 15 14:44:55.538474 [ 1613.324918] free:8364 free_pcp:100 free_cma:1650
> >
> > the system doesn't look to really be out of memory; as per
> >
> > Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> >
> > there even look to be a number of higher order pages available (albeit
> > without digging I can't tell what "(C)" means). Nevertheless order-4
> > allocations aren't really nice.
>
> The host history suggests this may possibly be related to a qemu update.
>
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
>
> > What I can't see is why this may have started triggering recently. Was
> > the kernel updated in osstest? Is 512MB of memory perhaps a bit too
> > small for a Dom0 on this system (with 96 CPUs)? Going through the log
> > I haven't been able to find crucial information like how much memory
> > the host has or what the hypervisor command line was.
>
> Logs from last host examination, including a dmesg:
>
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.examine/
>
> Re the command line, does Xen not print it ?
>
> The bootloader output seems garbled in the serial log.
>
> Anyway, I think Xen is being booted EFI judging by the grub cfg:
>
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--grub.cfg.1
>
> which means that it is probably reading this:
>
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--xen.cfg
>
> which gives this specification of the command line:
>
>   options=placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan
>
> The grub cfg has this:
>
>   multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan ${xen_rm_opts}
>
> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".

I definitely recommend increasing dom0 memory, especially as I guess the
box has a significant amount of RAM, far more than 4GB. I would set it to
2GB. Also, the syntax on ARM is simpler, so it should be just:

  dom0_mem=2G

In addition, I also did some investigation just in case there is actually
a bug in the code and it is not a simple OOM problem.

Looking at the recent OSSTest results, the first failure is:

https://marc.info/?l=xen-devel&m=163145323631047
http://logs.test-lab.xenproject.org/osstest/logs/164951/

Indeed, the failure is the same test-arm64-arm64-libvirt-raw which is
still failing in more recent tests:

http://logs.test-lab.xenproject.org/osstest/logs/164951/test-arm64-arm64-libvirt-raw/info.html

But if we look at the commit id of flight 164951, it is
6d45368a0a89e01a3a01d156af61fea565db96cc "xsm: drop dubious xsm_op_t
type" by Daniel P. Smith (CCed).

It is interesting because:
- it is *before* all the recent ARM patch series
- it is only 4 commits after master

The 4 commits are:

2021-09-10 16:12 Daniel P. Smith  o xsm: drop dubious xsm_op_t type
2021-09-10 16:12 Daniel P. Smith  o xsm: remove remnants of xsm_memtype hook
2021-09-10 16:12 Daniel P. Smith  o xsm: remove the ability to disable flask
2021-09-10 16:12 Andrew Cooper    o xen: Implement xen/alternative-call.h for use in common code

Looking at them in detail:

- "xen: Implement xen/alternative-call.h for use in common code"
  It shouldn't affect ARM at all.

- "xsm: remove the ability to disable flask"
  It would only affect the test case if libvirt, directly or via libxl,
  calls FLASK_DISABLE.

- "xsm: remove remnants of xsm_memtype hook"
  It shouldn't have any effect.

- "xsm: drop dubious xsm_op_t type"
  It doesn't look like it should have any runtime effect, only build time.

So among these four, only "xsm: remove the ability to disable flask"
seems to have the potential to break a libvirt guest start test. Even
that is far-fetched, and the lack of an explicit XSM-related error
message in the logs really points in the direction of an OOM.
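
For anyone who wants to redo the arithmetic on the DMA32 free-list line
quoted above, here is a rough Python sketch. It is only an illustration
and not part of osstest: the helper names (parse_free_areas, summarize)
are made up, and it assumes 4kB base pages, so an order-4 request needs
a 64kB block. It also assumes, from my reading of the kernel's free-area
dump, that the letters in parentheses are migrate types and that "C"
stands for CMA; please double check that before relying on it.

```python
import re

# Free-list line copied from the quoted Mem-Info dump.
LINE = ("DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) "
        "1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) "
        "0*1024kB 0*2048kB 0*4096kB = 33456kB")

PAGE_KB = 4  # assumption: 4kB base pages, matching the smallest bucket


def parse_free_areas(line):
    """Return a list of (count, block_kb, type_letters) per bucket."""
    pattern = r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?"
    return [(int(c), int(s), t) for c, s, t in re.findall(pattern, line)]


def summarize(line, order):
    """Return (total_free_kb, blocks_big_enough, of_which_tagged_only_C)."""
    buckets = parse_free_areas(line)
    total_kb = sum(c * s for c, s, _ in buckets)
    need_kb = PAGE_KB << order  # block size needed for this order
    big = [(c, t) for c, s, t in buckets if s >= need_kb and c > 0]
    blocks = sum(c for c, _ in big)
    only_c = sum(c for c, t in big if t == "C")  # assumed to mean CMA-only
    return total_kb, blocks, only_c


if __name__ == "__main__":
    total_kb, blocks, only_c = summarize(LINE, order=4)
    print(total_kb, blocks, only_c)  # prints: 33456 18 18
```

The total matches the 33456kB the kernel printed. All 18 blocks of 64kB
or larger carry only the "C" tag, so if "C" really is CMA, an ordinary
order-4 kernel allocation may have nothing to draw from even though the
zone as a whole is not empty. That would be consistent with an OOM
caused by a too-small dom0 rather than by a code regression.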