
Re: [xen-unstable test] 164996: regressions - FAIL



On Mon, 20 Sep 2021, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> > As per
> > 
> > Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
> > Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
> > Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
> > Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
> > Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
> > Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
> > Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
> > 
> > the system doesn't look to really be out of memory; as per
> > 
> > Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> > 
> > there even look to be a number of higher order pages available (albeit
> > without digging I can't tell what "(C)" means). Nevertheless order-4
> > allocations aren't really nice.
> 
> The host history suggests this may possibly be related to a qemu update.
> 
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> 
> > What I can't see is why this may have started triggering recently. Was
> > the kernel updated in osstest? Is 512Mb of memory perhaps a bit too
> > small for a Dom0 on this system (with 96 CPUs)? Going through the log
> > I haven't been able to find crucial information like how much memory
> > the host has or what the hypervisor command line was.
> 
> Logs from last host examination, including a dmesg:
> 
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.examine/
> 
> Re the command line, does Xen not print it ?
> 
> The bootloader output seems garbled in the serial log.
> 
> Anyway, I think Xen is being booted EFI judging by the grub cfg:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--grub.cfg.1
> 
> which means that it is probably reading this:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--xen.cfg
> 
> which gives this specification of the command line:
> 
>   options=placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan
> 
> The grub cfg has this:
> 
>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
> 
> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".

I definitely recommend increasing dom0 memory, especially as I guess
the box has a significant amount of RAM, far more than 4GB. I would
set it to 2GB. Also, the syntax on ARM is simpler, so it should be
just: dom0_mem=2G
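
To make the suggested change concrete, here is a minimal sketch of the
edit applied to the command line quoted above. The helper name
set_dom0_mem is hypothetical and only for illustration; it is not part
of osstest or Xen, and the real fix would be made wherever osstest
generates xen.cfg and the grub entry.

```python
def set_dom0_mem(cmdline: str, value: str = "2G") -> str:
    """Replace an existing dom0_mem=... token, or append one if absent.

    Hypothetical helper for illustration only.
    """
    tokens = cmdline.split()
    replaced = False
    result = []
    for token in tokens:
        if token.startswith("dom0_mem="):
            # Drop the old "512M,max:512M" form in favour of the plain value.
            result.append(f"dom0_mem={value}")
            replaced = True
        else:
            result.append(token)
    if not replaced:
        result.append(f"dom0_mem={value}")
    return " ".join(result)


# Command line as given in the quoted xen.cfg "options=" entry.
current = ("placeholder conswitch=x watchdog noreboot async-show-all "
           "console=dtuart dom0_mem=512M,max:512M ucode=scan")

print(set_dom0_mem(current))
```

All other options are left untouched; only the dom0_mem token changes.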

In addition, I also did some investigation just in case there is
actually a bug in the code and it is not a simple OOM problem.

Looking at the recent OSSTest results, the first failure is:
https://marc.info/?l=xen-devel&m=163145323631047
http://logs.test-lab.xenproject.org/osstest/logs/164951/

Indeed, the failure is the same test-arm64-arm64-libvirt-raw which is
still failing in more recent tests:
http://logs.test-lab.xenproject.org/osstest/logs/164951/test-arm64-arm64-libvirt-raw/info.html

But if we look at the commit id of flight 164951, it is
6d45368a0a89e01a3a01d156af61fea565db96cc "xsm: drop dubious xsm_op_t
type" by Daniel P. Smith (CCed).

It is interesting because:
- it is *before* all the recent ARM patch series
- it is only 4 commits after master


The 4 commits are:

2021-09-10 16:12 Daniel P. Smith   o xsm: drop dubious xsm_op_t type
2021-09-10 16:12 Daniel P. Smith   o xsm: remove remnants of xsm_memtype hook
2021-09-10 16:12 Daniel P. Smith   o xsm: remove the ability to disable flask
2021-09-10 16:12 Andrew Cooper     o xen: Implement xen/alternative-call.h for use in common code


Looking at them in detail:

- "xen: Implement xen/alternative-call.h for use in common code"
It shouldn't affect ARM at all

- "xsm: remove the ability to disable flask"
It would only affect the test case if libvirt directly or via libxl
calls FLASK_DISABLE.

- "xsm: remove remnants of xsm_memtype hook"
It shouldn't have any effect

- "xsm: drop dubious xsm_op_t type"
It doesn't look like it should have any runtime effect, only build time


So among these four, only "xsm: remove the ability to disable flask"
seems to have the potential to break a libvirt guest start test. Even
then, it is far-fetched, and the lack of an explicit XSM-related error
message in the logs really points in the direction of an OOM.
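
As a quick sanity check on the OOM reading, the DMA32 free-list line
quoted earlier can be re-added by hand. The sketch below parses that
one line, confirms the 33456kB total, and counts the free blocks of
order 4 (64kB with 4kB pages) and above. It assumes that "(C)" is the
kernel's marker for the CMA migrate type; that is my assumption, not
something established in this thread. If it holds, every block large
enough for an order-4 request sits in CMA.

```python
import re

# The free-list line from the quoted log, re-joined onto one line.
BUDDY_LINE = ("DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) "
              "36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) "
              "0*1024kB 0*2048kB 0*4096kB = 33456kB")

PAGE_KB = 4  # assumes 4kB pages, as the smallest block size suggests


def parse_free_list(line):
    """Return (count, block_kB, flags) tuples; flags is '' when not printed."""
    entry = re.compile(r"(\d+)\*(\d+)kB(?: \((\w+)\))?")
    return [(int(n), int(kb), flags) for n, kb, flags in entry.findall(line)]


entries = parse_free_list(BUDDY_LINE)
total_kb = sum(n * kb for n, kb, _ in entries)

order4_kb = PAGE_KB << 4  # an order-4 block is 16 pages
big = [(n, kb, f) for n, kb, f in entries if kb >= order4_kb and n > 0]
big_blocks = sum(n for n, _, _ in big)
# Blocks that are order >= 4 and not flagged as CMA-only ("C" assumed = CMA).
big_non_cma = sum(n for n, _, f in big if f != "C")

print("total free kB:", total_kb)
print("free blocks of order >= 4:", big_blocks)
print("of which not CMA-only:", big_non_cma)
```

Under that assumption, an order-4 allocation that cannot be served
from CMA would find no suitable block even though free memory is not
exhausted, which would be consistent with a too-small dom0 rather than
a regression in the four commits.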




 

