
Re: [xen-unstable test] 164996: regressions - FAIL



Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> On 22.09.2021 01:38, Stefano Stabellini wrote:
> > On Mon, 20 Sep 2021, Ian Jackson wrote:
> >>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB 
> >>> (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 
> >>> 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> >>>
> >>> there even look to be a number of higher order pages available (albeit
> >>> without digging I can't tell what "(C)" means). Nevertheless order-4
> >>> allocations aren't really nice.
> >>
> >> The host history suggests this may possibly be related to a qemu update.
> >>
> >> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> 
> Stefano - as per some of your investigation detailed further down I
> wonder whether you had seen this part of Ian's reply. (Question of
> course then is how that qemu update had managed to get pushed.)
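
As a cross-check on the DMA32 free-list line quoted above, here is a minimal
Python sketch that recomputes the total and counts the free blocks of order 4
or larger. It assumes 4 kB base pages (so order 4 is 64 kB) and ignores the
parenthesised flags; the helper names are hypothetical and exist only for this
illustration.

```python
import re

# Free-list summary copied from the quoted log line, with the
# parenthesised flags such as "(UMEC)" left out.
LINE = ("2788*4kB 890*8kB 497*16kB 36*32kB 1*64kB 1*128kB "
        "9*256kB 7*512kB 0*1024kB 0*2048kB 0*4096kB")

PAGE_KB = 4  # assumption: 4 kB base pages, matching the smallest block size


def parse_free_list(line):
    """Return a list of (count, block_size_kb) pairs from a free-list line."""
    return [(int(n), int(kb)) for n, kb in re.findall(r"(\d+)\*(\d+)kB", line)]


def total_kb(blocks):
    """Total free memory in kB across all block sizes."""
    return sum(n * kb for n, kb in blocks)


def blocks_at_least(blocks, order):
    """Number of free blocks whose size is at least 2**order pages."""
    min_kb = PAGE_KB << order
    return sum(n for n, kb in blocks if kb >= min_kb)


blocks = parse_free_list(LINE)
print(total_kb(blocks))            # 33456, matching the log's own total
print(blocks_at_least(blocks, 4))  # 18 free blocks of 64 kB or larger
```

This only counts blocks by size; it says nothing about whether a given
allocation is allowed to use them.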

I looked for bisection results for this failure:

  http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-arm64-arm64-libvirt-xsm.guest-start--debian.repeat.html

It's a heisenbug.  Also, the tests got reorganised slightly as a
side-effect of dropping some i386 tests, so some of these tests are
"new" from osstest's pov, although their content isn't really new.

Unfortunately, with it being a heisenbug, we won't get any useful
bisection results, which would otherwise conclusively tell us which
tree the problem was in.

> >> The grub cfg has this:
> >>
> >>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all 
> >> console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
> >>
> >> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
> > 
> > I definitely recommend to increase dom0 memory, especially as I guess
> > the box is going to have a significant amount, far more than 4GB. I
> > would set it to 2GB. Also the syntax on ARM is simpler, so it should be
> > just: dom0_mem=2G
> 
> Ian - I guess that's an adjustment relatively easy to make? I wonder
> though whether we wouldn't want to address the underlying issue first.
> Presumably not, because the fix would likely take quite some time to
> propagate suitably. Yet if not, we will want to have some way of
> verifying that an eventual fix there would have helped here.

It could propagate fairly quickly.  But I'm loath to make this change
because it seems to me that it would be simply masking the bug.

Notably, when this goes wrong, it seems to happen after the guest has
been started once successfully already.  So there *is* enough
memory...

Ian.



 

