Re: [xen-unstable test] 164996: regressions - FAIL
On 23.09.2021 03:10, Stefano Stabellini wrote:
> On Wed, 22 Sep 2021, Jan Beulich wrote:
>> On 22.09.2021 01:38, Stefano Stabellini wrote:
>>> On Mon, 20 Sep 2021, Ian Jackson wrote:
>>>> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>>>>> As per
>>>>>
>>>>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>>>>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
>>>>> Sep 15 14:44:55.514480 [ 1613.324918] active_file:13286 inactive_file:11182 isolated_file:0
>>>>> Sep 15 14:44:55.514545 [ 1613.324918] unevictable:0 dirty:30 writeback:0 unstable:0
>>>>> Sep 15 14:44:55.526477 [ 1613.324918] slab_reclaimable:10922 slab_unreclaimable:30234
>>>>> Sep 15 14:44:55.526540 [ 1613.324918] mapped:11277 shmem:10975 pagetables:401 bounce:0
>>>>> Sep 15 14:44:55.538474 [ 1613.324918] free:8364 free_pcp:100 free_cma:1650
>>>>>
>>>>> the system doesn't look to really be out of memory; as per
>>>>>
>>>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>>>
>>>>> there even look to be a number of higher order pages available (albeit
>>>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>>>> allocations aren't really nice.
>>>>
>>>> The host history suggests this may possibly be related to a qemu update.
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
>>
>> Stefano - as per some of your investigation detailed further down I
>> wonder whether you had seen this part of Ian's reply. (Question of
>> course then is how that qemu update had managed to get pushed.)
>>
>>>> The grub cfg has this:
>>>>
>>>> multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan ${xen_rm_opts}
>>>>
>>>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
>>>
>>> I definitely recommend increasing dom0 memory, especially as I guess
>>> the box is going to have a significant amount, far more than 4GB. I
>>> would set it to 2GB. Also the syntax on ARM is simpler, so it should be
>>> just: dom0_mem=2G
>>
>> Ian - I guess that's an adjustment relatively easy to make? I wonder
>> though whether we wouldn't want to address the underlying issue first.
>> Presumably not, because the fix would likely take quite some time to
>> propagate suitably. Yet if not, we will want to have some way of
>> verifying that an eventual fix there would have helped here.
>>
>>> In addition, I also did some investigation just in case there is
>>> actually a bug in the code and it is not a simple OOM problem.
>>
>> I think the actual issue is quite clear; what I'm struggling with is
>> why we weren't hit by it earlier.
>>
>> As imo always, non-order-0 allocations (perhaps excluding the bringing
>> up of the kernel or whichever entity) are to be avoided if at all possible.
>> The offender in this case looks to be privcmd's alloc_empty_pages().
>> For it to request through kcalloc() what ends up being an order-4
>> allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
>> large chunk of guest memory to get mapped. Which may in turn be
>> questionable, but I'm afraid I don't have the time to try to drill
>> down where that request is coming from and whether that also wouldn't
>> better be split up.
>>
>> The solution looks simple enough - convert from kcalloc() to kvcalloc().
>> I can certainly spin up a patch to Linux to this effect.
>> Yet that still won't answer the question of why this issue has popped
>> up all of a sudden (and hence whether there are things wanting changing
>> elsewhere as well).
>
> Also, I saw your patches for Linux. Let's say that the patches are
> reviewed and enqueued immediately to be sent to Linus at the next
> opportunity. It is going to take a while for them to take effect in
> OSSTest, unless we import them somehow in the Linux tree used by OSSTest
> straight away, right?

Yes.

> Should we arrange for one test OSSTest flight now with the patches
> applied to see if they actually fix the issue? Otherwise we might end up
> waiting for nothing...

Not sure how easy it is to do one-off Linux builds then to be used in
hypervisor tests. Ian?

Jan
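
For anyone wanting to re-check the numbers quoted above, here is a rough
Python sketch (not kernel code) that re-adds the DMA32 free-list line and
estimates how many page pointers a kcalloc() needs before it becomes an
order-4 request. It assumes 4 kB pages and 8-byte pointers, and it assumes
that an allocation of this size is rounded up to a power-of-two number of
pages. Reading the letters in parentheses as migrate types, with "C"
meaning CMA, is likewise an assumption here, though the C-only blocks do
fit within the free_cma figure from the same log. The helper names are
made up for illustration.

```python
import re

PAGE_SIZE = 4096    # assumption: 4 kB pages, matching the "4kB" bucket in the log
POINTER_SIZE = 8    # assumption: 64-bit kernel, 8 bytes per struct page pointer

# Free-list line exactly as quoted in the thread (timestamps dropped).
DMA32_LINE = (
    "DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) "
    "1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) "
    "0*1024kB 0*2048kB 0*4096kB = 33456kB"
)
FREE_CMA_PAGES = 1650   # "free_cma:1650" from the Mem-Info dump


def parse_free_list(line):
    """Return (count, block_kb, type_letters) for every bucket on the line."""
    found = re.findall(r"(\d+)\*(\d+)kB(?: \((\w+)\))?", line)
    return [(int(count), int(kb), letters) for count, kb, letters in found]


def allocation_order(nbytes):
    """Smallest order such that PAGE_SIZE << order covers nbytes."""
    pages = max(1, -(-nbytes // PAGE_SIZE))   # ceiling division
    return (pages - 1).bit_length()


buckets = parse_free_list(DMA32_LINE)
total_kb = sum(count * kb for count, kb, _ in buckets)

# Non-empty buckets able to satisfy an order-4 (64 kB) request.
order4_capable = [(c, kb, t) for c, kb, t in buckets if kb >= 64 and c > 0]
c_only_kb = sum(c * kb for c, kb, t in buckets if t == "C")

# Pointer counts at which the array crosses into, and out of, order 4.
min_ptrs = (PAGE_SIZE << 3) // POINTER_SIZE + 1   # just above 32 kB
max_ptrs = (PAGE_SIZE << 4) // POINTER_SIZE       # exactly 64 kB

print("free-list total:", total_kb, "kB")
print("order-4 capable buckets:", order4_capable)
print("C-only blocks:", c_only_kb, "kB; free_cma:", FREE_CMA_PAGES * 4, "kB")
print("order-4 pointer array:", min_ptrs, "to", max_ptrs, "pages per request")
```

Under these assumptions every free block of 64 kB or more carries only the
"C" letter, and a pointer array only reaches order 4 once a single request
covers more than 4096 pages, i.e. more than 16 MB of guest memory.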