Re: [xen-unstable test] 164996: regressions - FAIL
On 23.09.2021 03:10, Stefano Stabellini wrote:
> On Wed, 22 Sep 2021, Jan Beulich wrote:
>> On 22.09.2021 01:38, Stefano Stabellini wrote:
>>> On Mon, 20 Sep 2021, Ian Jackson wrote:
>>>> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>>>>> As per
>>>>>
>>>>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>>>>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639
>>>>> inactive_anon:15857 isolated_anon:0
>>>>> Sep 15 14:44:55.514480 [ 1613.324918] active_file:13286
>>>>> inactive_file:11182 isolated_file:0
>>>>> Sep 15 14:44:55.514545 [ 1613.324918] unevictable:0 dirty:30 writeback:0
>>>>> unstable:0
>>>>> Sep 15 14:44:55.526477 [ 1613.324918] slab_reclaimable:10922
>>>>> slab_unreclaimable:30234
>>>>> Sep 15 14:44:55.526540 [ 1613.324918] mapped:11277 shmem:10975
>>>>> pagetables:401 bounce:0
>>>>> Sep 15 14:44:55.538474 [ 1613.324918] free:8364 free_pcp:100
>>>>> free_cma:1650
>>>>>
>>>>> the system doesn't look to really be out of memory; as per
>>>>>
>>>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB
>>>>> (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C)
>>>>> 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>>>
>>>>> there even look to be a number of higher order pages available (albeit
>>>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>>>> allocations aren't really nice.
>>>>
>>>> The host history suggests this may possibly be related to a qemu update.
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
>>
>> Stefano - as per some of your investigation detailed further down I
>> wonder whether you had seen this part of Ian's reply. (Question of
>> course then is how that qemu update had managed to get pushed.)
>>
>>>> The grub cfg has this:
>>>>
>>>> multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all
>>>> console=dtuart dom0_mem=512M,max:512M ucode=scan ${xen_rm_opts}
>>>>
>>>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
>>>
>>> I definitely recommend increasing dom0 memory, especially as I guess
>>> the box is going to have a significant amount, far more than 4GB. I
>>> would set it to 2GB. Also the syntax on ARM is simpler, so it should be
>>> just: dom0_mem=2G
>>
>> Ian - I guess that's an adjustment relatively easy to make? I wonder
>> though whether we wouldn't want to address the underlying issue first.
>> Presumably not, because the fix would likely take quite some time to
>> propagate suitably. Yet if not, we will want to have some way of
>> verifying that an eventual fix there would have helped here.
>>
>>> In addition, I also did some investigation just in case there is
>>> actually a bug in the code and it is not a simple OOM problem.
>>
>> I think the actual issue is quite clear; what I'm struggling with is
>> why we weren't hit by it earlier.
>>
>> As imo always, non-order-0 allocations (perhaps excluding the bringing
>> up of the kernel or whichever entity) are to be avoided if at all possible.
>> The offender in this case looks to be privcmd's alloc_empty_pages().
>> For it to request through kcalloc() what ends up being an order-4
>> allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
>> large chunk of guest memory to get mapped. Which may in turn be
>> questionable, but I'm afraid I don't have the time to try to drill
>> down where that request is coming from and whether that also wouldn't
>> better be split up.
>>
>> The solution looks simple enough - convert from kcalloc() to kvcalloc().
>> I can certainly spin up a patch to Linux to this effect. Yet that still
>> won't answer the question of why this issue has popped up all of a
>> sudden (and hence whether there are things wanting changing elsewhere
>> as well).
>
> Also, I saw your patches for Linux. Let's say that the patches are
> reviewed and enqueued immediately to be sent to Linus at the next
> opportunity. It is going to take a while for them to take effect in
> OSSTest, unless we import them somehow in the Linux tree used by OSSTest
> straight away, right?

Yes.

> Should we arrange for one test OSSTest flight now with the patches
> applied to see if they actually fix the issue? Otherwise we might end up
> waiting for nothing...

Not sure how easy it is to do one-off Linux builds then to be used in
hypervisor tests. Ian?

Jan
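
For reference, the order-4 figure in the quoted analysis can be related to the size of the mapping request with a little arithmetic. The following is a minimal userspace sketch, not kernel code. It assumes 4 KiB pages and 8-byte pointers (neither is stated in the thread), and the helper names order_for_bytes() and order_for_page_array() are made up for illustration. It computes the page-allocator order needed to hold a physically contiguous array of per-page pointers, which is roughly what a kcalloc() of that array would end up asking for once the size exceeds what the slab caches serve.

```c
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, 8-byte pointers. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PTR_SIZE  8UL

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= bytes.
 * Hypothetical helper; it follows the same rounding idea as the
 * kernel's get_order(), but is not that function.
 */
static unsigned int order_for_bytes(unsigned long bytes)
{
    unsigned int order = 0;
    unsigned long span = SKETCH_PAGE_SIZE;

    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/*
 * Order of the contiguous allocation needed for an array holding one
 * pointer per page, for a request covering nr_pages pages.
 */
static unsigned int order_for_page_array(unsigned long nr_pages)
{
    return order_for_bytes(nr_pages * SKETCH_PTR_SIZE);
}
```

Under these assumptions, order_for_page_array(4096) evaluates to 3 and order_for_page_array(4097) through order_for_page_array(8192) evaluate to 4. An order-4 request would therefore correspond to a single mapping call covering somewhat more than 16 MiB and up to 32 MiB of guest memory. An allocation that can fall back to non-contiguous memory, as kvcalloc() can, does not depend on such a contiguous chunk being free.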