
Re: [xen-unstable test] 164996: regressions - FAIL


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 23 Sep 2021 11:24:14 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, dpsmith@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 23 Sep 2021 09:24:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 23.09.2021 03:10, Stefano Stabellini wrote:
> On Wed, 22 Sep 2021, Jan Beulich wrote:
>> On 22.09.2021 01:38, Stefano Stabellini wrote:
>>> On Mon, 20 Sep 2021, Ian Jackson wrote:
>>>> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>>>>> As per
>>>>>
>>>>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>>>>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
>>>>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
>>>>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
>>>>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
>>>>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
>>>>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
>>>>>
>>>>> the system doesn't look to really be out of memory; as per
>>>>>
>>>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>>>
>>>>> there even look to be a number of higher order pages available (albeit
>>>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>>>> allocations aren't really nice.
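
[Editorial note on the quoted buddy list] The sketch below re-adds the
per-order counts and counts the free blocks of order 4 or larger. The counts
are copied from the DMA32 line above; the function names and the 4 kB base
page are illustrative assumptions, and this is plain userspace C, not kernel
code. As far as I know, the letters in parentheses are the kernel's
migrate-type tags, with "C" marking CMA pages, which ordinary unmovable
kernel allocations normally cannot use. If that reading is right, every free
block of order 4 or above in this zone sits in CMA.

```c
#include <assert.h>
#include <stddef.h>

/* Free-block counts per order for DMA32, copied from the quoted log line:
 * index 0 is the 4kB entry, index 1 the 8kB entry, up to 4096kB. */
static const unsigned long dma32_free[] = {
    2788, 890, 497, 36, 1, 1, 9, 7, 0, 0, 0
};
#define NR_ORDERS (sizeof(dma32_free) / sizeof(dma32_free[0]))

/* Total free memory in kB, assuming a 4 kB base page (order 0). */
static unsigned long total_free_kb(const unsigned long *counts, size_t n)
{
    unsigned long kb = 0;

    assert(counts != NULL);
    for (size_t order = 0; order < n; order++)
        kb += counts[order] * (4UL << order);
    return kb;
}

/* Number of free blocks whose order is at least min_order. */
static unsigned long blocks_at_or_above(const unsigned long *counts,
                                        size_t n, unsigned int min_order)
{
    unsigned long blocks = 0;

    assert(counts != NULL);
    for (size_t order = min_order; order < n; order++)
        blocks += counts[order];
    return blocks;
}
```

With the counts above, total_free_kb(dma32_free, NR_ORDERS) gives 33456,
matching the total in the log, and blocks_at_or_above(dma32_free, NR_ORDERS, 4)
gives 18, all of them tagged (C) only.
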
>>>>
>>>> The host history suggests this may possibly be related to a qemu update.
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
>>
>> Stefano - as per some of your investigation detailed further down I
>> wonder whether you had seen this part of Ian's reply. (Question of
>> course then is how that qemu update had managed to get pushed.)
>>
>>>> The grub cfg has this:
>>>>
>>>>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
>>>>
>>>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
>>>
>>> I definitely recommend increasing dom0 memory, especially as I guess
>>> the box is going to have a significant amount, far more than 4GB. I
>>> would set it to 2GB. Also the syntax on ARM is simpler, so it should be
>>> just: dom0_mem=2G
>>
>> Ian - I guess that's an adjustment relatively easy to make? I wonder
>> though whether we wouldn't want to address the underlying issue first.
>> Presumably not, because the fix would likely take quite some time to
>> propagate suitably. Yet if not, we will want to have some way of
>> verifying that an eventual fix there would have helped here.
>>
>>> In addition, I also did some investigation just in case there is
>>> actually a bug in the code and it is not a simple OOM problem.
>>
>> I think the actual issue is quite clear; what I'm struggling with is
>> why we weren't hit by it earlier.
>>
>> As imo always, non-order-0 allocations (perhaps excluding the bringing
>> up of the kernel or whichever entity) are to be avoided if at all possible.
>> The offender in this case looks to be privcmd's alloc_empty_pages().
>> For it to request through kcalloc() what ends up being an order-4
>> allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
>> large chunk of guest memory to get mapped. Which may in turn be
>> questionable, but I'm afraid I don't have the time to try to drill
>> down where that request is coming from and whether that also wouldn't
>> better be split up.
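
[Editorial note on "a pretty large chunk"] A minimal sketch to put a rough
number on that, assuming 4 KiB pages, 8-byte pointers, and that the array
requested through kcalloc() holds one pointer per guest page. None of these
is stated in the thread, and order_for_bytes() is only a userspace stand-in
for the kernel's order calculation, not the kernel function itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the thread. */
#define ASSUMED_PAGE_SIZE 4096UL /* 4 KiB pages, consistent with the log */
#define ASSUMED_PTR_SIZE  8UL    /* pointer size on a 64-bit kernel */

/* Smallest order such that (ASSUMED_PAGE_SIZE << order) >= bytes. */
static unsigned int order_for_bytes(unsigned long bytes)
{
    unsigned int order = 0;
    unsigned long span = ASSUMED_PAGE_SIZE;

    assert(bytes > 0);
    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Order of a contiguous array holding one pointer per guest page. */
static unsigned int order_for_page_array(unsigned long nr_guest_pages)
{
    return order_for_bytes(nr_guest_pages * ASSUMED_PTR_SIZE);
}
```

Under those assumptions, order_for_page_array() returns 4 for batches of 4097
to 8192 pages, i.e. a single request covering just over 16 MiB and up to
32 MiB of guest memory. A batch of 512 pages or fewer would fit in a single
order-0 page.
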
>>
>> The solution looks simple enough - convert from kcalloc() to kvcalloc().
>> I can certainly spin up a patch to Linux to this effect. Yet that still
>> won't answer the question of why this issue has popped up all of a
>> sudden (and hence whether there are things wanting changing elsewhere
>> as well).
> 
> Also, I saw your patches for Linux. Let's say that the patches are
> reviewed and enqueued immediately to be sent to Linus at the next
> opportunity. It is going to take a while for them to take effect in
> OSSTest, unless we import them somehow in the Linux tree used by OSSTest
> straight away, right?

Yes.

> Should we arrange for one test OSSTest flight now with the patches
> applied to see if they actually fix the issue? Otherwise we might end up
> waiting for nothing...

Not sure how easy it is to do one-off Linux builds which would then be
used in hypervisor tests. Ian?

Jan




 

