
Re: [PATCH] page-alloc: fix initialization of cross-node regions


  • To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 26 Jul 2022 08:14:34 +0200
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "Xia, Hongyan" <hongyxia@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 26 Jul 2022 06:15:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.07.2022 20:54, Andrew Cooper wrote:
> On 25/07/2022 14:10, Jan Beulich wrote:
>> Quite obviously, to determine the split condition, successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
> 
> This also fixes the crash that XenRT found on loads of hardware, which
> looks something like:
> 
> (XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
> (XEN) NUMA: Using 8 for the hash shift.
> (XEN) Early fatal page fault at e008:ffff82d04022ae1e (cr2=00000000000000b8, ec=0002)
> (XEN) ----[ Xen-4.17.0  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d04022ae1e>] common/page_alloc.c#free_heap_pages+0x2dd/0x850
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04022ae1e>] R common/page_alloc.c#free_heap_pages+0x2dd/0x850
> (XEN)    [<ffff82d04022dd64>] F common/page_alloc.c#init_heap_pages+0x55f/0x720
> (XEN)    [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
> (XEN)    [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
> (XEN)    [<ffff82d040204344>] F __high_start+0x94/0xa0
> 
> Debugging shows that it's always a block which crosses nodes 0 and 1,
> where avail[1] has yet to be initialised.
> 
> What I'm confused by is how this manages to manifest broken swiotlb
> issues without Xen crashing.
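
For illustration, the faulting address would be consistent with
dereferencing a per-node pointer that is still NULL. Below is a
minimal standalone sketch of that failure mode, using hypothetical
names (node_avail[], account_free()) rather than the actual Xen data
structures:

/*
 * Standalone sketch (hypothetical names, not the Xen code) of the
 * failure mode: per-node statistics live behind an array of pointers
 * which are only populated when a node's memory is first initialised.
 * Touching node 1's entry before that happens dereferences NULL plus
 * a small zone offset, i.e. a fault at a small cr2 value like the
 * 0xb8 seen above.
 */
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 2
#define NR_ZONES  40

/* Hypothetical per-node, per-zone free page counts; entries start NULL. */
static unsigned long *node_avail[MAX_NODES];

static void init_node(unsigned int node)
{
    node_avail[node] = calloc(NR_ZONES, sizeof(unsigned long));
}

static void account_free(unsigned int node, unsigned int zone,
                         unsigned long pages)
{
    if ( !node_avail[node] )
    {
        /* The real code would fault here, at NULL + zone * 8. */
        printf("node %u not set up: would fault at %#lx\n",
               node, (unsigned long)(zone * sizeof(unsigned long)));
        return;
    }
    node_avail[node][zone] += pages;
}

int main(void)
{
    init_node(0);             /* only node 0's stats are set up */
    account_free(0, 23, 1);   /* fine */
    account_free(1, 23, 1);   /* node 1 not initialised: 23 * 8 == 0xb8 */

    return 0;
}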

I didn't debug this in detail, since I had managed to spot the issue
by staring at the offending patch. From the observations, though, some
of node 1's memory was actually accounted to node 0 (including
off-by-65535 node_need_scrub[] values for both nodes), so I would guess
that in my case avail[1] simply wasn't accessed before being set up.
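
For completeness, here is a minimal standalone sketch of the splitting
the patch is about (hypothetical helpers page_node() and free_run(),
not the actual init_heap_pages() code): a contiguous run is flushed to
the allocator whenever the current page's node differs from that of
the run's first page, rather than only ever looking at the initial
page.

#include <stdio.h>

#define NR_PAGES 8

/* Hypothetical: which node a page frame belongs to (pfns 0-4 on
 * node 0, pfns 5-7 on node 1). */
static unsigned int page_node(unsigned long pfn)
{
    return pfn < 5 ? 0 : 1;
}

/* Hypothetical stand-in for handing a contiguous run to the allocator. */
static void free_run(unsigned long first, unsigned long count,
                     unsigned int node)
{
    printf("pfns %lu-%lu -> node %u\n", first, first + count - 1, node);
}

int main(void)
{
    unsigned long start = 0, i;
    unsigned int run_node = page_node(start);

    for ( i = start + 1; i < NR_PAGES; i++ )
    {
        /* Evaluate each successive page's attributes ... */
        if ( page_node(i) != run_node )
        {
            /* ... and split the run at the node boundary. */
            free_run(start, i - start, run_node);
            start = i;
            run_node = page_node(i);
        }
    }
    free_run(start, i - start, run_node);

    return 0;
}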

Jan



 

