
Re: x86 NUMA error on OSSTest box


  • To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 4 Oct 2022 10:04:52 +0200
  • Cc: Henry Wang <Henry.Wang@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 04 Oct 2022 08:05:10 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.10.2022 23:21, Andrew Cooper wrote:
> While working on another issue, I spotted this:
> 
> (XEN) ACPI: EINJ 6CB9D638, 0150 (r1 ORACLE     X7-2 41060300 INTL        1)
> (XEN) System RAM: 32429MB (33208204kB)
> (XEN) SRAT: Node 0 PXM 0 [0000000000000000, 000000007fffffff]
> (XEN) SRAT: Node 0 PXM 0 [0000000100000000, 000000047fffffff]
> (XEN) SRAT: Node 1 PXM 1 [0000000480000000, 000000087fffffff]
> (XEN) NUMA: Using 19 for the hash shift.
> (XEN) Your memory is not aligned you need to rebuild your hypervisor with a bigger NODEMAPSIZE shift=19
> (XEN) SRAT: No NUMA node hash function found. Contact maintainer
> (XEN) SRAT: SRAT not used.
> (XEN) No NUMA configuration found
> (XEN) Faking a node at 0000000000000000-0000000880000000
> (XEN) Domain heap initialised
> 
> on sabro0 in OSSTest on current staging.  I do not know if it's a recent
> regression or not.
> 
> The SRAT looks reasonable (in fact, far better than most I've seen). 
> Given no legitimate requirement for aligned memory that I'm aware of, I
> think Xen's behaviour here is buggy and wants resolving.

Judging from flight 173273's logs (on sabro1) this is a recent issue,
which then must result from one of my changes. There we simply have

Sep 22 01:54:39.843438 (XEN) SRAT: Node 0 PXM 0 [0000000000000000, 000000007fffffff]
Sep 22 01:54:39.915465 (XEN) SRAT: Node 0 PXM 0 [0000000100000000, 000000047fffffff]
Sep 22 01:54:39.927478 (XEN) SRAT: Node 1 PXM 1 [0000000480000000, 000000087fffffff]
Sep 22 01:54:39.927500 (XEN) NUMA: Using 19 for the hash shift.

For the moment I can't make the connection, as we still pick 19 for the
shift value. I'll take a closer look.
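
For reference, the arithmetic behind the shift can be cross-checked with a
small sketch. This is not Xen's code: it assumes 4 KiB pages, assumes the
shift is taken as the lowest set bit across the start page frame numbers of
the SRAT ranges, and treats each logged range separately. The helper names
are made up for illustration.

```python
# Illustrative sketch only; not taken from the Xen sources.
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# (node, start, inclusive end) copied from the SRAT lines in the log
SRAT_RANGES = [
    (0, 0x0000000000000000, 0x000000007FFFFFFF),
    (0, 0x0000000100000000, 0x000000047FFFFFFF),
    (1, 0x0000000480000000, 0x000000087FFFFFFF),
]


def lowest_set_bit(value):
    """Index of the least significant set bit of a non-zero integer."""
    if value == 0:
        raise ValueError("no bit set")
    return (value & -value).bit_length() - 1


def hash_shift(ranges):
    """Assumed rule: OR all start frame numbers, take the lowest set bit."""
    bitfield = 0
    for _node, start, _end in ranges:
        bitfield |= start >> PAGE_SHIFT
    return lowest_set_bit(bitfield)


def misaligned_boundaries(ranges, shift):
    """Range boundaries that are not a multiple of the hash granule."""
    granule = 1 << (shift + PAGE_SHIFT)  # bytes covered by one hash slot
    bad = []
    for _node, start, end in ranges:
        for addr in (start, end + 1):
            if addr % granule:
                bad.append(addr)
    return bad


shift = hash_shift(SRAT_RANGES)
print("shift:", shift)
print("misaligned:", [hex(a) for a in misaligned_boundaries(SRAT_RANGES, shift)])
```

Under these assumptions a shift of 19 corresponds to a 2 GiB granule, and
every boundary in the SRAT above is a multiple of 2 GiB, so the ranges
themselves do not explain the "not aligned" message.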

Jan



 

