[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH 08/37] xen/x86: add detection of discontinous node memory range



On Fri, 24 Sep 2021, Wei Chen wrote:
> > -----Original Message-----
> > From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Sent: September 24, 2021 8:26
> > To: Wei Chen <Wei.Chen@xxxxxxx>
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; sstabellini@xxxxxxxxxx; julien@xxxxxxx;
> > Bertrand Marquis <Bertrand.Marquis@xxxxxxx>; jbeulich@xxxxxxxx;
> > andrew.cooper3@xxxxxxxxxx; roger.pau@xxxxxxxxxx; wl@xxxxxxx
> > Subject: Re: [PATCH 08/37] xen/x86: add detection of discontinous node
> > memory range
> > 
> > CC'ing x86 maintainers
> > 
> > On Thu, 23 Sep 2021, Wei Chen wrote:
> > > One NUMA node may contain several memory blocks. In the current Xen
> > > code, Xen maintains a single memory range for each node to cover
> > > all of its memory blocks. But this causes a problem: if the gap
> > > between two of a node's memory blocks contains memory blocks that
> > > do not belong to this node (remote memory blocks), the node's
> > > memory range will be expanded to cover those remote memory blocks.
> > >
> > > A node's memory range containing other nodes' memory is obviously
> > > not reasonable. This means the current NUMA code can only support
> > > nodes with contiguous memory blocks. However, on a physical machine,
> > > the addresses of multiple nodes can be interleaved.
> > >
> > > So this patch adds code to detect discontiguous memory blocks
> > > for one node. NUMA initialization will fail and error messages
> > > will be printed when Xen detects such a hardware configuration.
> > 
> > At least on ARM, it is not just memory that can be interleaved, but also
> > MMIO regions. For instance:
> > 
> > node0 bank0 0-0x1000000
> > MMIO 0x1000000-0x1002000
> > Hole 0x1002000-0x2000000
> > node0 bank1 0x2000000-0x3000000
> > 
> > So I am not familiar with the SRAT format, but I think on ARM the check
> > would look different: we would just look for multiple memory ranges
> > under a device_type = "memory" node of a NUMA node in device tree.
> > 
> > 
> 
> Should I need to include/refine above message to commit log?

Let me ask you a question first.

With the NUMA implementation of this patch series, can we deal with
cases where each node has multiple memory banks, not interleaved?
As an example:

node0: 0x0        - 0x10000000
MMIO : 0x10000000 - 0x20000000
node0: 0x20000000 - 0x30000000
MMIO : 0x30000000 - 0x50000000
node1: 0x50000000 - 0x60000000
MMIO : 0x60000000 - 0x80000000
node2: 0x80000000 - 0x90000000


I assume we can deal with this case simply by setting node0 memory to
0x0-0x30000000 even if there is actually something else, a device, that
doesn't belong to node0 in between the two node0 banks?

Is it only other nodes' memory being interleaved that causes issues? In
other words, is the following the only problematic scenario?

node0: 0x0        - 0x10000000
MMIO : 0x10000000 - 0x20000000
node1: 0x20000000 - 0x30000000
MMIO : 0x30000000 - 0x50000000
node0: 0x50000000 - 0x60000000

Because node1 is in between the two ranges of node0?


I am asking these questions because it is certainly possible to have
multiple memory ranges for each NUMA node in device tree, either by
specifying multiple ranges with a single "reg" property, or by
specifying multiple memory nodes with the same numa-node-id.
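As an illustration of the first option (a made-up fragment, not from any
specific board's device tree, assuming two address cells and two size cells),
a node with two memory banks in a single "reg" property could look like:

```dts
memory@0 {
        device_type = "memory";
        reg = <0x0 0x00000000 0x0 0x10000000>,   /* bank 0 */
              <0x0 0x20000000 0x0 0x10000000>;   /* bank 1 */
        numa-node-id = <0>;
};
```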

 

