
Re: [Xen-devel] Draft NVDIMM proposal




> On May 15, 2018, at 1:26 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> 
>>>> On 15.05.18 at 12:12, <George.Dunlap@xxxxxxxxxx> wrote:
>>> On May 15, 2018, at 11:05 AM, Roger Pau Monne <roger.pau@xxxxxxxxxx> wrote:
>>> On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
>>>> [ adding linux-nvdimm ]
>>>> 
>>>> Great write up! Some comments below...
>>>> 
>>>> On Wed, May 9, 2018 at 10:35 AM, George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
>>>>>> To use a namespace, an operating system needs at a minimum two pieces
>>>>>> of information: The UUID and/or Name of the namespace, and the SPA
>>>>>> range where that namespace is mapped; and ideally also the Type and
>>>>>> Abstraction Type to know how to interpret the data inside.
>>>> 
>>>> Not necessarily, no. Linux supports "label-less" mode where it exposes
>>>> the raw capacity of a region in 1:1 mapped namespace without a label.
>>>> This is how Linux supports "legacy" NVDIMMs that do not support
>>>> labels.
>>> 
>>> In that case, how does Linux know which area of the NVDIMM it should
>>> use to store the page structures?
>> 
>> The answer to that is right here:
>> 
>>>>>> `fsdax` and `devdax` mode are both designed to make it possible for
>>>>>> user processes to have direct mapping of NVRAM.  As such, both are
>>>>>> only suitable for PMEM namespaces (?).  Both also need to have kernel
>>>>>> page structures allocated for each page of NVRAM; this amounts to 64
>>>>>> bytes for every 4k of NVRAM.  Memory for these page structures can
>>>>>> either be allocated out of normal "system" memory, or inside the PMEM
>>>>>> namespace itself.
>>>>>> 
>>>>>> In both cases, an "info block", very similar to the BTT info block, is
>>>>>> written to the beginning of the namespace when created.  This info
>>>>>> block specifies whether the page structures come from system memory or
>>>>>> from the namespace itself.  If from the namespace itself, it contains
>>>>>> information about what parts of the namespace have been set aside for
>>>>>> Linux to use for this purpose.
>> 
>> That is, each fsdax / devdax namespace has a superblock that, in part,
>> defines what parts are used for Linux and what parts are used for data.  Or
>> to put it a different way: Linux decides which parts of a namespace to use
>> for page structures, and writes it down in the metadata starting in the
>> first page of the namespace.
> 
> And that metadata layout is agreed upon between all OS vendors?
> 
>>>>>> Linux has also defined "Type GUIDs" for these two types of namespace
>>>>>> to be stored in the namespace label, although these are not yet in the
>>>>>> ACPI spec.
>>>> 
>>>> They never will be. One of the motivations for GUIDs is that an OS can
>>>> define private ones without needing to go back and standardize them.
>>>> Only GUIDs that are needed for inter-OS / pre-OS compatibility would
>>>> need to be defined in ACPI, and there is no expectation that other
>>>> OSes understand Linux's format for reserving page structure space.
>>> 
>>> Maybe it would be helpful to somehow mark those areas as
>>> "non-persistent" storage, so that other OSes know they can use this
>>> space for temporary data that doesn't need to survive across reboots?
>> 
>> In theory there’s no reason another OS couldn’t learn Linux’s format, 
>> discover where the blocks were, and use those blocks for its own purposes 
>> while Linux wasn’t running.
> 
> This looks to imply "no" to my question above, in which case I wonder how
> we would use (part of) the space when the "other" owner is e.g. Windows.

So in classic DOS partition tables, you have partition types; and various
operating systems just sort of “claimed” numbers for themselves (e.g., NTFS,
Linux Swap, etc.).

But the DOS partition table number space is actually quite small (the type
field is a single byte).  So in namespaces, you have a similar concept, except
that it’s called a “type GUID”, and it’s massively long (128 bits): long
enough that anyone who wants to make a new type can simply generate one
randomly and be pretty confident that nobody else is using that one.
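
To put rough numbers on that comparison, here is a minimal sketch in Python.
It assumes new type GUIDs are generated as random (version 4) UUIDs, which
carry 122 random bits, and it uses the usual birthday-bound approximation.
`collision_probability` is a name made up for this illustration.

```python
import math
import uuid

MBR_TYPE_BITS = 8        # DOS/MBR partition type field: one byte
UUID4_RANDOM_BITS = 122  # random bits in a version 4 UUID (assumption: GUIDs are made this way)


def collision_probability(num_ids, random_bits):
    """Approximate chance that at least two of num_ids random picks collide."""
    space = 2 ** random_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


# Twenty vendors picking a one-byte type at random would probably clash.
print("one-byte types, 20 pickers:", collision_probability(20, MBR_TYPE_BITS))

# A billion randomly generated type GUIDs would almost certainly not.
print("random GUIDs, 1e9 pickers: ", collision_probability(10**9, UUID4_RANDOM_BITS))

# Minting a new type GUID is a single call.
print("fresh type GUID:", uuid.uuid4())
```

In practice partition type numbers were claimed by convention rather than
picked at random, so the first figure only illustrates how little room a
one-byte space leaves.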

So if a label contains a type GUID you understand, you use that namespace,
just like you would a partition whose type you understand.  If it contains a
GUID you don’t understand, you’d better leave it alone.
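
As a sketch, that rule looks something like the following.  This is an
illustration only: `NamespaceLabel`, `KNOWN_TYPE_GUIDS` and `claimable` are
hypothetical names, the label is reduced to two fields, and the GUID values
are placeholders rather than the ones Linux actually uses.

```python
import uuid
from dataclasses import dataclass

# Placeholder type GUIDs for this sketch; NOT real values from Linux or ACPI.
KNOWN_TYPE_GUIDS = {
    uuid.UUID("00000000-0000-0000-0000-000000000001"): "example-type-a",
    uuid.UUID("00000000-0000-0000-0000-000000000002"): "example-type-b",
}


@dataclass(frozen=True)
class NamespaceLabel:
    """Heavily simplified stand-in for a namespace label."""
    name: str
    type_guid: uuid.UUID


def claimable(labels):
    """Return only the namespaces whose type GUID this OS understands."""
    return [label for label in labels if label.type_guid in KNOWN_TYPE_GUIDS]


labels = [
    NamespaceLabel("ns0", uuid.UUID("00000000-0000-0000-0000-000000000001")),
    NamespaceLabel("ns1", uuid.uuid4()),  # some other OS's private type
]

for label in claimable(labels):
    print("using", label.name, "as", KNOWN_TYPE_GUIDS[label.type_guid])
# Namespaces that are not returned are simply left untouched.
```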

 -George