Xen project Mailing List

Re: [Xen-devel] Draft NVDIMM proposal

From: George Dunlap <George.Dunlap@xxxxxxxxxx>

Date: Tue, 15 May 2018 13:05:18 +0000

Accept-language: en-GB, en-US

Cc: "linux-nvdimm@xxxxxxxxxxxx" <linux-nvdimm@xxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, "dan.j.williams@xxxxxxxxx" <dan.j.williams@xxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, "yi.z.zhang@xxxxxxxxx" <yi.z.zhang@xxxxxxxxx>

Delivery-date: Tue, 15 May 2018 13:06:50 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-topic: Draft NVDIMM proposal

> On May 15, 2018, at 1:26 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
>>>> On 15.05.18 at 12:12, <George.Dunlap@xxxxxxxxxx> wrote:
>>> On May 15, 2018, at 11:05 AM, Roger Pau Monne <roger.pau@xxxxxxxxxx> wrote:
>>> On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
>>>> [ adding linux-nvdimm ]
>>>>
>>>> Great write up! Some comments below...
>>>>
>>>> On Wed, May 9, 2018 at 10:35 AM, George Dunlap <george.dunlap@xxxxxxxxxx>
>>>> wrote:
>>>>>> To use a namespace, an operating system needs at a minimum two pieces
>>>>>> of information: The UUID and/or Name of the namespace, and the SPA
>>>>>> range where that namespace is mapped; and ideally also the Type and
>>>>>> Abstraction Type to know how to interpret the data inside.
>>>>
>>>> Not necessarily, no. Linux supports "label-less" mode where it exposes
>>>> the raw capacity of a region in 1:1 mapped namespace without a label.
>>>> This is how Linux supports "legacy" NVDIMMs that do not support
>>>> labels.
>>>
>>> In that case, how does Linux know which area of the NVDIMM it should
>>> use to store the page structures?
>>
>> The answer to that is right here:
>>
>>>>>> `fsdax` and `devdax` mode are both designed to make it possible for
>>>>>> user processes to have direct mapping of NVRAM. As such, both are
>>>>>> only suitable for PMEM namespaces (?). Both also need to have kernel
>>>>>> page structures allocated for each page of NVRAM; this amounts to 64
>>>>>> bytes for every 4k of NVRAM. Memory for these page structures can
>>>>>> either be allocated out of normal "system" memory, or inside the PMEM
>>>>>> namespace itself.
>>>>>>
>>>>>> In both cases, an "info block", very similar to the BTT info block, is
>>>>>> written to the beginning of the namespace when created. This info
>>>>>> block specifies whether the page structures come from system memory or
>>>>>> from the namespace itself. If from the namespace itself, it contains
>>>>>> information about what parts of the namespace have been set aside for
>>>>>> Linux to use for this purpose.
>>
>> That is, each fsdax / devdax namespace has a superblock that, in part,
>> defines what parts are used for Linux and what parts are used for data. Or
>> to put it a different way: Linux decides which parts of a namespace to use
>> for page structures, and writes it down in the metadata starting in the
>> first page of the namespace.
>
> And that metadata layout is agreed upon between all OS vendors?
>
>>>>>> Linux has also defined "Type GUIDs" for these two types of namespace
>>>>>> to be stored in the namespace label, although these are not yet in the
>>>>>> ACPI spec.
>>>>
>>>> They never will be. One of the motivations for GUIDs is that an OS can
>>>> define private ones without needing to go back and standardize them.
>>>> Only GUIDs that are needed to inter-OS / pre-OS compatibility would
>>>> need to be defined in ACPI, and there is no expectation that other
>>>> OSes understand Linux's format for reserving page structure space.
>>>
>>> Maybe it would be helpful to somehow mark those areas as
>>> "non-persistent" storage, so that other OSes know they can use this
>>> space for temporary data that doesn't need to survive across reboots?
>>
>> In theory there’s no reason another OS couldn’t learn Linux’s format,
>> discover where the blocks were, and use those blocks for its own purposes
>> while Linux wasn’t running.
>
> This looks to imply "no" to my question above, in which case I wonder how
> we would use (part of) the space when the "other" owner is e.g. Windows.

So in classic DOS partition tables, you have partition types; and various
operating systems just sort of “claimed” numbers for themselves (e.g.,
NTFS, Linux Swap, &c). But the DOS partition table number space is
actually quite small.

So in namespaces, you have a similar concept, except that it’s called a
“type GUID”, and it’s massively long, long enough anyone who wants to make
a new type can simply generate one randomly and be pretty confident that
nobody else is using that one.

So if the labels contain a TGUID you understand, you use it, just like you
would a partition that you understand. If it contains GUIDs you don’t
understand, you’d better leave it alone.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
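
The type-GUID rule described in the message above (use a namespace whose type GUID you recognise, leave anything else alone) can be sketched roughly as follows. This is only an illustration under stated assumptions: the GUIDs are generated on the spot and are not the values Linux or any specification uses, and `new_type_guid`, `HANDLERS` and `classify` are hypothetical names, not part of Xen, Linux or ACPI.

```python
import uuid


def new_type_guid() -> uuid.UUID:
    """Mint a private type GUID at random; no central registry is consulted."""
    return uuid.uuid4()


# Hypothetical GUIDs minted for this sketch only; NOT real Linux/ACPI values.
MY_FSDAX = new_type_guid()
MY_DEVDAX = new_type_guid()

# The set of namespace types this (imaginary) OS knows how to interpret.
HANDLERS = {MY_FSDAX: "fsdax", MY_DEVDAX: "devdax"}


def classify(label_type_guid: uuid.UUID) -> str:
    """Return the handler name for a label's type GUID, or 'leave-alone'."""
    return HANDLERS.get(label_type_guid, "leave-alone")


if __name__ == "__main__":
    foreign = new_type_guid()  # stands in for another OS's private type
    print(classify(MY_FSDAX))
    print(classify(foreign))
```

A version-4 UUID carries 122 random bits, which is why two parties minting type GUIDs independently are very unlikely to collide, in contrast to the one-byte DOS partition type field.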
