
Re: [Xen-devel] [RFC XEN PATCH v4 00/41] Add vNVDIMM support to HVM domains



On 02/12/18 10:05 +0000, Roger Pau Monné wrote:
> On Mon, Feb 12, 2018 at 09:25:42AM +0800, Haozhong Zhang wrote:
> > On 02/09/18 12:33 +0000, Roger Pau Monné wrote:
> > > Thanks for the series, I'm however wondering whether it's appropriate
> > > to post a v4 as RFC. Ie: at v4 the reviewer expects the submitter to
> > > have a clear picture of what needs to be implemented.
> > > 
> > > On Thu, Dec 07, 2017 at 06:09:49PM +0800, Haozhong Zhang wrote:
> > > > All patches can also be found at
> > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v4
> > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v4
> > > > 
> > > > RFC v3 can be found at
> > > >   https://lists.xen.org/archives/html/xen-devel/2017-09/msg00964.html
> > > > 
> > > > Changes in v4:
> > > >   * Move the functionality of management util 'xen-ndctl' to Xen
> > > >     management tool 'xl'.
> > > >   * Load QEMU ACPI via QEMU fw_cfg and BIOSLinkerLoader interface.
> > > >   * Other changes are documented in patches separately.
> > > > 
> > > > 
> > > > - Part 0. Bug fix and code cleanup
> > > >   [01/41] x86_64/mm: fix the PDX group check in mem_hotadd_check()
> > > >   [02/41] x86_64/mm: avoid cleaning the unmapped frame table
> > > >   [03/41] hvmloader/util: do not compare characters after '\0' in strncmp
> > > > 
> > > > - Part 1. Detect host PMEM
> > > >   Detect host PMEM via NFIT. No frametable and M2P table for them are
> > > >   created in this part.
> > > > 
> > > >   [04/41] xen/common: add Kconfig item for pmem support
> > > >   [05/41] x86/mm: exclude PMEM regions from initial frametable
> > > >   [06/41] acpi: probe valid PMEM regions via NFIT
> > > >   [07/41] xen/pmem: register valid PMEM regions to Xen hypervisor
> > > >   [08/41] xen/pmem: hide NFIT and deny access to PMEM from Dom0
> > > 
> > > I'm afraid I might ask stupid questions, since I haven't followed the
> > > design discussion of this series very closely.
> > > 
> > > So you basically hide the NVDIMM from Dom0, and only allow guests to
> > > use it?
> > 
> > Yes, though I have some unsent patches (for vNVDIMM label support) to
> > allow QEMU in dom0 to access NVDIMM via DMOP.
> > 
> > > 
> > > What happens when you boot the same system without Xen? Will the
> > > NVDIMM get corrupted because for example Linux will write something to
> > > it?
> > 
> > A bare-metal OS without Xen may write to NVDIMM, which may or may not
> > corrupt the data, depending on the existing data on NVDIMM and how
> > that OS uses NVDIMM.
> > 
> > If the bare-metal OS uses NVDIMM, for example, as volatile memory
> > or as a fast disk cache, then random data may be dumped to NVDIMM
> > and corrupt the existing data.
> > 
> > If the bare-metal OS treats NVDIMM as storage, it may probe for certain
> > structures (e.g., file systems) on NVDIMM before further operations
> > and stop if such structures are not found. In that case, the existing
> > data on NVDIMM will not be corrupted.
> 
> OK. I have to admit my knowledge of NVDIMM is very limited. Is it
> expected that one would, for example, split an NVDIMM into several
> partitions and maybe use one as disk cache and others as storage?
> 
> How would that be accomplished, using GPT for example? Or is there some
> NVDIMM-specific way to describe the layout?

NVDIMM is mapped into the CPU address space just like regular RAM.
Basically, SW can access it via normal memory access instructions
(e.g., mov on x86), with the necessary cache flush operations (e.g.,
clwb/clflushopt/clflush) to guarantee write persistence. Beyond this
basic byte-addressable interface, SW can choose to, for example, use
it as ordinary memory, use it as persistent storage, or even implement
a block interface over it. SW can choose its own method to partition
NVDIMM, perhaps via typical disk partitions and file systems, or via
the labels provided by NVDIMM.

When such SW runs in an HVM domain, the primary work of Xen is to map
the host NVDIMM address range into the guest address space in EPT as
RW, just like normal memory virtualization.
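
As a rough user-space illustration of the "plain store, then flush"
access model described above (this is not code from the patchset), the
sketch below maps a file and treats it as a byte-addressable region.
The temporary file is only a hypothetical stand-in for an NVDIMM-backed
file, and mmap's flush() stands in for the clwb/clflushopt/clflush step;
the helper names are made up for this example.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = mmap.PAGESIZE  # one page stands in for a persistent region


def persistent_store(region: mmap.mmap, offset: int, value: int) -> None:
    """Store a 64-bit value, then flush so the write reaches the backing medium."""
    struct.pack_into("<Q", region, offset, value)  # plain store (the "mov")
    region.flush()  # assumption: stands in for clwb/clflushopt/clflush + fence


def load(region: mmap.mmap, offset: int) -> int:
    """Plain 64-bit load from the mapped region."""
    return struct.unpack_from("<Q", region, offset)[0]


def demo() -> int:
    # Hypothetical stand-in: a temp file instead of a real NVDIMM-backed file.
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, REGION_SIZE)
        with mmap.mmap(fd, REGION_SIZE) as region:
            persistent_store(region, 0, 0xC0FFEE)
        # Map the file again to show the value outlives the first mapping.
        with mmap.mmap(fd, REGION_SIZE) as region:
            return load(region, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling demo() writes through one mapping and reads the value back
through a fresh one. Nothing here is block-oriented: the region is
addressed by byte offset, which is the property the paragraph above
relies on.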

> 
> Would it be conceivable to store Dom0 root filesystem in a NVDIMM
> while also using it to provide storage to the guests?

Yes, it's possible, though it's not allowed in this patchset. Xen
would need to be configured before booting, so that it knows which
part of NVDIMM should be mapped to Dom0 and where the management
structures of that part are maintained (e.g., in another part of
NVDIMM or in RAM).
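
To give a feel for why the location of the management structures
matters, here is a back-of-the-envelope estimate of their size. The
per-page figures are assumptions for illustration (4 KiB pages, a
32-byte frame table entry and an 8-byte M2P entry per page), not
numbers taken from this patchset, and management_bytes is a made-up
helper name.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
PAGE_INFO_BYTES = 32    # assumption: size of one frame table entry
M2P_ENTRY_BYTES = 8     # assumption: size of one M2P entry


def management_bytes(pmem_bytes: int) -> int:
    """Estimate frame table + M2P size needed to manage a PMEM region."""
    pages = pmem_bytes // PAGE_SIZE
    return pages * (PAGE_INFO_BYTES + M2P_ENTRY_BYTES)


if __name__ == "__main__":
    one_tib = 1 << 40
    gib = management_bytes(one_tib) / (1 << 30)
    print(f"1 TiB of PMEM needs about {gib:.0f} GiB of management data")
```

Under these assumptions the overhead is roughly 1% of the managed
region, so a 1 TiB NVDIMM would need about 10 GiB for its frame table
and M2P, wherever they are placed.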

Haozhong

> 
> > > 
> > > >   [09/41] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
> > > >   [10/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_rgions_nr
> > > >   [11/41] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
> > > >   [12/41] tools/xl: add xl command 'pmem-list'
> > > > 
> > > > - Part 2. Setup host PMEM for management and guest data usage
> > > >   Allow users or admins in Dom0 to setup host PMEM pages for
> > > >   management and guest data usages.
> > > >    * Management PMEM pages are used to store the frametable and M2P of
> > > >      PMEM pages (including themselves), and never mapped to guest.
> > > >    * Guest data PMEM pages can be mapped to guest and used as the
> > > >      backend storage of virtual NVDIMM devices.
> > > 
> > > So this is basically tied to a PV Dom0, but I would like to also think
> > > about what would happen with a PVH Dom0. In that case AFAICT Xen could
> > > map the full NVDIMM to the Dom0 p2m as MMIO using 1GB pages, at which
> > > point Dom0 could manage the NVDIMM as desired? Ie: Dom0 could map
> > > parts of the NVDIMM to DomU as it maps other MMIO regions.
> > 
> > The primary reason I don't want to map NVDIMM to Dom0 (either PV or
> > PVH) is that the frame table and M2P table of NVDIMM are maintained on
> > NVDIMM. Because NVDIMM is non-volatile and Xen has no idea which
> > portion of NVDIMM can be used for the frame table and M2P, Xen needs
> > user input for that information (patches 18, 22, 23) after it boots
> > up. That is, during boot Xen cannot yet determine which portion of
> > NVDIMM holds its frame table and M2P and so must not be mapped to Dom0.
> 
> If you map the NVDIMM as MMIO to Dom0 you don't need the M2P entries
> IIRC, and if it's mapped using 1GB pages it shouldn't use that much
> memory for the page tables (ie: you could just use normal RAM for the
> page tables that map the NVDIMM IMO). Of course that only applies to
> PVH/HVM.
> 
> Thanks, Roger.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel
