Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
From: Haozhong Zhang <haozhong.zhang@xxxxxxxxx>
Date: Wed, 2 Mar 2016 15:14:52 +0800
Cc: Juergen Gross <JGross@xxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>
Delivery-date: Wed, 02 Mar 2016 07:14:58 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>
Mail-followup-to: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, "Stefano Stabellini" <stefano.stabellini@xxxxxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <JGross@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>

On 03/01/16 13:49, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 01, 2016 at 06:33:32PM +0000, Ian Jackson wrote:
> > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM 
> > support for Xen"):
> > > On 02/18/16 21:14, Konrad Rzeszutek Wilk wrote:
> > > > [someone:]
> > > > > (2) For XENMAPSPACE_gmfn, _gmfn_range and _gmfn_foreign,
> > > > >    (a) never map idx in them to GFNs occupied by vNVDIMM, and
> > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > > > 
> > > > Would that mean that guest xen-blkback or xen-netback wouldn't
> > > > be able to fetch data from the GFNs? As in, what if the HVM guest
> > > > that has the NVDIMM also serves as a device domain - that is it
> > > > has xen-blkback running to service other guests?
> > > 
> > > I'm not familiar with xen-blkback and xen-netback, so following
> > > statements maybe wrong.
> > > 
> > > In my understanding, xen-blkback/-netback in a device domain maps the
> > > pages from other domains into its own domain, and copies data between
> > > those pages and vNVDIMM. The access to vNVDIMM is performed by NVDIMM
> > > driver in device domain. In which steps of this procedure that
> > > xen-blkback/-netback needs to map into GFNs of vNVDIMM?
> > 
> > I think I agree with what you are saying.  I don't understand exactly
> > what you are proposing above in XENMAPSPACE_gmfn but I don't see how
> > anything about this would interfere with blkback.
> > 
> > blkback when talking to an nvdimm will just go through the block layer
> > front door, and do a copy, I presume.
> 
> I believe you are right. The block layer, and then the fs would copy in.
> > 
> > I don't see how netback comes into it at all.
> > 
> > But maybe I am just confused or ignorant!  Please do explain :-).
> 
> s/back/frontend/  
> 
> My fear was refcounting.
> 
> Specifically where we do not do copying. For example, you could
> be sending data from the NVDIMM GFNs (scp?) to some other location
> (another host?). It would go over the xen-netback (in the dom0)
> - which would then grant map it (dom0 would).
>

Thanks for the explanation!

This means NVDIMM pages are very likely to be mapped at page granularity,
so the hypervisor needs per-page data structures like page_info (rather
than the range-set-style nvdimm_pages) to manage those mappings.
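
To make the difference concrete, here is a minimal sketch of a range-style
record next to a per-page record with a reference count. All names
(nv_range, nv_page, nv_get_page, nv_put_page) are hypothetical and are not
existing Xen structures or functions; the point is only that a range entry
has nowhere to keep a count for each individual page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range-style record: one entry covers many pages, so it
 * cannot track how many mappings exist for any single page. */
struct nv_range {
    uint64_t start_mfn;   /* first machine frame in the range */
    uint64_t nr_pages;    /* number of frames covered */
};

/* Hypothetical per-page record, loosely in the spirit of page_info:
 * every frame carries its own reference count. */
struct nv_page {
    uint32_t refcount;    /* live mappings of this page, e.g. grant maps */
};

/* Take a reference when a page is mapped.
 * Returns false if idx is out of range or the count would overflow. */
static bool nv_get_page(struct nv_page *pages, size_t nr, size_t idx)
{
    if (idx >= nr || pages[idx].refcount == UINT32_MAX)
        return false;
    pages[idx].refcount++;
    return true;
}

/* Drop a reference when a mapping goes away.
 * Returns true only when the last reference was dropped. */
static bool nv_put_page(struct nv_page *pages, size_t nr, size_t idx)
{
    if (idx >= nr || pages[idx].refcount == 0)
        return false;
    return --pages[idx].refcount == 0;
}
```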

We then face the problem that the potentially huge number of per-page
data structures may not fit in normal RAM. Linux kernel developers ran
into the same problem, and their solution is to reserve an area of the
NVDIMM and put the page structures in that reserved area
(https://lwn.net/Articles/672457/). I think we could adopt a similar
solution:
(1) The Dom0 Linux kernel reserves an area on each NVDIMM for Xen's use
    (in addition to the one used by the Linux kernel itself) and reports
    its address and size to the Xen hypervisor.

    Reasons for having the Linux kernel make the reservation include:
    (a) only the Dom0 Linux kernel has the NVDIMM driver, and
    (b) it gives the Dom0 Linux kernel the flexibility to handle all
        reservations (for itself and for Xen).

(2) The Xen hypervisor then builds the page structures for the NVDIMM
    pages and stores them in the reserved areas above.

(3) The reserved area is treated as volatile, i.e. the two steps above
    must be repeated on every host boot.
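
As a rough illustration of how large the area in step (1) would have to be,
the sketch below computes the space needed for one per-page structure per
NVDIMM page. The function name nv_reserved_bytes is hypothetical, and the
page size and structure size are parameters rather than values taken from
Xen.

```c
#include <stdint.h>

/* Bytes needed to hold one per-page structure for every page of an
 * NVDIMM, rounded up to a whole number of pages so the area can be
 * mapped. All three inputs are assumptions supplied by the caller. */
static uint64_t nv_reserved_bytes(uint64_t nvdimm_bytes,
                                  uint64_t page_bytes,
                                  uint64_t struct_bytes)
{
    if (page_bytes == 0)
        return 0;

    uint64_t nr_pages = nvdimm_bytes / page_bytes;
    uint64_t raw = nr_pages * struct_bytes;

    /* Round up to the next multiple of the page size. */
    return (raw + page_bytes - 1) / page_bytes * page_bytes;
}
```

For example, assuming 4 KiB pages and a 32-byte structure per page,
nv_reserved_bytes(1ULL << 40, 4096, 32) gives 8 GiB for a 1 TiB NVDIMM.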

> In effect Xen there are two guests (dom0 and domU) pointing in the
> P2M to the same GPFN. And that would mean:
> 
> > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> 
> Granted the XENMAPSPACE_gmfn happens _before_ the grant mapping is done
> so perhaps this is not an issue?
> 
> The other situation I was envisioning - where the driver domain has
> the NVDIMM passed in, and as well SR-IOV network card and functions
> as an iSCSI target. That should work OK as we just need the IOMMU
> to have the NVDIMM GPFNs programmed in.
>

For both this IOMMU example and the granted-pages example above, one
question remains: who is responsible for performing the NVDIMM flush
(clwb/clflushopt/pcommit)?

In the granted-page example, if an NVDIMM page is granted to
xen-netback, does the hypervisor need to tell xen-netback that it is an
NVDIMM page so that xen-netback can flush properly when it writes to that
page? Or should we keep the NVDIMM transparent to xen-netback and let Xen
perform the flush when xen-netback gives up the granted NVDIMM page?
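
The second alternative could look roughly like the sketch below: a release
path that flushes every cache line of the page, but only if the page is
NVDIMM-backed. This is only an illustration of the idea, not a proposal for
the actual code. nv_release_granted_page, nv_flush_line_fn and
nv_count_line are hypothetical names, the 4 KiB page and 64-byte cache line
sizes are assumptions, and the counting stub stands in for a real
clwb/clflushopt plus fence.

```c
#include <stdbool.h>
#include <stddef.h>

#define NV_PAGE_SIZE  4096u  /* assumed page size */
#define NV_CACHE_LINE 64u    /* assumed cache line size */

/* Hypothetical hook that writes one cache line back to the NVDIMM. */
typedef void (*nv_flush_line_fn)(const void *line);

/* Stand-in for the real flush instruction: it only counts the calls. */
static size_t nv_lines_seen;
static void nv_count_line(const void *line)
{
    (void)line;
    nv_lines_seen++;
}

/* Hypothetical release path, run when the backend gives up a granted
 * page. Ordinary RAM pages are left alone; NVDIMM pages are flushed
 * line by line. Returns the number of cache lines flushed. */
static size_t nv_release_granted_page(const void *page, bool is_nvdimm,
                                      nv_flush_line_fn flush_line)
{
    size_t flushed = 0;

    if (!is_nvdimm)
        return 0;

    for (size_t off = 0; off < NV_PAGE_SIZE; off += NV_CACHE_LINE) {
        flush_line((const char *)page + off);
        flushed++;
    }
    return flushed;
}
```

With this approach xen-netback never needs to know what kind of page it
was granted; the cost is that every release of an NVDIMM page flushes the
whole page, whether or not it was written.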

For the IOMMU example, my understanding is that a piece of software in
the driver domain handles the SCSI commands received from the network
card and drives the network card to read/write certain areas of the
NVDIMM. That software should then be aware of the NVDIMM and perform the
flush properly. Is that right?

Haozhong

