Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen
On 03/07/16 15:53, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 02, 2016 at 03:14:52PM +0800, Haozhong Zhang wrote:
> > On 03/01/16 13:49, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Mar 01, 2016 at 06:33:32PM +0000, Ian Jackson wrote:
> > > > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen"):
> > > > > On 02/18/16 21:14, Konrad Rzeszutek Wilk wrote:
> > > > > > [someone:]
> > > > > > > (2) For XENMAPSPACE_gmfn, _gmfn_range and _gmfn_foreign,
> > > > > > >     (a) never map idx in them to GFNs occupied by vNVDIMM, and
> > > > > > >     (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > > > > >
> > > > > > Would that mean that guest xen-blkback or xen-netback wouldn't
> > > > > > be able to fetch data from the GFNs? As in, what if the HVM guest
> > > > > > that has the NVDIMM also serves as a device domain - that is, it
> > > > > > has xen-blkback running to service other guests?
> > > > >
> > > > > I'm not familiar with xen-blkback and xen-netback, so the following
> > > > > statements may be wrong.
> > > > >
> > > > > In my understanding, xen-blkback/-netback in a device domain maps
> > > > > pages from other domains into its own domain, and copies data
> > > > > between those pages and the vNVDIMM. The access to the vNVDIMM is
> > > > > performed by the NVDIMM driver in the device domain. In which step
> > > > > of this procedure does xen-blkback/-netback need to map the GFNs
> > > > > of the vNVDIMM?
> > > >
> > > > I think I agree with what you are saying. I don't understand exactly
> > > > what you are proposing above in XENMAPSPACE_gmfn, but I don't see how
> > > > anything about this would interfere with blkback.
> > > >
> > > > blkback, when talking to an nvdimm, will just go through the block
> > > > layer front door, and do a copy, I presume.
> > >
> > > I believe you are right. The block layer, and then the fs, would copy
> > > in.
> > >
> > > > I don't see how netback comes into it at all.
> > > >
> > > > But maybe I am just confused or ignorant! Please do explain :-).
> > >
> > > s/back/frontend/
> > >
> > > My fear was refcounting.
> > >
> > > Specifically where we do not do copying. For example, you could be
> > > sending data from the NVDIMM GFNs (scp?) to some other location
> > > (another host?). It would go over xen-netback (in the dom0) - which
> > > would then grant map it (dom0 would).
> >
> > Thanks for the explanation!
> >
> > It means the NVDIMM is very possibly mapped at page granularity, and
> > the hypervisor needs per-page data structures like page_info (rather
> > than the range-set style nvdimm_pages) to manage those mappings.
>
> I do not know. I figured you need some accounting in the hypervisor as
> the pages can be grant mapped, but I don't know the intricate details of
> the P2M code to tell you for certain.
>
> [edit: Your later email seems to imply that you do not need all this
> information? Just ranges?]

Not quite sure which one you mean. But at least in this example, NVDIMM
pages can be granted individually, so I think Xen still needs a per-page
data structure to track this mapping information; a range structure is
not enough.

> > Then we will face the problem that the potentially huge number of
> > per-page data structures may not fit in normal RAM. Linux kernel
> > developers came across the same problem, and their solution is to
> > reserve an area of NVDIMM and put the page structures in the reserved
> > area (https://lwn.net/Articles/672457/).
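
(Aside: a rough back-of-the-envelope sketch of why the per-page structures
for a large NVDIMM may not fit in normal RAM. The 32-byte per-page metadata
size and the 1 TB capacity are illustrative assumptions only, not the actual
sizeof(struct page_info) of any particular Xen build.)

    /* Sizing sketch: per-page metadata needed for a large NVDIMM.
     * PER_PAGE_STRUCT and NVDIMM_BYTES are illustrative assumptions. */
    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE       4096ULL
    #define PER_PAGE_STRUCT 32ULL          /* page_info-like struct, assumed */
    #define NVDIMM_BYTES    (1ULL << 40)   /* a hypothetical 1 TB NVDIMM */

    int main(void)
    {
        uint64_t pages    = NVDIMM_BYTES / PAGE_SIZE;  /* 256M pages */
        uint64_t metadata = pages * PER_PAGE_STRUCT;   /* 8 GB of metadata */

        printf("%llu pages -> %llu MB of per-page metadata\n",
               (unsigned long long)pages,
               (unsigned long long)(metadata >> 20));
        return 0;
    }

At that scale the metadata alone runs to gigabytes, which is what makes
reserving it on the NVDIMM itself (as the LWN article describes for Linux)
attractive.
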
> > I think we may take a similar solution:
> > (1) The Dom0 Linux kernel reserves an area on each NVDIMM for Xen usage
> >     (besides the one used by the Linux kernel itself) and reports the
> >     address and size to the Xen hypervisor.
> >
> >     Reasons to choose the Linux kernel to make the reservation include:
> >     (a) only the Dom0 Linux kernel has the NVDIMM driver,
> >     (b) it keeps things flexible by letting the Dom0 Linux kernel handle
> >         all reservations (for itself and Xen).
> >
> > (2) The Xen hypervisor then builds the page structures for NVDIMM pages
> >     and stores them in the above reserved areas.
> >
> > (3) The reserved area is used as volatile, i.e. the above two steps must
> >     be done on every host boot.
> >
> > > In effect, in Xen there are two guests (dom0 and domU) pointing in
> > > the P2M to the same GPFN. And that would mean:
> > >
> > > > > > > (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > >
> > > Granted, the XENMAPSPACE_gmfn happens _before_ the grant mapping is
> > > done, so perhaps this is not an issue?
> > >
> > > The other situation I was envisioning is where the driver domain has
> > > the NVDIMM passed in, as well as an SR-IOV network card, and functions
> > > as an iSCSI target. That should work OK, as we just need the IOMMU to
> > > have the NVDIMM GPFNs programmed in.
> >
> > For this IOMMU usage example and the granted-pages example above, one
> > question remains: who is responsible for performing the NVDIMM flush
> > (clwb/clflushopt/pcommit)?
> >
> > For the granted-page example, if an NVDIMM page is granted to
> > xen-netback, does the hypervisor need to tell xen-netback it's an
> > NVDIMM page so that xen-netback can perform the proper flush when it
> > writes to that page? Or should we keep the NVDIMM transparent to
> > xen-netback, and let Xen perform the flush when xen-netback gives up
> > the granted NVDIMM page?
> >
> > For the IOMMU example, my understanding is that there is a piece of
> > software in the driver domain that handles SCSI commands received from
> > the network card and drives the network card to read/write certain
> > areas of the NVDIMM. That software should then be aware of the
> > existence of the NVDIMM and perform the flush properly. Is that right?
>
> I would imagine it is the same as any write on NVDIMM. The "owner" of
> the NVDIMM would perform the pcommit. ?

Agreed, the software accessing the NVDIMM is responsible for performing
the proper flushes.

Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
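
(Aside on the flush question discussed above: a minimal sketch of the
clwb + pcommit sequence the owner of a write would have to issue, assuming
it runs in the software that touches the NVDIMM directly. The helper name,
the CACHE_LINE constant and the raw pcommit byte encoding are illustrative
assumptions, not code from Xen, Linux or this thread.)

    /* Write back the dirty cache lines of an NVDIMM-backed buffer, then
     * pcommit so the queued writes reach the persistence domain. */
    #include <stdint.h>
    #include <stddef.h>
    #include <immintrin.h>      /* _mm_clwb, _mm_sfence; build with -mclwb */

    #define CACHE_LINE 64       /* assumed cache-line size */

    void nvdimm_flush(const void *buf, size_t len)
    {
        uintptr_t p   = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end = (uintptr_t)buf + len;

        for (; p < end; p += CACHE_LINE)
            _mm_clwb((void *)p);    /* write back each dirty cache line */
        _mm_sfence();               /* order the clwb's before pcommit */

        /* pcommit, emitted as raw bytes (66 0f ae f8) since compiler
         * support for an intrinsic varies. */
        asm volatile(".byte 0x66, 0x0f, 0xae, 0xf8" ::: "memory");
        _mm_sfence();
    }

Whether a call like this lives in xen-netback, in the iSCSI target software
in the driver domain, or is done by Xen when the granted page is released
is exactly the question raised in the mail above.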