Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen
On 03/07/16 15:53, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 02, 2016 at 03:14:52PM +0800, Haozhong Zhang wrote:
> > On 03/01/16 13:49, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Mar 01, 2016 at 06:33:32PM +0000, Ian Jackson wrote:
> > > > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen"):
> > > > > On 02/18/16 21:14, Konrad Rzeszutek Wilk wrote:
> > > > > > [someone:]
> > > > > > > (2) For XENMAPSPACE_gmfn, _gmfn_range and _gmfn_foreign,
> > > > > > >     (a) never map idx in them to GFNs occupied by vNVDIMM, and
> > > > > > >     (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > > > > >
> > > > > > Would that mean that guest xen-blkback or xen-netback wouldn't
> > > > > > be able to fetch data from the GFNs? As in, what if the HVM guest
> > > > > > that has the NVDIMM also serves as a device domain - that is, it
> > > > > > has xen-blkback running to service other guests?
> > > > >
> > > > > I'm not familiar with xen-blkback and xen-netback, so the following
> > > > > statements may be wrong.
> > > > >
> > > > > In my understanding, xen-blkback/-netback in a device domain maps
> > > > > pages from other domains into its own domain, and copies data
> > > > > between those pages and the vNVDIMM. The access to the vNVDIMM is
> > > > > performed by the NVDIMM driver in the device domain. In which step
> > > > > of this procedure does xen-blkback/-netback need to map the GFNs
> > > > > of the vNVDIMM?
> > > >
> > > > I think I agree with what you are saying. I don't understand exactly
> > > > what you are proposing above in XENMAPSPACE_gmfn, but I don't see how
> > > > anything about this would interfere with blkback.
> > > >
> > > > blkback, when talking to an nvdimm, will just go through the block
> > > > layer front door, and do a copy, I presume.
> > >
> > > I believe you are right. The block layer, and then the fs, would copy
> > > in.
> > >
> > > > I don't see how netback comes into it at all.
> > > >
> > > > But maybe I am just confused or ignorant! Please do explain :-).
> > >
> > > s/back/frontend/
> > >
> > > My fear was refcounting.
> > >
> > > Specifically where we do not do copying. For example, you could be
> > > sending data from the NVDIMM GFNs (scp?) to some other location
> > > (another host?). It would go over xen-netback (in the dom0) - which
> > > would then grant map it (dom0 would).
> >
> > Thanks for the explanation!
> >
> > It means the NVDIMM is very possibly mapped at page granularity, and
> > the hypervisor needs per-page data structures like page_info (rather
> > than the range-set style nvdimm_pages) to manage those mappings.
>
> I do not know. I figured you need some accounting in the hypervisor as
> the pages can be grant mapped, but I don't know the intricate details of
> the P2M code to tell you for certain.
>
> [edit: Your later email seems to imply that you do not need all this
> information? Just ranges?]

Not quite sure which one you mean. But at least in this example, NVDIMM
pages can be granted individually, so I think Xen still needs a per-page
data structure to track this mapping information; a range structure is
not enough.

> > Then we will face the problem that the potentially huge number of
> > per-page data structures may not fit in normal RAM. Linux kernel
> > developers came across the same problem, and their solution is to
> > reserve an area of NVDIMM and put the page structures in the reserved
> > area (https://lwn.net/Articles/672457/).
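
(Aside: a rough back-of-the-envelope sketch of why the per-page structures
for a large NVDIMM may not fit in normal RAM. The 32-byte per-page metadata
size and the 1 TB capacity are illustrative assumptions only, not the actual
sizeof(struct page_info) of any particular Xen build.)

    /* Sizing sketch: per-page metadata needed for a large NVDIMM.
     * PER_PAGE_STRUCT and NVDIMM_BYTES are illustrative assumptions. */
    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE       4096ULL
    #define PER_PAGE_STRUCT 32ULL          /* page_info-like struct, assumed */
    #define NVDIMM_BYTES    (1ULL << 40)   /* a hypothetical 1 TB NVDIMM */

    int main(void)
    {
        uint64_t pages    = NVDIMM_BYTES / PAGE_SIZE;  /* 256M pages */
        uint64_t metadata = pages * PER_PAGE_STRUCT;   /* 8 GB of metadata */

        printf("%llu pages -> %llu MB of per-page metadata\n",
               (unsigned long long)pages,
               (unsigned long long)(metadata >> 20));
        return 0;
    }

At that scale the metadata alone runs to gigabytes, which is what makes
reserving it on the NVDIMM itself (as the LWN article describes for Linux)
attractive.
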
> > I think we may take a similar solution:
> > (1) The Dom0 Linux kernel reserves an area on each NVDIMM for Xen usage
> >     (besides the one used by the Linux kernel itself) and reports the
> >     address and size to the Xen hypervisor.
> >
> >     Reasons to choose the Linux kernel to make the reservation include:
> >     (a) only the Dom0 Linux kernel has the NVDIMM driver,
> >     (b) it keeps things flexible by letting the Dom0 Linux kernel handle
> >         all reservations (for itself and Xen).
> >
> > (2) The Xen hypervisor then builds the page structures for NVDIMM pages
> >     and stores them in the above reserved areas.
> >
> > (3) The reserved area is used as volatile, i.e. the above two steps must
> >     be done on every host boot.
> >
> > > In effect, in Xen there are two guests (dom0 and domU) pointing in
> > > the P2M to the same GPFN. And that would mean:
> > >
> > > > > > > (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > >
> > > Granted, the XENMAPSPACE_gmfn happens _before_ the grant mapping is
> > > done, so perhaps this is not an issue?
> > >
> > > The other situation I was envisioning is where the driver domain has
> > > the NVDIMM passed in, as well as an SR-IOV network card, and functions
> > > as an iSCSI target. That should work OK, as we just need the IOMMU to
> > > have the NVDIMM GPFNs programmed in.
> >
> > For this IOMMU usage example and the granted-pages example above, one
> > question remains: who is responsible for performing the NVDIMM flush
> > (clwb/clflushopt/pcommit)?
> >
> > For the granted-page example, if an NVDIMM page is granted to
> > xen-netback, does the hypervisor need to tell xen-netback it's an
> > NVDIMM page so that xen-netback can perform the proper flush when it
> > writes to that page? Or should we keep the NVDIMM transparent to
> > xen-netback, and let Xen perform the flush when xen-netback gives up
> > the granted NVDIMM page?
> >
> > For the IOMMU example, my understanding is that there is a piece of
> > software in the driver domain that handles SCSI commands received from
> > the network card and drives the network card to read/write certain
> > areas of the NVDIMM. That software should then be aware of the
> > existence of the NVDIMM and perform the flush properly. Is that right?
>
> I would imagine it is the same as any write on NVDIMM. The "owner" of
> the NVDIMM would perform the pcommit. ?

Agreed, the software accessing the NVDIMM is responsible for performing
the proper flushes.

Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
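
(Aside on the flush question discussed above: a minimal sketch of the
clwb + pcommit sequence the owner of a write would have to issue, assuming
it runs in the software that touches the NVDIMM directly. The helper name,
the CACHE_LINE constant and the raw pcommit byte encoding are illustrative
assumptions, not code from Xen, Linux or this thread.)

    /* Write back the dirty cache lines of an NVDIMM-backed buffer, then
     * pcommit so the queued writes reach the persistence domain. */
    #include <stdint.h>
    #include <stddef.h>
    #include <immintrin.h>      /* _mm_clwb, _mm_sfence; build with -mclwb */

    #define CACHE_LINE 64       /* assumed cache-line size */

    void nvdimm_flush(const void *buf, size_t len)
    {
        uintptr_t p   = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end = (uintptr_t)buf + len;

        for (; p < end; p += CACHE_LINE)
            _mm_clwb((void *)p);    /* write back each dirty cache line */
        _mm_sfence();               /* order the clwb's before pcommit */

        /* pcommit, emitted as raw bytes (66 0f ae f8) since compiler
         * support for an intrinsic varies. */
        asm volatile(".byte 0x66, 0x0f, 0xae, 0xf8" ::: "memory");
        _mm_sfence();
    }

Whether a call like this lives in xen-netback, in the iSCSI target software
in the driver domain, or is done by Xen when the granted page is released
is exactly the question raised in the mail above.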