[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen
On 03/08/16 02:27, Jan Beulich wrote: > >>> On 08.03.16 at 10:15, <haozhong.zhang@xxxxxxxxx> wrote: > > More thoughts on reserving NVDIMM space for per-page structures > > > > Currently, a per-page struct for managing mapping of NVDIMM pages may > > include following fields: > > > > struct nvdimm_page > > { > > uint64_t mfn; /* MFN of SPA of this NVDIMM page */ > > uint64_t gfn; /* GFN where this NVDIMM page is mapped */ > > domid_t domain_id; /* which domain is this NVDIMM page mapped to */ > > int is_broken; /* Is this NVDIMM page broken? (for MCE) */ > > } > > > > Its size is 24 bytes (or 22 bytes if packed). For a 2 TB NVDIMM, > > nvdimm_page structures would occupy 12 GB space, which is too hard to > > fit in the normal ram on a small memory host. However, for smaller > > NVDIMMs and/or hosts with large ram, those structures may still be able > > to fit in the normal ram. In the latter circumstance, nvdimm_page > > structures are stored in the normal ram, so they can be accessed more > > quickly. > > Not sure how you came to the above structure - it's the first time > I see it, yet figuring out what information it needs to hold is what > this design process should be about. For example, I don't see why > it would need to duplicate M2P / P2M information. Nor do I see why > per-page data needs to hold the address of a page (struct > page_info also doesn't). And whether storing a domain ID (rather > than a pointer to struct domain, as in struct page_info) is the > correct think is also to be determined (rather than just stated). > > Otoh you make no provisions at all for any kind of ref counting. > What if a guest wants to put page tables into NVDIMM space? > > Since all of your calculations are based upon that fixed assumption > on the structure layout, I'm afraid they're not very meaningful > without first settling on what data needs tracking in the first place. > > Jan > I should reexplain the choice of data structures and where to put them. For handling MCE for NVDIMM, we need to track following data: (1) SPA ranges of host NVDIMMs (one range per pmem interleave set), which are used to check whether a MCE is for NVDIMM. (2) GFN to which a NVDIMM page is mapped, which is used to determine the address put in vMCE. (3) the domain to which a NVDIMM page is mapped, which is used to determine whether a vMCE needs to be injected and where it will be injected. (4) a flag to mark whether a NVDIMM page is broken, which is used to avoid mapping broken page to guests. For granting NVDIMM pages (e.g. xen-blkback/netback), (5) a reference counter is needed for each NVDIMM page Above data can be organized as below: * For (1) SPA ranges, we can record them in a global data structure, e.g. a list struct list_head nvdimm_iset_list; struct nvdimm_iset { uint64_t base; /* starting SPA of this interleave set */ uint64_t size; /* size of this interleave set */ struct nvdimm_page *pages;/* information for individual pages in this interleave set */ struct list_head list; }; * For (2) GFN, an intuitive place to get this information is from M2P table machine_to_phys_mapping[]. However, the address of NVDIMM is not required to be contiguous with normal ram, so, if NVDIMM starts from an address that is much higher than the end address of normal ram, it may result in a M2P table that maybe too large to fit in the normal ram. Therefore, we choose to not put GFNs of NVDIMM in M2P table. Another possible solution is to extend page_info to include GFN for NVDIMM and use frame_table. A benefit of this solution is that other data (3)-(5) can be got from page_info as well. However, due to the same reason for machine_to_phys_mapping[] and the concern that the large number of page_info structures required for large NVDIMMs may consume lots of ram, page_info and frame_table seems not a good place either. * At the end, we choose to introduce a new data structure for above per-page data (2)-(5) struct nvdimm_page { struct domain *domain; /* for (3) */ uint64_t gfn; /* for (2) */ unsigned long count_info; /* for (4) and (5), same as page_info->count_info */ /* other fields if needed, e.g. lock */ } (MFN is not needed indeed) On each NVDIMM interleave set, we could reserve an area to place an array of nvdimm_page structures for pages in that interleave set. In addition, the corresponding global nvdimm_iset structure is set to point to this array via its 'pages' field. * One disadvantage of above solution is that accessing NVDIMM is slower than normal ram, so some usage scenarios that requires frequently accesses to nvdimm_page structures may suffer poor performance. Therefore, we may add a boot parameter to allow users to choose normal ram for above nvdimm_page arrays if their hosts have plenty ram. One thing I have no idea is what percentage of ram used/reserved by Xen itself is considered as acceptable. If it exists and a boot parameter is given, we could let Xen choose the faster ram when the percentage has not been reached. Any comments? Thanks, Haozhong _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |