
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On Wed, Mar 02, 2016 at 03:14:52PM +0800, Haozhong Zhang wrote:
> On 03/01/16 13:49, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 01, 2016 at 06:33:32PM +0000, Ian Jackson wrote:
> > > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM 
> > > support for Xen"):
> > > > On 02/18/16 21:14, Konrad Rzeszutek Wilk wrote:
> > > > > [someone:]
> > > > > > (2) For XENMAPSPACE_gmfn, _gmfn_range and _gmfn_foreign,
> > > > > >    (a) never map idx in them to GFNs occupied by vNVDIMM, and
> > > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > > > > 
> > > > > Would that mean that guest xen-blkback or xen-netback wouldn't
> > > > > be able to fetch data from the GFNs? As in, what if the HVM guest
> > > > > that has the NVDIMM also serves as a device domain - that is, it
> > > > > has xen-blkback running to service other guests?
> > > > 
> > > > I'm not familiar with xen-blkback and xen-netback, so the following
> > > > statements may be wrong.
> > > > 
> > > > In my understanding, xen-blkback/-netback in a device domain maps the
> > > > pages from other domains into its own domain and copies data between
> > > > those pages and the vNVDIMM. The access to the vNVDIMM is performed by
> > > > the NVDIMM driver in the device domain. In which step of this procedure
> > > > does xen-blkback/-netback need to map the GFNs of vNVDIMM?
> > > 
> > > I think I agree with what you are saying.  I don't understand exactly
> > > what you are proposing above in XENMAPSPACE_gmfn but I don't see how
> > > anything about this would interfere with blkback.
> > > 
> > > blkback when talking to an nvdimm will just go through the block layer
> > > front door, and do a copy, I presume.
> > 
> > I believe you are right. The block layer, and then the fs, would copy the
> > data in.
> > > 
> > > I don't see how netback comes into it at all.
> > > 
> > > But maybe I am just confused or ignorant!  Please do explain :-).
> > 
> > s/back/frontend/  
> > 
> > My fear was refcounting.
> > 
> > Specifically in the cases where we do not do copying. For example, you
> > could be sending data from the NVDIMM GFNs (scp?) to some other location
> > (another host?). It would go over xen-netback (in dom0), which would then
> > grant map it (dom0 would).
> >
> 
> Thanks for the explanation!
> 
> It means the NVDIMM is very possibly mapped at page granularity, and the
> hypervisor needs per-page data structures like page_info (rather than the
> range-set style nvdimm_pages) to manage those mappings.

I do not know. I figured you need some accounting in the hypervisor, as the
pages can be grant mapped, but I don't know the intricate details of the P2M
code to tell you for certain.

[edit: Your later email seems to imply that you do not need all this
information? Just ranges?]
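
To make the two granularities concrete, here is a minimal sketch. The names
below are invented for illustration (they are not existing Xen structures):
a range-set-style record describes a whole contiguous NVDIMM region, while
grant mapping individual pages would need per-page state of the kind that
struct page_info keeps for ordinary RAM.

    #include <stdint.h>

    typedef uint64_t gfn_t;

    /* Range-set style: one entry per contiguous NVDIMM region. */
    struct nvdimm_pages {
        gfn_t    first_gfn;   /* first guest frame of the region */
        uint64_t nr_frames;   /* number of 4K frames in the region */
    };

    /*
     * Per-page style (hypothetical): one entry per frame, which becomes
     * necessary once individual pages can be grant mapped and refcounted.
     */
    struct nvdimm_page_info {
        gfn_t    gfn;         /* guest frame this entry describes */
        uint32_t count_info;  /* reference count, as in page_info */
        uint16_t owner_id;    /* domain currently holding a mapping */
    };

Whether Xen really needs the second form, or just ranges, is exactly the
open question above.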
> 
> Then we will face the problem that the potentially huge number of
> per-page data structures may not fit in normal RAM. Linux kernel
> developers came across the same problem, and their solution is to
> reserve an area of the NVDIMM and put the page structures in the
> reserved area (https://lwn.net/Articles/672457/). I think we may take a
> similar solution (a rough sketch of the reporting interface follows the
> list below):
> (1) The Dom0 Linux kernel reserves an area on each NVDIMM for Xen usage
>     (besides the one used by the Linux kernel itself) and reports the
>     address and size to the Xen hypervisor.
> 
>     Reasons to have the Linux kernel make the reservation include:
>     (a) only the Dom0 Linux kernel has the NVDIMM driver, and
>     (b) it keeps it flexible for the Dom0 Linux kernel to handle all
>         reservations (for itself and for Xen).
> 
> (2) The Xen hypervisor then builds the page structures for NVDIMM pages
>     and stores them in the above reserved areas.
> 
> (3) The reserved area is treated as volatile, i.e. the above two steps
>     must be done on every host boot.
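
As an illustration of step (1), the report from Dom0 to Xen might carry
little more than a start address and a size per NVDIMM. The structure and
operation name below are hypothetical, invented only to make the proposal
concrete; no such interface exists today.

    #include <stdint.h>

    /*
     * Hypothetical payload for a Dom0 -> Xen platform op, e.g. a
     * "XENPF_nvdimm_reserved_area" (name invented for illustration).
     */
    struct xen_nvdimm_reserved_area {
        uint64_t spa;    /* start system physical address of the area
                            reserved on the NVDIMM for Xen's use */
        uint64_t size;   /* size of the reserved area in bytes */
    };

Dom0's NVDIMM driver would fill this in after carving out the reservation
(step 1), and Xen would then lay out its page structures inside
[spa, spa + size) on every boot (steps 2 and 3).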
> 
> > In effect, in Xen there are two guests (dom0 and domU) pointing in the
> > P2M to the same GPFN. And that would mean:
> > 
> > > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > 
> > Granted, the XENMAPSPACE_gmfn happens _before_ the grant mapping is done,
> > so perhaps this is not an issue?
> > 
> > The other situation I was envisioning is where the driver domain has
> > the NVDIMM passed in, as well as an SR-IOV network card, and functions
> > as an iSCSI target. That should work OK, as we just need the IOMMU
> > to have the NVDIMM GPFNs programmed in.
> >
> 
> For this IOMMU usage example and the granted pages example above, there
> remains one question: who is responsible for performing the NVDIMM flush
> (clwb/clflushopt/pcommit)?


> 
> For the granted page example, if an NVDIMM page is granted to
> xen-netback, does the hypervisor need to tell xen-netback it is an NVDIMM
> page so that xen-netback can perform a proper flush when it writes to that
> page? Or should we keep the NVDIMM transparent to xen-netback, and let Xen
> perform the flush when xen-netback gives up the granted NVDIMM page?
> 
> For the IOMMU example, my understanding is that there is a piece of
> software in the driver domain that handles SCSI commands received from the
> network card and drives the network card to read/write certain areas of
> the NVDIMM. That software should then be aware of the existence of the
> NVDIMM and perform the flush properly. Is that right?

I would imagine it is the same as any other write to the NVDIMM. The "owner"
of the NVDIMM would perform the pcommit?
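
For reference, a minimal sketch of the flush sequence being discussed (not
Xen or Linux code): clflush is used as the universally available fallback,
clwb/clflushopt would be preferable where supported, and pcommit is emitted
as a raw opcode so older assemblers do not reject the mnemonic.

    #include <stdint.h>
    #include <stddef.h>

    #define CACHELINE_SIZE 64

    static inline void flush_nvdimm_range(const void *addr, size_t len)
    {
        uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
        uintptr_t end = (uintptr_t)addr + len;

        /* Write back every cache line touching [addr, addr + len). */
        for ( ; p < end; p += CACHELINE_SIZE )
            asm volatile ( "clflush %0" : "+m" (*(volatile char *)p) );

        /* Order the cache-line write-backs before the commit. */
        asm volatile ( "sfence" ::: "memory" );

        /* PCOMMIT (raw encoding 66 0f ae f8): make the stores durable
         * on the NVDIMM, on hardware that requires it. */
        asm volatile ( ".byte 0x66, 0x0f, 0xae, 0xf8" ::: "memory" );

        /* Fence again so later code can rely on durability. */
        asm volatile ( "sfence" ::: "memory" );
    }

Whoever ends up being the "owner" in the sense above is the party that
would have to issue this sequence after writing the granted page.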
> 
> Haozhong
