
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On Wed, Mar 02, 2016 at 03:14:52PM +0800, Haozhong Zhang wrote:
> On 03/01/16 13:49, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 01, 2016 at 06:33:32PM +0000, Ian Jackson wrote:
> > > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM 
> > > support for Xen"):
> > > > On 02/18/16 21:14, Konrad Rzeszutek Wilk wrote:
> > > > > [someone:]
> > > > > > (2) For XENMAPSPACE_gmfn, _gmfn_range and _gmfn_foreign,
> > > > > >    (a) never map idx in them to GFNs occupied by vNVDIMM, and
> > > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > > > > 
> > > > > Would that mean that guest xen-blkback or xen-netback wouldn't
> > > > > be able to fetch data from the GFNs? As in, what if the HVM guest
> > > > > that has the NVDIMM also serves as a device domain - that is, it
> > > > > has xen-blkback running to service other guests?
> > > > 
> > > > I'm not familiar with xen-blkback and xen-netback, so the following
> > > > statements may be wrong.
> > > > 
> > > > In my understanding, xen-blkback/-netback in a device domain maps the
> > > > pages from other domains into its own domain and copies data between
> > > > those pages and the vNVDIMM. The access to the vNVDIMM is performed by
> > > > the NVDIMM driver in the device domain. In which step of this procedure
> > > > does xen-blkback/-netback need to map the GFNs of vNVDIMM?
> > > 
> > > I think I agree with what you are saying.  I don't understand exactly
> > > what you are proposing above in XENMAPSPACE_gmfn but I don't see how
> > > anything about this would interfere with blkback.
> > > 
> > > blkback when talking to an nvdimm will just go through the block layer
> > > front door, and do a copy, I presume.
> > 
> > I believe you are right. The block layer, and then the fs, would copy the
> > data in.
> > > 
> > > I don't see how netback comes into it at all.
> > > 
> > > But maybe I am just confused or ignorant!  Please do explain :-).
> > 
> > s/back/frontend/  
> > 
> > My fear was refcounting.
> > 
> > Specifically in the cases where we do not do copying. For example, you
> > could be sending data from the NVDIMM GFNs (scp?) to some other location
> > (another host?). It would go over xen-netback (in dom0), which would then
> > grant map it (dom0 would).
> >
> 
> Thanks for the explanation!
> 
> It means the NVDIMM is very possibly mapped at page granularity, and the
> hypervisor needs per-page data structures like page_info (rather than the
> range-set style nvdimm_pages) to manage those mappings.

I do not know. I figured you need some accounting in the hypervisor, as the
pages can be grant mapped, but I don't know the intricate details of the P2M
code to tell you for certain.

[edit: Your later email seems to imply that you do not need all this
information? Just ranges?]
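
To make the two granularities concrete, here is a minimal sketch. The names
below are invented for illustration (they are not existing Xen structures):
a range-set-style record describes a whole contiguous NVDIMM region, while
grant mapping individual pages would need per-page state of the kind that
struct page_info keeps for ordinary RAM.

    #include <stdint.h>

    typedef uint64_t gfn_t;

    /* Range-set style: one entry per contiguous NVDIMM region. */
    struct nvdimm_pages {
        gfn_t    first_gfn;   /* first guest frame of the region */
        uint64_t nr_frames;   /* number of 4K frames in the region */
    };

    /*
     * Per-page style (hypothetical): one entry per frame, which becomes
     * necessary once individual pages can be grant mapped and refcounted.
     */
    struct nvdimm_page_info {
        gfn_t    gfn;         /* guest frame this entry describes */
        uint32_t count_info;  /* reference count, as in page_info */
        uint16_t owner_id;    /* domain currently holding a mapping */
    };

Whether Xen really needs the second form, or just ranges, is exactly the
open question above.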
> 
> Then we will face the problem that the potentially huge number of
> per-page data structures may not fit in normal RAM. Linux kernel
> developers came across the same problem, and their solution is to
> reserve an area of the NVDIMM and put the page structures in the
> reserved area (https://lwn.net/Articles/672457/). I think we may take a
> similar solution (a rough sketch of the reporting interface follows the
> list below):
> (1) The Dom0 Linux kernel reserves an area on each NVDIMM for Xen usage
>     (besides the one used by the Linux kernel itself) and reports the
>     address and size to the Xen hypervisor.
> 
>     Reasons to have the Linux kernel make the reservation include:
>     (a) only the Dom0 Linux kernel has the NVDIMM driver, and
>     (b) it keeps it flexible for the Dom0 Linux kernel to handle all
>         reservations (for itself and for Xen).
> 
> (2) The Xen hypervisor then builds the page structures for NVDIMM pages
>     and stores them in the above reserved areas.
> 
> (3) The reserved area is treated as volatile, i.e. the above two steps
>     must be done on every host boot.
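
As an illustration of step (1), the report from Dom0 to Xen might carry
little more than a start address and a size per NVDIMM. The structure and
operation name below are hypothetical, invented only to make the proposal
concrete; no such interface exists today.

    #include <stdint.h>

    /*
     * Hypothetical payload for a Dom0 -> Xen platform op, e.g. a
     * "XENPF_nvdimm_reserved_area" (name invented for illustration).
     */
    struct xen_nvdimm_reserved_area {
        uint64_t spa;    /* start system physical address of the area
                            reserved on the NVDIMM for Xen's use */
        uint64_t size;   /* size of the reserved area in bytes */
    };

Dom0's NVDIMM driver would fill this in after carving out the reservation
(step 1), and Xen would then lay out its page structures inside
[spa, spa + size) on every boot (steps 2 and 3).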
> 
> > In effect, in Xen there are two guests (dom0 and domU) pointing in the
> > P2M to the same GPFN. And that would mean:
> > 
> > > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > 
> > Granted, the XENMAPSPACE_gmfn happens _before_ the grant mapping is done,
> > so perhaps this is not an issue?
> > 
> > The other situation I was envisioning is where the driver domain has
> > the NVDIMM passed in, as well as an SR-IOV network card, and functions
> > as an iSCSI target. That should work OK, as we just need the IOMMU
> > to have the NVDIMM GPFNs programmed in.
> >
> 
> For this IOMMU usage example and the granted pages example above, there
> remains one question: who is responsible for performing the NVDIMM flush
> (clwb/clflushopt/pcommit)?


> 
> For the granted page example, if an NVDIMM page is granted to
> xen-netback, does the hypervisor need to tell xen-netback it is an NVDIMM
> page so that xen-netback can perform a proper flush when it writes to that
> page? Or should we keep the NVDIMM transparent to xen-netback, and let Xen
> perform the flush when xen-netback gives up the granted NVDIMM page?
> 
> For the IOMMU example, my understanding is that there is a piece of
> software in the driver domain that handles SCSI commands received from the
> network card and drives the network card to read/write certain areas of
> the NVDIMM. That software should then be aware of the existence of the
> NVDIMM and perform the flush properly. Is that right?

I would imagine it is the same as any other write to the NVDIMM. The "owner"
of the NVDIMM would perform the pcommit?
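
For reference, a minimal sketch of the flush sequence being discussed (not
Xen or Linux code): clflush is used as the universally available fallback,
clwb/clflushopt would be preferable where supported, and pcommit is emitted
as a raw opcode so older assemblers do not reject the mnemonic.

    #include <stdint.h>
    #include <stddef.h>

    #define CACHELINE_SIZE 64

    static inline void flush_nvdimm_range(const void *addr, size_t len)
    {
        uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
        uintptr_t end = (uintptr_t)addr + len;

        /* Write back every cache line touching [addr, addr + len). */
        for ( ; p < end; p += CACHELINE_SIZE )
            asm volatile ( "clflush %0" : "+m" (*(volatile char *)p) );

        /* Order the cache-line write-backs before the commit. */
        asm volatile ( "sfence" ::: "memory" );

        /* PCOMMIT (raw encoding 66 0f ae f8): make the stores durable
         * on the NVDIMM, on hardware that requires it. */
        asm volatile ( ".byte 0x66, 0x0f, 0xae, 0xf8" ::: "memory" );

        /* Fence again so later code can rely on durability. */
        asm volatile ( "sfence" ::: "memory" );
    }

Whoever ends up being the "owner" in the sense above is the party that
would have to issue this sequence after writing the granted page.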
> 
> Haozhong
