
Re: [Xen-devel] [RFC Design Doc v2] Add vNVDIMM support for Xen



On 08/03/16 02:45, Jan Beulich wrote:
> >>> On 03.08.16 at 08:54, <haozhong.zhang@xxxxxxxxx> wrote:
> > On 08/02/16 08:46, Jan Beulich wrote:
> >> >>> On 18.07.16 at 02:29, <haozhong.zhang@xxxxxxxxx> wrote:
> >> >  (4) Because the reserved area is now used by Xen hypervisor, it
> >> >      should not be accessible by Dom0 any more. Therefore, if a host
> >> >      pmem device is recorded by Xen hypervisor, Xen will unmap its
> >> >      reserved area from Dom0. Our design also needs to extend Linux
> >> >      NVDIMM driver to "balloon out" the reserved area after it
> >> >      successfully reports a pmem device to Xen hypervisor.
> >> 
> >> ... "balloon out" ... _after_? That'd be unsafe.
> >>
> > 
> > Before ballooning is accomplished, the pmem driver does not create any
> > device node under /dev/ and hence no one except the pmem driver can
> > access the reserved area on pmem, so I think it's okay to balloon
> > after reporting.
> 
> Right now Dom0 isn't allowed to access any memory in use by Xen
> (and not explicitly shared), and I don't think we should deviate
> from that model for pmem.
>

In this design, the Xen hypervisor unmaps the reserved area from Dom0, so
Dom0 cannot access the reserved area afterwards. The "balloon" here is
not really memory ballooning, because the Linux kernel never allocates
from pmem as it does from normal RAM. In my current implementation, it
just removes the reserved area from the resource struct covering the pmem.
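
To illustrate what "removing the reserved area from a resource" amounts to,
here is a rough Python sketch of the range arithmetic only. It is not the
driver code; carve_out() and its half-open [start, end) convention are
made up for this example.

```python
def carve_out(res_start, res_end, rsv_start, rsv_end):
    """Return the pieces of [res_start, res_end) that remain after
    removing the reserved range [rsv_start, rsv_end).

    All ranges are half-open. This is a hypothetical helper, not the
    kernel's resource API.
    """
    # Clip the reserved range to the resource.
    lo = max(res_start, rsv_start)
    hi = min(res_end, rsv_end)
    if lo >= hi:
        # No overlap: the resource is left untouched.
        return [(res_start, res_end)]

    remaining = []
    if res_start < lo:
        remaining.append((res_start, lo))   # part below the reserved area
    if hi < res_end:
        remaining.append((hi, res_end))     # part above the reserved area
    return remaining


if __name__ == "__main__":
    # Example: the reserved area sits at the end of the pmem region.
    for start, end in carve_out(0x1000, 0x5000, 0x4000, 0x5000):
        print(hex(start), hex(end))
```

If the reserved area sits at one end of the pmem region, a single range
remains; if it sits in the middle, the resource would be split in two.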

> >> > 4.2.3 Get Host Machine Address (SPA) of Host pmem Files
> >> > 
> >> >  Before a pmem file is assigned to a domain, we need to know the host
> >> >  SPA ranges that are allocated to this file. We do this work in xl.
> >> > 
> >> >  If a pmem device /dev/pmem0 is given, xl will read
> >> >  /sys/block/pmem0/device/{resource,size} respectively for the start
> >> >  SPA and size of the pmem device.
> >> > 
> >> >  If a pre-allocated file /mnt/dax/file is given,
> >> >  (1) xl first finds the host pmem device where /mnt/dax/file is. Then
> >> >      it uses the method above to get the start SPA of the host pmem
> >> >      device.
> >> >  (2) xl then uses fiemap ioctl to get the extent mappings of
> >> >      /mnt/dax/file, and adds the corresponding physical offsets and
> >> >      lengths in each mapping entry to the above start SPA to get the SPA
> >> >      ranges pre-allocated for this file.
> >> 
> >> Remind me again: These extents never change, not even across
> >> reboot? I think this would be good to be written down here explicitly.
> > 
> > Yes
> > 
> >> Hadn't there been talk of using labels to be able to allow a guest to
> >> own the exact same physical range again after reboot of guest or
> >> host?
> > 
> > You mean labels in the NVDIMM label storage area? As defined in the Intel
> > NVDIMM Namespace Specification, labels are used to specify
> > namespaces. For a pmem interleave set (possibly across several DIMMs),
> > at most one pmem namespace (and hence at most one label) is
> > allowed. Therefore, labels cannot be used to partition pmem.
> 
> Okay. But then how do particular ranges get associated with the
> owning guest(s)? Merely by SPA would seem rather fragile to me.
> 

By the file name. For example, if I specify vnvdimm = [ 'file=/mnt/dax/foo' ]
in a domain config file, the SPA ranges occupied by /mnt/dax/foo are mapped
to the domain. If the same file is used every time the domain is created,
the domain will see the same virtual device.
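
To make the file-to-SPA step concrete, here is a rough Python sketch of the
arithmetic described in section 4.2.3. It is not the xl code. It assumes
the start SPA of the host pmem device has already been read from sysfs and
that the (physical offset, length) pairs have already been obtained from
the fiemap ioctl; extents_to_spa_ranges() and the sample numbers are made
up for this example.

```python
def extents_to_spa_ranges(start_spa, extents):
    """Translate file extents into host SPA ranges.

    start_spa: start SPA of the host pmem device (assumed already read
               from /sys/block/pmemN/device/resource).
    extents:   iterable of (physical_offset, length) pairs, with offsets
               relative to the start of the pmem device (assumed already
               obtained via fiemap).

    Returns a sorted list of half-open (spa_start, spa_end) ranges, with
    adjacent or overlapping ranges merged.
    """
    ranges = sorted(
        (start_spa + off, start_spa + off + length)
        for off, length in extents
        if length > 0
    )

    merged = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1]:
            # Contiguous with (or overlapping) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


if __name__ == "__main__":
    # Hypothetical values: device starts at 4 GiB, file has three extents.
    sample = [(0x0, 0x1000), (0x1000, 0x1000), (0x4000, 0x2000)]
    for lo, hi in extents_to_spa_ranges(0x100000000, sample):
        print(hex(lo), hex(hi))
```

As long as the file's extents do not change, the same config line resolves
to the same SPA ranges each time the domain is created.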

Thanks,
Haozhong
