
Re: [Xen-devel] [RFC Design Doc v2] Add vNVDIMM support for Xen



Hey Haozhong,

On 07/18/2016 08:29 AM, Haozhong Zhang wrote:
> Hi,
> 
> Following is version 2 of the design doc for supporting vNVDIMM in

This version is really good, very clear, and includes almost everything I'd
like to know.

> Xen. It's basically the summary of discussion on previous v1 design
> (https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00006.html).
> Any comments are welcome. The corresponding patches are WIP.
> 

So are you (or Intel) going to write all the patches? Are there any tasks the
community can take part in?

[..snip..]
> 3. Usage Example of vNVDIMM in Xen
> 
>  Our design is to provide virtual pmem devices to HVM domains. The
>  virtual pmem devices are backed by host pmem devices.
> 
>  Dom0 Linux kernel can detect the host pmem devices and create
>  /dev/pmemXX for each detected device. Users in Dom0 can then create a
>  DAX file system on /dev/pmemXX and create several pre-allocated files
>  in the DAX file system.
> 
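
(Just to confirm the intended workflow: I assume the pre-allocated files are
created up front, e.g. with fallocate, so their extents on the pmem device are
fixed before they are handed to xl? A minimal sketch of what I have in mind;
the path and size are made up:)

    /* Pre-allocate a fixed-size file on a DAX-mounted file system
     * (assumed to be ext4/XFS mounted with -o dax on /dev/pmem0). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/dax/pre_allocated_file",
                      O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Reserve 4 GiB of pmem-backed space; posix_fallocate()
         * returns an error number rather than setting errno. */
        int err = posix_fallocate(fd, 0, (off_t)4 << 30);
        if (err) {
            fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
            close(fd);
            return 1;
        }

        close(fd);
        return 0;
    }
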
>  After setting up the file system on the host pmem, users can add the
>  following lines to the xl configuration files to assign the host pmem
>  regions to domains:
>      vnvdimm = [ 'file=/dev/pmem0' ]
>  or
>      vnvdimm = [ 'file=/mnt/dax/pre_allocated_file' ]
> 

Could you please also consider the case where a driver domain gets involved?
E.g. vnvdimm = [ 'file=/dev/pmem0', backend='xxx' ]?

>   The first type of configuration assigns the entire pmem device
>   (/dev/pmem0) to the domain, while the second assigns the space
>   allocated to /mnt/dax/pre_allocated_file on the host pmem device to
>   the domain.
> 
[..snip..]
> 
> 4.2.2 Detection of Host pmem Devices
> 
>  The detection and initialization of host pmem devices require a
>  non-trivial driver to interact with the corresponding ACPI namespace
>  devices, parse namespace labels and take necessary recovery actions.
>  Instead of duplicating the comprehensive Linux pmem driver in Xen
>  hypervisor, our design leaves this to Dom0 Linux and lets Dom0 Linux
>  report detected host pmem devices to Xen hypervisor.
> 
>  Our design takes the following steps to detect host pmem devices when
>  Xen boots.
>  (1) As when booting on bare metal, host pmem devices are detected by
>      the Dom0 Linux NVDIMM driver.
> 
>  (2) Our design extends the Linux NVDIMM driver to report the SPAs and
>      sizes of the pmem devices and reserved areas to Xen hypervisor via
>      a new hypercall.
> 
>  (3) Xen hypervisor then checks
>      - whether the SPA range of the newly reported pmem device overlaps
>        with any previously reported pmem device;
>      - whether the reserved area fits in the pmem device and is large
>        enough to hold page_info structs for itself.
> 
>      If any check fails, the reported pmem device will be ignored by
>      Xen hypervisor and hence will not be used by any
>      guests. Otherwise, Xen hypervisor will record the reported
>      parameters and create page_info structs in the reserved area.
> 
>  (4) Because the reserved area is now used by Xen hypervisor, it
>      should not be accessible by Dom0 any more. Therefore, if a host
>      pmem device is recorded by Xen hypervisor, Xen will unmap its
>      reserved area from Dom0. Our design also needs to extend Linux
>      NVDIMM driver to "balloon out" the reserved area after it
>      successfully reports a pmem device to Xen hypervisor.
> 
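
(To check that I understand steps (2) and (3) correctly: will the new
hypercall carry roughly the parameters below, and is the sizing check about
holding a page_info for every page of the region? The struct layout, names
and constants here are purely my guess, not from the doc:)

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for hypervisor constants, only for this sketch. */
    #define PAGE_SHIFT      12
    #define PAGE_INFO_SIZE  32  /* assumed sizeof(struct page_info) */

    /* Hypothetical argument of the new "report pmem" hypercall. */
    struct xen_pmem_add {
        uint64_t spa;       /* start SPA of the pmem region   */
        uint64_t size;      /* size of the region in bytes    */
        uint64_t rsv_spa;   /* start SPA of the reserved area */
        uint64_t rsv_size;  /* size of the reserved area      */
    };

    /* The checks of step (3) as I read them. */
    static bool pmem_region_ok(const struct xen_pmem_add *a)
    {
        uint64_t nr_pages = a->size >> PAGE_SHIFT;

        /* The reserved area must lie inside the reported region ... */
        if (a->rsv_spa < a->spa ||
            a->rsv_spa + a->rsv_size > a->spa + a->size)
            return false;

        /* ... and must be large enough to hold a page_info for every
         * page of the region, including the reserved area itself.   */
        if (a->rsv_size < nr_pages * PAGE_INFO_SIZE)
            return false;

        /* Overlap with previously reported regions would be checked
         * against state kept in the hypervisor (omitted here).      */
        return true;
    }
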
> 4.2.3 Get Host Machine Address (SPA) of Host pmem Files
> 
>  Before a pmem file is assigned to a domain, we need to know the host
>  SPA ranges that are allocated to this file. We do this work in xl.
> 
>  If a pmem device /dev/pmem0 is given, xl will read
>  /sys/block/pmem0/device/{resource,size} respectively for the start
>  SPA and size of the pmem device.
> 
>  If a pre-allocated file /mnt/dax/file is given,
>  (1) xl first finds the host pmem device where /mnt/dax/file is. Then
>      it uses the method above to get the start SPA of the host pmem
>      device.
>  (2) xl then uses the fiemap ioctl to get the extent mappings of
>      /mnt/dax/file, and adds the physical offset and length in each
>      mapping entry to the above start SPA to get the SPA ranges
>      pre-allocated for this file.
> 

Looks like PMEM can't be passed through to a driver domain directly the way
e.g. PCI devices can.

So suppose we create a driver domain with vnvdimm = [ 'file=/dev/pmem0' ] and
make a DAX file system inside that driver domain, and then create new guests
with vnvdimm = [ 'file=<dax file in driver domain>', backend = 'driver
domain' ].

Is this going to work? In my understanding, fiemap can only return the GPFN
instead of the real SPA of the PMEM in this case.
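
(For the non-driver-domain case, my reading of the sysfs + fiemap step is
roughly the following. A quick sketch only: a single FIEMAP call, no
iteration over large extent counts, and the paths are the ones from the
example above:)

    #include <fcntl.h>
    #include <linux/fiemap.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned long long dev_spa = 0;

        /* Start SPA of the whole pmem device. */
        FILE *f = fopen("/sys/block/pmem0/device/resource", "r");
        if (!f || fscanf(f, "%llx", &dev_spa) != 1) {
            fprintf(stderr, "cannot read device resource\n");
            return 1;
        }
        fclose(f);

        /* Extent mappings of the pre-allocated file. */
        int fd = open("/mnt/dax/pre_allocated_file", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct fiemap *fm = calloc(1, sizeof(*fm) +
                                   32 * sizeof(struct fiemap_extent));
        if (!fm)
            return 1;
        fm->fm_length = ~0ULL;               /* map the whole file   */
        fm->fm_flags = FIEMAP_FLAG_SYNC;     /* flush before mapping */
        fm->fm_extent_count = 32;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            perror("FS_IOC_FIEMAP");
            return 1;
        }

        for (unsigned i = 0; i < fm->fm_mapped_extents; i++) {
            struct fiemap_extent *e = &fm->fm_extents[i];
            /* fe_physical is the offset into /dev/pmem0, so the SPA
             * range of this extent starts at dev_spa + fe_physical. */
            printf("SPA 0x%llx - 0x%llx\n",
                   dev_spa + (unsigned long long)e->fe_physical,
                   dev_spa + (unsigned long long)(e->fe_physical +
                                                  e->fe_length));
        }

        free(fm);
        close(fd);
        return 0;
    }
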


>  The resulting host SPA ranges will be passed to QEMU which allocates
>  guest address space for vNVDIMM devices and calls Xen hypervisor to
>  map the guest address to the host SPA ranges.
> 

Can Dom0 still access the same SPA range once Xen decides to assign it to a
new domU?
I assume the range will be unmapped from Dom0 automatically in the hypercall?
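
(Also, just so I can picture the flow: is the mapping conceptually similar to
the existing MMIO-style memory_mapping path below, with the file's SPA range
playing the role of the MFNs? Sketch only, and I realise the actual interface
may end up being a new pmem-specific hypercall; domid, GFN and SPA values are
invented:)

    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        if (!xch) {
            fprintf(stderr, "cannot open xc interface\n");
            return 1;
        }

        uint32_t domid = 1;                      /* target HVM domain     */
        unsigned long gfn = 0x100000;            /* guest frame to map at */
        unsigned long mfn = 0x3c0000000UL >> 12; /* host SPA >> 12        */
        unsigned long nr  = 0x40000;             /* 4K pages (1 GiB)      */

        /* Last argument: 1 adds the mapping, 0 removes it. */
        int rc = xc_domain_memory_mapping(xch, domid, gfn, mfn, nr, 1);
        if (rc)
            fprintf(stderr, "memory_mapping failed: %d\n", rc);

        xc_interface_close(xch);
        return rc ? 1 : 0;
    }
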

Thanks,
-Bob
