
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 02/16/16 05:55, Jan Beulich wrote:
> >>> On 16.02.16 at 12:14, <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Mon, 15 Feb 2016, Zhang, Haozhong wrote:
> >> On 02/04/16 20:24, Stefano Stabellini wrote:
> >> > On Thu, 4 Feb 2016, Haozhong Zhang wrote:
> >> > > On 02/03/16 15:22, Stefano Stabellini wrote:
> >> > > > On Wed, 3 Feb 2016, George Dunlap wrote:
> >> > > > > On 03/02/16 12:02, Stefano Stabellini wrote:
> >> > > > > > On Wed, 3 Feb 2016, Haozhong Zhang wrote:
> >> > > > > >> Or, we can make a file system on /dev/pmem0, create files on it,
> >> > > > > >> set the owner of those files to xen-qemuuser-domid$domid, and then
> >> > > > > >> pass those files to QEMU. In this way, non-root QEMU should be able
> >> > > > > >> to mmap those files.
> >> > > > > >
> >> > > > > > Maybe that would work. Worth adding it to the design, I would like
> >> > > > > > to read more details on it.
> >> > > > > >
> >> > > > > > Also note that QEMU initially runs as root but drops privileges to
> >> > > > > > xen-qemuuser-domid$domid before the guest is started. Initially QEMU
> >> > > > > > *could* mmap /dev/pmem0 while it is still running as root, but then
> >> > > > > > it wouldn't work for any devices that need to be mmap'ed at run time
> >> > > > > > (hotplug scenario).
> >> > > > >
> >> > > > > This is basically the same problem we have for a bunch of other things,
> >> > > > > right?  Having xl open a file and then pass it via qmp to qemu should
> >> > > > > work in theory, right?
> >> > > >
> >> > > > Is there one /dev/pmem? per assignable region?
> >> > > 
> >> > > Yes.
> >> > > 
> >> > > BTW, I'm wondering whether and how non-root qemu works with an xl disk
> >> > > configuration that accesses a host block device, e.g.
> >> > >      disk = [ '/dev/sdb,,hda' ]
> >> > > If that works with non-root qemu, I may adopt a similar solution for
> >> > > pmem.
> >> >  
> >> > Today the user is required to give the correct ownership and access mode
> >> > to the block device, so that non-root QEMU can open it. However in the
> >> > case of PCI passthrough, QEMU needs to mmap /dev/mem, as a consequence
> >> > the feature doesn't work at all with non-root QEMU
> >> > (http://marc.info/?l=xen-devel&m=145261763600528).
> >> > 
> >> > If there is one /dev/pmem device per assignable region, then it would be
> >> > conceivable to change its ownership so that non-root QEMU can open it.
> >> > Or, better, the file descriptor could be passed by the toolstack via
> >> > qmp.
> >> 
> >> Passing a file descriptor via qmp is not enough.
> >> 
> >> Let me clarify where the requirement for root/privileged permissions
> >> comes from. The primary workflow in my design for mapping a host pmem
> >> region, or files in a host pmem region, to a guest is as follows:
> >>  (1) QEMU in Dom0 mmaps the host pmem (the host /dev/pmem0 or files on
> >>      /dev/pmem0) into its virtual address space, i.e. the guest virtual
> >>      address space.
> >>  (2) QEMU asks the Xen hypervisor to map the host physical address, i.e.
> >>      the SPA occupied by the host pmem, to a DomU. This step requires
> >>      translating the guest virtual address (where the host pmem is
> >>      mmapped in (1)) to the host physical address. The translation can
> >>      be done by either
> >>     (a) QEMU, which parses its own /proc/self/pagemap,
> >>      or
> >>     (b) the Xen hypervisor, which does the translation itself [1]
> >>         (though this choice is not really feasible according to
> >>         Konrad's comments [2]).
> >> 
> >> [1] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00434.html
> >> [2] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00606.html
> >> 
> >> For 2-a, reading /proc/self/pagemap has required the CAP_SYS_ADMIN
> >> capability since Linux kernel 4.0. Furthermore, if we don't mlock the
> >> mapped host pmem (by adding the MAP_LOCKED flag to mmap or calling mlock
> >> after mmap), pagemap will not contain all mappings. However, mlock may
> >> require privileged permissions to lock more memory than RLIMIT_MEMLOCK
> >> allows. Because mlock operates on memory, permission to open(2) the host
> >> pmem files does not solve the problem, and therefore passing a file
> >> descriptor via qmp does not help.
> >> 
> >> For 2-b, according to Konrad's comments [2], mlock is also required, so
> >> privileged permissions may be required as well.
> >> 
> >> Note that the mapping and the address translation are done before QEMU
> >> drops its privileges, so non-root QEMU should be able to work with the
> >> above design until we start considering vNVDIMM hotplug (which is not
> >> yet supported by the current vNVDIMM implementation in QEMU). In the
> >> hotplug case, we may let Xen pass explicit flags to QEMU to keep it
> >> running with root permissions.
> > 
> > Are we all good with the fact that vNVDIMM hotplug won't work (unless
> > the user explicitly asks for it at domain creation time, which is
> > very unlikely otherwise she could use coldplug)?
> 
> No, at least there needs to be a road towards hotplug, even if
> initially this may not be supported/implemented.
> 

I just realized that it's unnecessary for QEMU to obtain the SPA ranges
of the NVDIMM or of files on the NVDIMM. We can move that work to the
toolstack and pass the SPA ranges obtained by the toolstack to QEMU. This
way, no privileged operations (mmap/mlock/...) are needed in QEMU, and
non-root QEMU should be able to work even with vNVDIMM hotplug in the
future.
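
For illustration only, here is a minimal Python sketch of the pagemap-based
translation from 2-a, as it might run in a privileged toolstack process
instead of in QEMU. All function names are hypothetical and are not part of
xl, libxl or QEMU. It assumes the region is already mmapped and locked, that
the caller has CAP_SYS_ADMIN (otherwise the kernel reports PFNs as zero), and
that the caller already knows the virtual address of the mapping. How the
resulting ranges would be handed to QEMU is not shown.

```python
import mmap
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number if present
PRESENT_BIT = 1 << 63         # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return (present, pfn) for one 64-bit pagemap entry."""
    present = bool(entry & PRESENT_BIT)
    return present, (entry & PFN_MASK) if present else 0


def pagemap_offset(vaddr, page_size=mmap.PAGESIZE):
    """Byte offset in the pagemap file of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def coalesce(pfns, page_size=mmap.PAGESIZE):
    """Merge per-page PFNs into contiguous (start_address, length) ranges."""
    ranges = []
    for pfn in pfns:
        addr = pfn * page_size
        if ranges and ranges[-1][0] + ranges[-1][1] == addr:
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + page_size)
        else:
            ranges.append((addr, page_size))
    return ranges


def virt_to_phys_ranges(vaddr, length, pid="self", page_size=mmap.PAGESIZE):
    """Translate [vaddr, vaddr + length) into physical address ranges.

    Hypothetical helper: raises if a page is not present (e.g. the region
    was not locked) or if the PFN is hidden (no CAP_SYS_ADMIN).
    """
    npages = (length + page_size - 1) // page_size
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(vaddr, page_size))
        data = f.read(npages * PAGEMAP_ENTRY_SIZE)
    if len(data) != npages * PAGEMAP_ENTRY_SIZE:
        raise RuntimeError("short read from pagemap")
    pfns = []
    for (entry,) in struct.iter_unpack("=Q", data):
        present, pfn = decode_pagemap_entry(entry)
        if not present or pfn == 0:
            raise RuntimeError("page not present or PFN not visible")
        pfns.append(pfn)
    return coalesce(pfns, page_size)
```

The point of the split is that only the process calling something like
virt_to_phys_ranges() needs the privileges discussed above; the consumer of
the (start, length) list does not need to mmap or mlock anything.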

Haozhong



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

