
Re: [Xen-devel] [RFC XEN PATCH v2 00/15] Add vNVDIMM support to HVM domains



On Tue, Apr 4, 2017 at 10:34 AM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
> On Tue, Apr 04, 2017 at 10:16:41AM -0700, Dan Williams wrote:
>> On Tue, Apr 4, 2017 at 10:00 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@xxxxxxxxxx> wrote:
>> > On Sat, Apr 01, 2017 at 08:45:45AM -0700, Dan Williams wrote:
>> >> On Sat, Apr 1, 2017 at 4:54 AM, Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> 
>> >> wrote:
>> >> > ..snip..
>> >> >> >> Is there a resource I can read more about why the hypervisor needs to
>> >> >> >> have this M2P mapping for nvdimm support?
>> >> >> >
>> >> >> > M2P is basically an array of frame numbers. It's indexed by the host
>> >> >> > page frame number, or the machine frame number (MFN) in Xen's
>> >> >> > terminology. The n'th entry records the guest page frame number that is
>> >> >> > mapped to MFN n. M2P is one of the core data structures in Xen memory
>> >> >> > management and is used to convert an MFN to a guest PFN. A read-only
>> >> >> > version of M2P is also exposed to guests as part of the ABI. In the
>> >> >> > previous design discussion, we decided to keep NVDIMM management
>> >> >> > within the existing Xen memory management as much as possible, so we
>> >> >> > need to build M2P for NVDIMM as well.
>> >> >> >
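For concreteness, an M2P lookup is nothing more than an array index by MFN.
A minimal sketch (the names and the INVALID marker are illustrative, not the
actual Xen definitions):

/* Minimal sketch of an M2P-style lookup (illustrative only, not the real
 * Xen data structures).  machine_to_phys is a flat array with one entry
 * per host machine frame number (MFN); entry n holds the guest page frame
 * number (PFN) currently mapped to MFN n. */
#define INVALID_GPFN  (~0UL)

extern unsigned long *machine_to_phys;   /* indexed by MFN */
extern unsigned long  max_mfn;           /* number of entries */

static inline unsigned long mfn_to_gpfn(unsigned long mfn)
{
    if (mfn >= max_mfn)
        return INVALID_GPFN;             /* MFN outside the tracked range */
    return machine_to_phys[mfn];         /* guest PFN mapped to this MFN */
}

The cost is one such entry per host page, which is why covering NVDIMM means
building M2P entries for every pmem page as well.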
>> >> >>
>> >> >> Thanks, but what I don't understand is why this M2P lookup is needed?
>> >> >
>> >> > Xen uses it to construct the EPT page tables for the guests.
>> >> >
>> >> >> Does Xen establish this metadata for PCI mmio ranges as well? What Xen
>> >> >
>> >> > It doesn't have that (M2P) for PCI MMIO ranges. For those it has a
>> >> > ranges construct (since those are usually contiguous and given
>> >> > in ranges to a guest).
>> >>
>> >> So, I'm confused again. This patchset / enabling requires both M2P and
>> >> contiguous PMEM ranges. If the PMEM is contiguous it seems you don't
>> >> need M2P and can just reuse the MMIO enabling, or am I missing
>> >> something?
>> >
>> > I think I am confusing you.
>> >
>> > The patchset (specifically [04/15] xen/x86: add
>> > XEN_SYSCTL_nvdimm_pmem_setup to setup host pmem)
>> > adds a hypercall to tell Xen where on the NVDIMM it can put
>> > the M2P array as well as the frametables ('struct page').
>> >
>> > There is no range support. The reason is that if you break up
>> > an NVDIMM into various chunks (and then put a filesystem on top of it) - and
>> > then figure out which of the SPAs belong to the file - and then
>> > "expose" that file to a guest as NVDIMM - its SPAs won't
>> > be contiguous. Hence the hypervisor would need to break the
>> > 'ranges' structure down into either a bitmap or an M2P
>> > and also store it. This can get quite tricky, so you may
>> > as well just start with an M2P and 'struct page'.
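The fragmentation is easy to see from userspace: asking the filesystem for a
file's extents (for example with the FIEMAP ioctl) typically returns several
disjoint physical ranges. A rough sketch of such a query (the reported offsets
are device-relative rather than SPAs, but the non-contiguity is the point;
error handling trimmed):

/* fiemap-sketch.c: print the physical extents backing a file, to show
 * that a file carved out of a pmem filesystem is usually not one
 * contiguous physical range. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    unsigned int nr = 32;                /* max extents to fetch */
    struct fiemap *fm = calloc(1, sizeof(*fm) + nr * sizeof(struct fiemap_extent));

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* whole file */
    fm->fm_extent_count = nr;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        perror("FIEMAP");
        return 1;
    }

    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
        printf("file offset %llu -> physical %llu, length %llu\n",
               (unsigned long long)fm->fm_extents[i].fe_logical,
               (unsigned long long)fm->fm_extents[i].fe_physical,
               (unsigned long long)fm->fm_extents[i].fe_length);

    free(fm);
    close(fd);
    return 0;
}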
>>
>> Ok... but the problem then becomes that the filesystem is free to
>> change the file-offset to SPA mapping any time it wants. So the M2P
>> support is broken if it expects static relationships.
>
> Can't you flock a file and have that freeze it? Or mlock it, since
> one is rather mmap-ing it?

Unfortunately no. This dovetails with the discussion we have been
having with filesystem folks about the need to call msync() after
every write. Whenever the filesystem sees a write fault it is free to
move blocks around in the file - think allocation or copy-on-write
operations like reflink. The filesystem depends on the application
calling msync/fsync before the writes from those faults are made
durable against crash / power loss. Also, actions like online defrag
can change these offset-to-physical-address relationships without any
involvement from the application. There's currently no mechanism to
lock out this behavior, because the filesystem assumes it can just
invalidate mappings and make the application re-fault.
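In userspace terms the contract is roughly the following pattern; a minimal
sketch (the path is illustrative, error handling trimmed):

/* Write through a shared file mapping, then msync() to make the write
 * durable.  Until msync()/fsync() returns, the filesystem is free to
 * have moved the underlying blocks as part of servicing the fault. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/pmem/file", O_RDWR);          /* illustrative path */
    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    memcpy(p, "hello", 5);            /* write fault: the FS may allocate or
                                       * relocate blocks (e.g. COW/reflink) */
    msync(p, len, MS_SYNC);           /* only now is the data durable against
                                       * crash / power loss */

    munmap(p, len);
    close(fd);
    return 0;
}

There is no flock()/mlock()-style knob that pins the file-offset-to-SPA
mapping for the lifetime of the mapping, which is exactly the gap for a
static M2P.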

>>
>> > The placement of those data structures is:
>> > "v2 patch series relies on users/admins in Dom0 instead of Dom0 driver to
>> > indicate the location to store the frametable and M2P of pmem."
>> >
>> > Hope this helps?
>>
>> It does, but it still seems we're stuck between either 1/ not needing
>> M2P if we can pass a whole pmem-namespace through to the guest or 2/
>> M2P being broken by non-static file-offset to physical address
>> mappings.
>
> Aye. So how can 2/ be fixed? I am assuming you would have the same
> issue with KVM - if the file is 'moving' underneath (and the file-offset
> to SPA mapping has changed), won't that affect the EPT and other page table entries?

I don't think KVM has the same issue, but honestly I don't have the
full mental model of how KVM supports mmap. I've at least been able to
run a guest where the "pmem" is just dynamic page cache on the host
side, so the physical memory mapping is changing all the time due to
swap. KVM does not have this third-party M2P mapping table to keep up
to date, so I assume it is just handled by the standard mmap support
for establishing a guest physical address range, and the standard
mapping-invalidate + remap mechanism just works.
