Re: [Xen-devel] [RFC Design Doc v2] Add vNVDIMM support for Xen
On 08/02/16 08:46, Jan Beulich wrote:
> >>> On 18.07.16 at 02:29, <haozhong.zhang@xxxxxxxxx> wrote:
> > 4.2.2 Detection of Host pmem Devices
> >
> > The detection and initialization of host pmem devices require a
> > non-trivial driver to interact with the corresponding ACPI namespace
> > devices, parse namespace labels and take necessary recovery actions.
> > Instead of duplicating the comprehensive Linux pmem driver in the
> > Xen hypervisor, our design leaves this work to Dom0 Linux and lets
> > Dom0 Linux report detected host pmem devices to the Xen hypervisor.
> >
> > Our design takes the following steps to detect host pmem devices
> > when Xen boots.
> > (1) As when booting on bare metal, host pmem devices are detected by
> >     the Dom0 Linux NVDIMM driver.
> >
> > (2) Our design extends the Linux NVDIMM driver to report the SPAs
> >     and sizes of the pmem devices and reserved areas to the Xen
> >     hypervisor via a new hypercall.
> >
> > (3) The Xen hypervisor then checks
> >     - whether the SPA range of the newly reported pmem device
> >       overlaps with any previously reported pmem device;
>
> ... or with system RAM.
>
> >     - whether the reserved area fits in the pmem device and is large
> >       enough to hold the page_info structs for the device itself.
>
> So "reserved" here means available for Xen's use, but not for more
> general purposes? How would the area Linux uses for its own purposes
> get represented?
>

Reserved for Xen only. I was going to reuse the existing reservation
mechanism in the Linux pmem driver to reserve two areas - one for Xen
and another for Linux itself. However, I later realized that the
existing mechanism depends on huge page support, so it does not work
in Dom0. For the first implementation, I reserve an area only for Xen
and let Dom0 Linux put the page structs for pmem in normal RAM.
Afterwards, I'll look for a way to allow both.

> > (4) Because the reserved area is now used by the Xen hypervisor, it
> >     should not be accessible by Dom0 any more. Therefore, if a host
> >     pmem device is recorded by the Xen hypervisor, Xen will unmap
> >     its reserved area from Dom0. Our design also needs to extend the
> >     Linux NVDIMM driver to "balloon out" the reserved area after it
> >     successfully reports a pmem device to the Xen hypervisor.
>
> ... "balloon out" ... _after_? That'd be unsafe.
>

Before ballooning is accomplished, the pmem driver does not create any
device node under /dev/, so nothing except the pmem driver itself can
access the reserved area on pmem. Therefore I think it is okay to
balloon after reporting.

> > 4.2.3 Get Host Machine Address (SPA) of Host pmem Files
> >
> > Before a pmem file is assigned to a domain, we need to know the host
> > SPA ranges that are allocated to this file. We do this work in xl.
> >
> > If a pmem device /dev/pmem0 is given, xl will read
> > /sys/block/pmem0/device/{resource,size} respectively for the start
> > SPA and size of the pmem device.
> >
> > If a pre-allocated file /mnt/dax/file is given,
> > (1) xl first finds the host pmem device where /mnt/dax/file is. Then
> >     it uses the method above to get the start SPA of the host pmem
> >     device.
> > (2) xl then uses the fiemap ioctl to get the extent mappings of
> >     /mnt/dax/file, and adds the physical offset and length of each
> >     mapping entry to the above start SPA to get the SPA ranges
> >     pre-allocated for this file.
>
> Remind me again: These extents never change, not even across reboot?
> I think this would be good to be written down here explicitly.

Yes.
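For illustration, below is a minimal user-space sketch of the lookup
described in (1) and (2) above: it reads the device's start SPA from
sysfs and then translates the file's extents into SPA ranges with the
FIEMAP ioctl. The file path, device name and fixed extent count are
assumptions for the example, not the actual xl implementation (which,
among other things, would loop until it sees FIEMAP_EXTENT_LAST).

/*
 * Illustrative sketch only (not the actual xl code): map a
 * pre-allocated file on a DAX-mounted pmem device to host SPA ranges.
 * The paths below and the fixed extent count are assumptions.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

static unsigned long long read_sysfs_hex(const char *path)
{
    unsigned long long val = 0;
    FILE *f = fopen(path, "r");

    if (f) {
        if (fscanf(f, "%llx", &val) != 1)
            val = 0;
        fclose(f);
    }
    return val;
}

int main(void)
{
    /* Start SPA of the host pmem device backing the DAX file system. */
    unsigned long long dev_spa =
        read_sysfs_hex("/sys/block/pmem0/device/resource");

    int fd = open("/mnt/dax/file", O_RDONLY);
    if (fd < 0)
        return 1;

    /*
     * Room for up to 32 extents; a real implementation would loop
     * until an extent flagged FIEMAP_EXTENT_LAST is returned.
     */
    size_t sz = sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    if (!fm) {
        close(fd);
        return 1;
    }

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;
    fm->fm_flags = FIEMAP_FLAG_SYNC;
    fm->fm_extent_count = 32;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
        unsigned int i;

        for (i = 0; i < fm->fm_mapped_extents; i++) {
            struct fiemap_extent *e = &fm->fm_extents[i];

            /* SPA range = device start SPA + physical offset in device. */
            printf("SPA range: 0x%llx - 0x%llx\n",
                   (unsigned long long)(dev_spa + e->fe_physical),
                   (unsigned long long)(dev_spa + e->fe_physical +
                                        e->fe_length));
        }
    }

    free(fm);
    close(fd);
    return 0;
}

Note that fe_physical is an offset into the backing block device, which
is why the device start SPA read from sysfs has to be added; this is
exactly the translation described in step (2) above.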
> Hadn't there been talk of using labels to be able to allow a guest to
> own the exact same physical range again after reboot of guest or
> host?
>

You mean labels in the NVDIMM label storage area? As defined in the
Intel NVDIMM Namespace Specification, labels are used to specify
namespaces. For a pmem interleave set (which may cross several DIMMs),
at most one pmem namespace (and hence at most one label) is allowed.
Therefore, labels cannot be used to partition pmem.

> > 3) When hvmloader loads a type 0 entry, it extracts the signature
> >    from the data blob and searches for it in builtin_table_sigs[].
> >    If one is found, hvmloader will report an error and stop.
> >    Otherwise, it will append the table to the end of the loaded
> >    guest ACPI.
>
> Duplicate table names aren't generally collisions: There can, for
> example, be many tables named "SSDT".
>

I'll exclude SSDT from the duplication check.

> > 4) When hvmloader loads a type 1 entry, it extracts the device name
> >    from the data blob and searches for it in builtin_nd_names[]. If
> >    one is found, hvmloader will report an error and stop. Otherwise,
> >    it will wrap the AML code snippet in "Device (name[4]) {...}" and
> >    include it in a new SSDT, which is then appended to the end of
> >    the loaded guest ACPI.
>
> But all of these could go into a single SSDT, instead of (as it
> sounds) each into its own one?
>

Yes, I meant to put them all in one SSDT.
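To make the two resolutions above concrete, here is a rough sketch of
the duplicate-signature check for type 0 entries with SSDT excluded.
The header layout and the contents of builtin_table_sigs[] are
illustrative assumptions, not hvmloader's actual code:

/*
 * Illustrative sketch of the type 0 duplicate-signature check, with
 * SSDT excluded as discussed.  The structure layout and the table list
 * below are assumptions, not hvmloader's real data structures.
 */
#include <stdint.h>
#include <string.h>

struct acpi_header {
    char     signature[4];
    uint32_t length;
    /* ... remaining standard ACPI table header fields ... */
};

/* Signatures of tables hvmloader builds itself (assumed list). */
static const char *builtin_table_sigs[] = {
    "FACP", "FACS", "DSDT", "APIC", "HPET", "WAET", "SSDT", NULL,
};

/*
 * Return 1 if a passed-through table with this signature collides with
 * a built-in table and must therefore be rejected.
 */
int is_colliding_sig(const struct acpi_header *tbl)
{
    unsigned int i;

    /* Multiple SSDTs are legitimate, so never treat them as collisions. */
    if (!memcmp(tbl->signature, "SSDT", 4))
        return 0;

    for (i = 0; builtin_table_sigs[i]; i++)
        if (!memcmp(tbl->signature, builtin_table_sigs[i], 4))
            return 1;

    return 0;
}

All type 1 AML snippets accepted by such a check can then be collected
into a single dynamically built SSDT rather than one SSDT per device,
as agreed above.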
Thanks,
Haozhong