
Re: [Xen-devel] Xen PV IOMMU interface draft C



+ARM devs.

On Fri, 2015-06-26 at 11:23 +0100, Malcolm Crossley wrote:
> Hi All,

I had a chat with Malcolm about this with respect to ARM.

The upshot is that this does not help us to remove the dom0 1:1
workaround or associated swiotlb uses on systems without an SMMU, nor
does it allow us to sensibly do passthrough on systems which lack an
SMMU.

What it will be good for is in the future when doing "mediated
passthrough", that is the XenGT-like thing where the device is partly
assigned to the guest and partly emulated in a privileged domain.

I had a look through the previous draft earlier in the week and didn't
notice anything which would preclude use on ARM in the future.

Ian.

> 
> Here is a design for allowing guests to control the IOMMU. This
> allows for the guest GFN mapping to be programmed into the IOMMU and
> avoid using the SWIOTLB bounce buffer technique in the Linux kernel
> (except for legacy 32 bit DMA IO devices).
> 
> Draft C has been reordered to explain expected behaviours before the APIs
> themselves. There's an additional section to explain the rationale for
> separate subops from local GFN mappings and foreign GFN mappings.
> 
> There's also further detail on the Linux API for foreign BFN mappings.
> 
> The plan is to start writing code against this version so please provide
> feedback on any major design problems/concerns.
> 
> The pandoc markdown format of the document is provided below to allow
> for easier inline comments:
> 
> % Xen PV IOMMU interface
> % Malcolm Crossley <<malcolm.crossley@xxxxxxxxxx>>
>   Paul Durrant <<paul.durrant@xxxxxxxxxx>>
> % Draft C
> 
> Introduction
> ============
> 
> Revision History
> ----------------
> 
> --------------------------------------------------------------------
> Version  Date         Changes
> -------  -----------  ----------------------------------------------
> Draft A  10 Apr 2014  Initial draft.
> 
> Draft B  12 Jun 2015  Second draft.
> 
> Draft C  26 Jun 2015  Third draft.
> --------------------------------------------------------------------
> 
> Background
> ==========
> 
> Linux kernel SWIOTLB
> --------------------
> 
> Xen PV guests use a Pseudophysical Frame Number (PFN) address space which is
> decoupled from the host Machine Frame Number (MFN) address space.
> 
> PV guest hardware drivers are aware of the PFN address space only and
> assume that if PFN addresses are contiguous then the hardware addresses will
> be contiguous as well. The decoupling of the PFN and MFN address spaces means
> that PFNs which are contiguous across a page boundary may be backed by
> non-contiguous MFNs, and thus a buffer allocated in the PFN address space
> which spans a page boundary may not be contiguous in the MFN address space.
> 
> PV hardware drivers cannot tolerate this behaviour and so a special
> "bounce buffer" region is used to hide this issue from the drivers.
> 
> A bounce buffer region is a special part of the PFN address space which has
> been made to be contiguous in both the PFN and MFN address spaces. When a
> driver requests that a buffer spanning a page boundary be made available for
> hardware to read, the core operating system code copies the buffer into a
> temporarily reserved part of the bounce buffer region and then returns the
> MFN address of that reserved part back to the driver. The driver then
> instructs the hardware to read the copy of the buffer in the bounce buffer.
> Similarly, if the driver requests that a buffer be made available for
> hardware to write to, a region of the bounce buffer is first reserved and,
> after the hardware completes writing, the reserved region of the bounce
> buffer is copied back to the originally allocated buffer.
> 
> The overhead of memory copies to/from the bounce buffer region is high
> and damages performance. Furthermore, there is a risk that the fixed size
> bounce buffer region will become exhausted and it will not be possible to
> return a hardware address back to the driver. The Linux kernel drivers do not
> tolerate this failure and so the kernel is forced to crash, as an
> unrecoverable error has occurred.
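> 
> As an illustration, a minimal sketch of the bounce buffer technique described
> above is shown below; the helper names (`bounce_reserve`, `bounce_virt`,
> `dma_map_for_read`) are hypothetical and are not the real Linux SWIOTLB API:
> 
>     /* Illustrative sketch only: these helpers are hypothetical, not the
>      * actual Linux SWIOTLB interface. */
>     #include <stdint.h>
>     #include <string.h>
> 
>     extern uint64_t bounce_reserve(size_t len);   /* reserve bounce space */
>     extern void *bounce_virt(uint64_t bus_addr);  /* CPU view of that space */
> 
>     /* Make a driver buffer available for the hardware to read. */
>     uint64_t dma_map_for_read(void *buf, size_t len)
>     {
>         uint64_t bus_addr = bounce_reserve(len);
> 
>         /* Copy the possibly MFN-discontiguous buffer into the bounce
>          * region, which is contiguous in both PFN and MFN space. */
>         memcpy(bounce_virt(bus_addr), buf, len);
>         return bus_addr;  /* hardware reads the copy from here */
>     }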
> 
> Input/Output Memory Management Units (IOMMUs) allow an inbound address
> mapping to be created from the I/O bus address space (typically PCI) to
> the machine frame number address space. IOMMUs typically use a page table
> mechanism to manage the mappings and therefore can create mappings of page
> size granularity or larger.
> 
> The I/O bus address space will be referred to as the Bus Frame Number (BFN)
> address space for the rest of this document.
> 
> 
> Mediated Pass-through Emulators
> -------------------------------
> 
> Mediated pass-through emulators allow guest domains to interact with
> hardware devices via emulator mediation. The emulator runs in a domain
> separate from the guest domain and is used to enforce the security of guest
> access to the hardware devices and the isolation of different guests
> accessing the same hardware device.
> 
> The emulator requires a mechanism to map guest addresses to a bus address
> that the hardware devices can access.
> 
> 
> Clarification of GFN and BFN fields for different guest types
> -------------------------------------------------------------
> The definition of Guest Frame Numbers (GFNs) varies depending on the guest
> type.
> 
> The diagram below details the memory accesses originating from the CPU, per
> guest type:
> 
>       HVM guest                              PV guest
> 
>          (VA)                                   (VA)
>           |                                      |
>          MMU                                    MMU
>           |                                      |
>          (GFN)                                   |
>           |                                      | (GFN)
>      HAP a.k.a EPT/NPT                           |
>           |                                      |
>          (MFN)                                  (MFN)
>           |                                      |
>          RAM                                    RAM
> 
> For PV guests, a single-page GFN is equal to its MFN, but a contiguous range
> of GFNs is not guaranteed to map to a contiguous range of MFNs.
> 
> Bus Frame Numbers (BFNs) refer to the addresses presented on the physical bus
> before being translated by the IOMMU.
> 
> The diagram below details memory accesses originating from a physical device:
> 
>     Physical Device
>           |
>         (BFN)
>           |
>          IOMMU-PT
>           |
>         (MFN)
>           |
>          RAM
> 
> 
> 
> Purpose
> =======
> 
> 1. Allow Xen guests to create/modify/destroy IOMMU mappings for
> hardware devices that the PV guest has access to. This enables the PV guest
> to program a bus address space mapping which matches its GFN mapping. Once a
> 1:1 mapping of PFN to bus address space is created, a bounce buffer
> region is not required for the I/O devices connected to the IOMMU.
> 
> 2. Allow Xen guests to lookup/create/modify/destroy IOMMU mappings for
> guest memory of domains the calling Xen guest has sufficient privilege over.
> This enables domains to provide mediated hardware acceleration to other
> guest domains.
> 
> 
> General principles for PV IOMMU interface
> =========================================
> 
> There are two different usage models for the BFN address space of a calling
> guest, based upon the two purposes specified in the section above.
> 
> A calling guest may use their BFN address space for only one of the purposes
> detailed above and so the PV IOMMU interface has a subop per usage model.
> Furthermore, the IOMMU mapping of foreign domain memory is more complex than
> IOMMU mapping local domain memory, and separating the subops allows the
> complexity to be split in the implementation.
> 
> The PV IOMMU design allows the calling domain to control its BFN memory map.
> Thus the design also assigns to the calling domain the responsibility of
> ensuring that a BFN address mapped for local domain memory is not reused for
> foreign domain memory mappings without an explicit unmap of the BFN address
> first. This simplifies the usage of the API and the extra overhead for the
> calling domains should be minimal as they should already be tracking their
> BFN address space usage.
> 
> 
> Emulator usage of PV IOMMU interface
> ====================================
> 
> Emulators which require bus address mappings of guest RAM must first
> determine whether it is possible for the domain to control the bus addresses
> itself.
> 
> An IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If this
> flag is set then the emulator may specify the BFN address it wishes guest RAM
> to be mapped to via the IOMMUOP_map_foreign_page subop. If the flag is not
> set then the emulator must use the BFN addresses supplied by Xen via the
> IOMMUOP_lookup_foreign_page subop.
> 
> Operating systems which use the IOMMUOP_map_page subop are expected to
> provide a common interface for emulators to use. Otherwise emulators will not
> be aware of existing BFN mappings created by the operating system and their
> subops will fail due to conflicts in the BFN address space for the domain.
> 
> Emulators should unmap unused GFN mappings as often as possible using
> IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
> quickly and efficiently.
> 
> Emulators should conform to the ballooning behaviour described in the section
> "IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that
> guest domains are able to effectively balloon memory out and in.
> 
> Emulators must unmap any active BFN mappings when they shut down.
> 
> IOMMUOP_*_foreign_page interactions with guest domain ballooning
> ================================================================
> 
> Guest domains can balloon out a set of GFN mappings at any time and render
> the BFN to GFN mapping invalid.
> 
> When a BFN to GFN mapping becomes invalid, Xen will issue a buffered I/O
> request of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers, with the
> now invalid BFN address in the data field. If the buffered I/O request ring
> is full then a standard (synchronous) I/O request of type
> IOREQ_TYPE_INVALIDATE will be issued to the affected IOREQ server, with the
> just invalidated BFN address in the data field.
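> 
> A minimal sketch of how an emulator might handle such an invalidation
> request is shown below; `HYPERVISOR_iommu_op()` is a hypothetical guest-side
> wrapper around the do_iommu_op hypercall, and the ioreq field usage is
> illustrative:
> 
>     /* Hypothetical guest-side wrapper around the do_iommu_op hypercall. */
>     extern ret_t HYPERVISOR_iommu_op(struct pv_iommu_op *ops,
>                                      unsigned int count);
> 
>     /* React to an IOREQ_TYPE_INVALIDATE request by unmapping the now
>      * invalid BFN, which Xen delivers in the data field. */
>     void handle_invalidate(ioreq_t *req, ioservid_t my_ioserver)
>     {
>         struct pv_iommu_op op = {
>             .subop_id = IOMMUOP_unmap_foreign_page,
>             .u.unmap_foreign_page = {
>                 .bfn      = req->data,   /* invalidated BFN from Xen */
>                 .ioserver = my_ioserver,
>             },
>         };
> 
>         HYPERVISOR_iommu_op(&op, 1);     /* one-element batch */
>         /* op.status is 0 on success, -ENOENT if no M2B entry was found. */
>     }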
> 
> The BFN mappings cannot simply be unmapped at the point of the balloon
> hypercall, otherwise a malicious guest could specifically balloon out a GFN
> address in use by an emulator and trigger IOMMU faults for the domains with
> BFN mappings.
> 
> For hosts with no IOMMU support: the affected emulator(s) must specifically
> issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so
> that the references to the underlying MFN are removed and the MFN can be
> freed back to the Xen memory allocator.
> 
> For hosts with IOMMU support:
> If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop then the affected emulator(s) must
> specifically issue an IOMMUOP_unmap_foreign_page subop for the now invalid
> BFN address so that the references to the underlying MFN are removed.
> 
> If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop for all emulators with mappings of that GFN,
> then the BFN mapping will be swapped to point at a scratch MFN page and all
> BFN references to the invalid MFN will be removed by Xen after the BFN
> mapping has been updated to point at the scratch MFN page.
> 
> The rationale for swapping the BFN mapping to point at scratch pages is to
> enable guest domains to balloon quickly without requiring hypercall(s) from
> emulators.
> 
> Not all BFN mappings can be swapped without potentially causing problems for
> the hardware itself (command rings etc.), so the IOMMUOP_swap_mfn flag is
> used to allow per-BFN control of Xen's ballooning behaviour.
> 
> 
> PV IOMMU interactions with self ballooning
> ==========================================
> 
> A guest should clear any IOMMU mappings it has of its own pages before
> releasing a page back to Xen. The guest will also need to add IOMMU mappings
> after repopulating a page with the populate_physmap hypercall.
> 
> PV guests must clear any IOMMU mappings before pinning page table pages
> because the IOMMU mappings will take a writable reference count and this will
> prevent page table pinning.
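> 
> A minimal sketch of this ordering for a self-ballooning guest is shown
> below; `decrease_reservation()` and `populate_physmap()` are hypothetical
> single-page wrappers around the respective memory_op subops, and
> `HYPERVISOR_iommu_op()` is the hypothetical wrapper used above:
> 
>     /* Hypothetical single-page wrappers, for illustration only. */
>     extern void decrease_reservation(uint64_t gfn);
>     extern void populate_physmap(uint64_t gfn);
> 
>     void balloon_out_page(uint64_t gfn, uint64_t bfn)
>     {
>         struct pv_iommu_op op = {
>             .subop_id = IOMMUOP_unmap_page,
>             .u.unmap_page.bfn = bfn,
>         };
> 
>         HYPERVISOR_iommu_op(&op, 1); /* 1. clear our own IOMMU mapping */
>         decrease_reservation(gfn);   /* 2. then release the page to Xen */
>     }
> 
>     void balloon_in_page(uint64_t gfn, uint64_t bfn)
>     {
>         struct pv_iommu_op op = {
>             .subop_id = IOMMUOP_map_page,
>             .flags    = IOMMU_OP_readable | IOMMU_OP_writeable,
>             .u.map_page = { .bfn = bfn, .gfn = gfn },
>         };
> 
>         populate_physmap(gfn);       /* 1. repopulate the GFN first */
>         HYPERVISOR_iommu_op(&op, 1); /* 2. then re-establish the mapping */
>     }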
> 
> 
> Security Implications of allowing domain IOMMU control
> ======================================================
> 
> Xen currently allows I/O devices attached to the hardware domain to have
> direct access to all of the MFN address space (except Xen hypervisor memory
> regions), provided the Xen IOMMU option dom0-strict is not enabled.
> 
> The PV IOMMU feature provides the same level of access to the MFN address
> space and the feature is not enabled when the Xen IOMMU option dom0-strict is
> enabled. Therefore security is not degraded by the PV IOMMU feature.
> 
> Domains with physical device(s) assigned which are not hardware domains are
> only allowed to map their own GFNs or GFNs of domain(s) they have privilege
> over.
> 
> 
> PV IOMMU interactions with grant map/unmap operations
> =====================================================
> 
> Grant map operations return a physical-device-accessible address (BFN) if the
> GNTMAP_device_map flag is set. This operation currently returns the MFN for
> PV guests, which may conflict with the BFN address space the guest uses if PV
> IOMMU map support is available to the guest.
> 
> This design proposes to allow the calling domain to control the BFN address
> that a grant map operation uses.
> 
> This can be achieved by specifying that the dev_bus_addr field in the
> gnttab_map_grant_ref structure is used as an input parameter instead of the
> output parameter it currently is.
> 
> Only PAGE_SIZE aligned addresses are allowed for the dev_bus_addr input
> parameter.
> 
> The revised structure is shown below for convenience.
> 
>     struct gnttab_map_grant_ref {
>         /* IN parameters. */
>         uint64_t host_addr;
>         uint32_t flags;               /* GNTMAP_* */
>         grant_ref_t ref;
>         domid_t  dom;
>         /* OUT parameters. */
>         int16_t  status;              /* => enum grant_status */
>         grant_handle_t handle;
>         /* IN/OUT parameters */
>         uint64_t dev_bus_addr;
>     };
> 
> 
> The grant map operation would then behave similarly to the IOMMUOP_map_page
> subop for the creation of the IOMMU mapping.
> 
> The grant unmap operation would then behave similarly to the
> IOMMUOP_unmap_page subop for the removal of the IOMMU mapping.
> 
> A new GNTMAP flag would be used to indicate that the domain is requesting
> the dev_bus_addr field be used as an input parameter.
> 
> 
>     #define _GNTMAP_request_bfn_map      (6)
>     #define GNTMAP_request_bfn_map   (1<<_GNTMAP_request_bfn_map)
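> 
> A minimal sketch of a caller using the proposed flag is shown below; the
> hypercall is the existing GNTTABOP_map_grant_ref and only
> `map_grant_at_bfn()` itself is a hypothetical helper:
> 
>     /* Map a grant and request that the device mapping appear at a
>      * caller-chosen, PAGE_SIZE aligned bus address. */
>     int map_grant_at_bfn(domid_t granter, grant_ref_t ref,
>                          uint64_t host_addr, uint64_t bfn)
>     {
>         struct gnttab_map_grant_ref map = {
>             .host_addr    = host_addr,
>             .flags        = GNTMAP_host_map | GNTMAP_device_map |
>                             GNTMAP_request_bfn_map,  /* proposed flag */
>             .ref          = ref,
>             .dom          = granter,
>             /* With GNTMAP_request_bfn_map set this field is an input:
>              * the bus address the caller wants the mapping at. */
>             .dev_bus_addr = bfn << PAGE_SHIFT,
>         };
> 
>         HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &map, 1);
>         return map.status;  /* 0 on success */
>     }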
> 
> 
> Xen PV-IOMMU Architecture
> =========================
> 
> The Xen architecture consists of a new hypercall interface and changes to the
> grant map interface.
> 
> The existing IOMMU mappings set up at domain creation time will be preserved
> so that PV domains unaware of this feature will continue to function with no
> changes required.
> 
> Memory ballooning will be supported by taking an additional reference on the
> MFN backing the GFN for each successful IOMMU mapping created.
> 
> An M2B tracking structure will be used to ensure all references to an MFN can
> be located efficiently.
> 
> Xen PV IOMMU hypercall interface
> --------------------------------
> A two-argument hypercall interface (do_iommu_op):
> 
>     ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)
> 
> The first argument is a guest handle pointer to an array of
> `struct pv_iommu_op`.
> 
> The second argument is an unsigned integer count of `struct pv_iommu_op`
> elements in the array.
> 
> Definition of `struct pv_iommu_op`:
> 
>     struct pv_iommu_op {
> 
>         uint16_t subop_id;
>         uint16_t flags;
>         int32_t status;
> 
>         union {
>             struct {
>                 uint64_t bfn;
>                 uint64_t gfn;
>             } map_page;
> 
>             struct {
>                 uint64_t bfn;
>             } unmap_page;
> 
>             struct {
>                 uint64_t bfn;
>                 uint64_t gfn;
>                 uint16_t domid;
>                 ioservid_t ioserver;
>             } map_foreign_page;
> 
>             struct {
>                 uint64_t bfn;
>                 uint64_t gfn;
>                 uint16_t domid;
>                 ioservid_t ioserver;
>             } lookup_foreign_page;
> 
>             struct {
>                 uint64_t bfn;
>                 ioservid_t ioserver;
>             } unmap_foreign_page;
>         } u;
>     };
> 
> Definition of PV IOMMU subops:
> 
>     #define IOMMUOP_query_caps            1
>     #define IOMMUOP_map_page              2
>     #define IOMMUOP_unmap_page            3
>     #define IOMMUOP_map_foreign_page      4
>     #define IOMMUOP_lookup_foreign_page   5
>     #define IOMMUOP_unmap_foreign_page    6
> 
> 
> Design considerations for hypercall op
> -------------------------------------------
> IOMMU map/unmap operations can be slow and can involve flushing the IOMMU TLB
> to ensure the I/O device uses the updated mappings.
> 
> The op has been designed to take an array of operations and a count as
> parameters. This allows easily implemented hypercall continuations to be
> used and allows batches of IOMMU operations to be submitted before flushing
> the IOMMU TLB.
> 
> The `subop_id` to be used for a particular element is encoded into the
> element itself. This allows map and unmap operations to be performed in one
> hypercall while still applying the IOMMU TLB flushing optimisations.
> 
> The hypercall will ensure that the required IOMMU TLB flushes are applied
> before returning to the guest, via either hypercall completion or a
> hypercall continuation.
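> 
> A minimal sketch of such a mixed batch is shown below, using the hypothetical
> `HYPERVISOR_iommu_op()` guest-side wrapper from the earlier sketch (the
> hypercall number is not fixed by this draft):
> 
>     /* One hypercall carrying a map and an unmap, letting Xen coalesce the
>      * IOMMU TLB flush across both operations. */
>     struct pv_iommu_op ops[2] = {
>         {
>             .subop_id   = IOMMUOP_map_page,
>             .flags      = IOMMU_OP_readable | IOMMU_OP_writeable,
>             .u.map_page = { .bfn = 0x1000, .gfn = 0x1000 },
>         },
>         {
>             .subop_id     = IOMMUOP_unmap_page,
>             .u.unmap_page = { .bfn = 0x2000 },
>         },
>     };
> 
>     ret_t rc = HYPERVISOR_iommu_op(ops, 2);
>     /* The status field of each element must be checked individually. */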
> 
> IOMMUOP_query_caps
> ------------------
> 
> This subop queries the runtime capabilities of the PV-IOMMU interface for the
> specific calling domain. This subop uses `struct pv_iommu_op` directly.
> 
> ------------------------------------------------------------------------------
> Field          Purpose
> -----          ---------------------------------------------------------------
> `flags`        [out] This field details the IOMMUOP capabilities.
> 
> `status`       [out] Status of this op, op specific values listed below
> ------------------------------------------------------------------------------
> 
> Defined bits for flags field:
> 
> ------------------------------------------------------------------------------
> Name                        Bit                Definition
> ----                       ------     ----------------------------------
> IOMMU_QUERY_map_cap          0        IOMMUOP_map_page or
>                                       IOMMUOP_map_foreign_page can be used
>                                       for the calling domain
> 
> IOMMU_QUERY_map_all_mfns     1        IOMMUOP_map_page subop can map any MFN
>                                       not used by Xen
> 
> Reserved for future use     2-9                   n/a
> 
> IOMMU_page_order           10-15      Returns maximum possible page order for
>                                       all other IOMMUOP subops
> ------------------------------------------------------------------------------
> 
> Defined values for query_caps subop status field:
> 
> Value   Reason
> ------  ----------------------------------------------------------
> 0       subop successfully returned
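> 
> A minimal sketch of querying and decoding the capabilities is shown below
> (`HYPERVISOR_iommu_op()` is again the hypothetical wrapper from above):
> 
>     struct pv_iommu_op op = { .subop_id = IOMMUOP_query_caps };
> 
>     HYPERVISOR_iommu_op(&op, 1);
> 
>     if ( op.status == 0 )
>     {
>         int can_map      = op.flags & (1 << 0); /* IOMMU_QUERY_map_cap */
>         int map_all_mfns = op.flags & (1 << 1); /* IOMMU_QUERY_map_all_mfns */
>         /* Bits 10-15 encode the maximum page order for the other subops. */
>         unsigned int max_order = (op.flags >> 10) & 0x3f;
>     }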
> 
> IOMMUOP_map_page
> ----------------------
> This subop uses the `struct map_page` part of `struct pv_iommu_op`.
> 
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> allowed to map all GFNs except for Xen owned MFNs; otherwise the hardware
> domain will only be allowed to map GFNs which it owns.
> 
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will also
> be allowed to map all GFNs without taking a reference to the MFN backing the
> GFN, by setting the IOMMU_MAP_OP_no_ref_cnt flag.
> 
> Every successful pv_iommu_op will result in an additional page reference
> being taken on the MFN backing the GFN, except for the condition detailed
> above.
> 
> If the map_op flags indicate a writeable mapping is required then a writeable
> page type reference will be taken, otherwise a standard page reference will
> be taken.
> 
> All the following conditions are required to be true for the PV IOMMU map
> subop to succeed:
> 
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. If the domain is the hardware domain then the following Xen IOMMU options
>    are NOT enabled: dom0-passthrough
> 
> This subop's usage of the `struct pv_iommu_op` and `struct map_page` fields
> is detailed below:
> 
> ------------------------------------------------------------------------------
> Field          Purpose
> -----          ---------------------------------------------------------------
> `bfn`          [in]  Bus address frame number (BFN) to be mapped to the
>                      specified gfn below
> 
> `gfn`          [in]  Guest address frame number for DOMID_SELF
> 
> `flags`        [in]  Flags signalling the type of IOMMU mapping to be
>                      created; flags can be combined
> 
> `status`       [out] Mapping status of this op, op specific values listed
>                      below
> ------------------------------------------------------------------------------
> 
> Defined bits for flags field:
> 
> Name                        Bit                Definition
> ----                       -----      ----------------------------------
> IOMMU_OP_readable            0        Create readable IOMMU mapping
> IOMMU_OP_writeable           1        Create writeable IOMMU mapping
> IOMMU_MAP_OP_no_ref_cnt      2        IOMMU mapping does not take a reference
>                                       to the MFN backing the BFN mapping
> Reserved for future use     3-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for both gfn
>                                       and bfn
> 
> Defined values for map_page subop status field:
> 
> Value   Reason
> ------  ----------------------------------------------------------------------
> 0       subop successfully returned
> -EIO    IOMMU unit returned an error when attempting to map BFN to GFN
> -EPERM  GFN could not be mapped because the GFN belongs to Xen
> -EPERM  Domain is not the hardware domain and GFN does not belong to domain
> -EPERM  Domain is the hardware domain, IOMMU dom0-strict mode is enabled and
>         GFN does not belong to domain
> -EACCES BFN address conflicts with RMRR regions for devices attached to
>         DOMID_SELF
> -ENOSPC Page order is too large for either BFN, GFN or IOMMU unit
> 
> IOMMUOP_unmap_page
> ------------------
> This subop uses the `struct unmap_page` part of `struct pv_iommu_op`.
> 
> The subop's usage of the `struct pv_iommu_op` and `struct unmap_page` fields
> is detailed below:
> 
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF
> 
> `flags`        [in] Flags signalling the page order of the unmap operation
> 
> `status`       [out] Mapping status of this unmap operation, 0 indicates
>                      success
> --------------------------------------------------------------------
> 
> Defined bits for flags field:
> 
> Name                        Bit                Definition
> ----                       -----      ----------------------------------
> Reserved for future use     0-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for bfn
> 
> 
> Defined values for unmap_page subop status field:
> 
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EIO         IOMMU unit returned an error when attempting to unmap BFN
> -ENOSPC      Page order is too large for either BFN address or IOMMU unit
> ------------------------------------------------------------------------
> 
> 
> IOMMUOP_map_foreign_page
> ------------------------
> This subop uses the `struct map_foreign_page` part of `struct pv_iommu_op`.
> 
> It is not valid to use a domid representing the calling domain.
> 
> The hypercall will only succeed if the calling domain has sufficient
> privilege over the specified domid.
> 
> The M2B mechanism maps an MFN to (BFN, domid, ioserver) tuples.
> 
> Each successful subop will add to the M2B if there was not an existing
> identical M2B entry.
> 
> Every new M2B entry will take a reference to the MFN backing the GFN.
> 
> All the following conditions are required to be true for the PV IOMMU
> map_foreign subop to succeed:
> 
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. The domain is the hardware_domain and the following Xen IOMMU options are
>    NOT enabled: dom0-passthrough
> 
> 
> This subop's usage of the `struct pv_iommu_op` and `struct map_foreign_page`
> fields is detailed below:
> 
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `domid`        [in] The domain id for which the gfn field applies
> 
> `ioserver`     [in] IOREQ server id associated with mapping
> 
> `bfn`          [in] Bus address frame number for gfn address
> 
> `gfn`          [in] Guest address frame number
> 
> `flags`        [in] Details the status of the BFN mapping
> 
> `status`       [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
> 
> Defined bits for flags field:
> 
> Name                         Bit                Definition
> ----                        -----      ----------------------------------
> IOMMUOP_readable              0        BFN IOMMU mapping is readable
> IOMMUOP_writeable             1        BFN IOMMU mapping is writeable
> IOMMUOP_swap_mfn              2        BFN IOMMU mapping can be safely
>                                        swapped to scratch page
> Reserved for future use      3-9       Reserved flag bits should be 0
> IOMMU_page_order            10-15      Page order to be used for both gfn
>                                        and bfn
> 
> Defined values for map_foreign_page subop status field:
> 
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EIO         IOMMU unit returned an error when attempting to map BFN to GFN
> -EPERM       Calling domain does not have sufficient privilege over domid
> -EPERM       GFN could not be mapped because the GFN belongs to Xen
> -EPERM       domid maps to DOMID_SELF
> -EACCES      BFN address conflicts with RMRR regions for devices attached to
>              DOMID_SELF
> -ENODEV      Provided ioserver id is not valid
> -ENXIO       Provided domid is not valid
> -ENXIO       Provided GFN address is not valid
> -ENOSPC      Page order is too large for either BFN, GFN or IOMMU unit
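> 
> A minimal sketch of an emulator mapping a guest GFN with the swap-to-scratch
> behaviour enabled is shown below (`HYPERVISOR_iommu_op()` is the hypothetical
> wrapper used in earlier sketches):
> 
>     /* Map a foreign guest's GFN at a chosen BFN, opting in to scratch page
>      * swapping so the guest can balloon without waiting for this emulator. */
>     int map_guest_frame(domid_t guest, ioservid_t ioserver,
>                         uint64_t gfn, uint64_t bfn)
>     {
>         struct pv_iommu_op op = {
>             .subop_id = IOMMUOP_map_foreign_page,
>             .flags    = IOMMUOP_readable | IOMMUOP_writeable |
>                         IOMMUOP_swap_mfn,
>             .u.map_foreign_page = {
>                 .bfn      = bfn,
>                 .gfn      = gfn,
>                 .domid    = guest,
>                 .ioserver = ioserver,
>             },
>         };
> 
>         HYPERVISOR_iommu_op(&op, 1);
>         return op.status;  /* 0 on success, see table above */
>     }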
> 
> IOMMUOP_lookup_foreign_page
> ---------------------------
> This subop uses the `struct lookup_foreign_page` part of `struct
> pv_iommu_op`.
> 
> This subop looks up a BFN mapping for an ioserver + gfn + target domid
> combination.
> 
> The hypercall will only succeed if the calling domain has sufficient
> privilege over the specified domid.
> 
> If a 1:1 mapping of BFN to MFN exists then an M2B entry is added and a
> reference is taken to the underlying MFN. If an existing mapping is present
> then the BFN is returned and no additional references will be taken to the
> underlying MFN.
> 
> A 1:1 mapping will exist if there is no IOMMU support or if the PV hardware
> domain was booted in dom0-relaxed mode or in dom0-passthrough mode.
> 
> If there is no IOMMU support then the MFN is returned in the BFN field (that
> is the only valid bus address for the GFN + domid combination).
> 
> Each successful subop will add to the M2B if there was not an existing
> identical M2B entry.
> 
> Every new M2B entry will take a reference to the MFN backing the GFN.
> 
> This subop's usage of the `struct pv_iommu_op` and `struct
> lookup_foreign_page` fields is detailed below:
> 
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `domid`        [in] The domain id for which the gfn field applies
> 
> `ioserver`     [in] IOREQ server id associated with mapping
> 
> `bfn`          [out] Bus address frame number for gfn address
> 
> `gfn`          [in] Guest address frame number
> 
> `flags`        [out] Details the status of the BFN mapping
> 
> `status`       [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
> 
> Defined bits for flags field:
> 
> Name                         Bit                Definition
> ----                        -----      ----------------------------------
> IOMMUOP_readable              0        Returned BFN IOMMU mapping is readable
> IOMMUOP_writeable             1        Returned BFN IOMMU mapping is writeable
> Reserved for future use      2-9       Reserved flag bits should be 0
> IOMMU_page_order            10-15      Returns the page order of the
>                                        returned BFN mapping
> 
> Defined values for lookup_foreign_page subop status field:
> 
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EPERM       Calling domain does not have sufficient privilege over domid
> -ENOENT      There is no available BFN for the provided GFN + domid
>              combination
> -ENODEV      Provided ioserver id is not valid
> -ENXIO       Provided domid is not valid
> -ENXIO       Provided GFN address is not valid
> 
> 
> IOMMUOP_unmap_foreign_page
> --------------------------
> This subop uses the `struct unmap_foreign_page` part of `struct
> pv_iommu_op`.
> 
> It only allows BFNs acquired via IOMMUOP_map_foreign_page or
> IOMMUOP_lookup_foreign_page to be unmapped. If an attempt is made to unmap a
> BFN mapped via IOMMUOP_map_page then the subop will fail.
> 
> The subop will perform a B2M lookup (an IO page table walk) for the calling
> domain and then index the M2B using the returned MFN. This is safe because a
> particular BFN mapping can only map to one MFN for a particular calling
> domain.
> 
> This subop's usage of the `struct pv_iommu_op` and `struct
> unmap_foreign_page` fields is detailed below:
> 
> -----------------------------------------------------------------------
> Field          Purpose
> -----          --------------------------------------------------------
> `ioserver`     [in] IOREQ server id associated with mapping
> 
> `bfn`          [in] Bus address frame number for gfn address
> 
> `flags`        [in] Flags for signalling page order of unmap operation
> 
> `status`       [out] status of this subop, 0 indicates success
> -----------------------------------------------------------------------
> 
> Defined bits for flags field:
> 
> Name                        Bit                Definition
> ----                        -----     ----------------------------------
> Reserved for future use     0-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for bfn unmapping
> 
> Defined values for unmap_foreign_page subop status field:
> 
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -ENOENT      An M2B entry was not found for the specified input parameters.
> 
> 
> Linux kernel architecture
> =========================
> 
> The Linux kernel will use the PV-IOMMU hypercalls to map its PFN address
> space into the IOMMU. It will map the PFNs to the IOMMU address space using a
> 1:1 mapping; it does this by programming a BFN to GFN mapping which matches
> the PFN to GFN mapping.
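> 
> A minimal sketch of establishing this 1:1 mapping at boot is shown below,
> batching operations to amortise the IOMMU TLB flush; the batch size is
> arbitrary, `pfn_to_gfn()` stands for the kernel's PFN-to-GFN translation, and
> `HYPERVISOR_iommu_op()` remains the hypothetical hypercall wrapper:
> 
>     #define BATCH 256
> 
>     /* Program a BFN == PFN identity mapping for all of the kernel's RAM.
>      * For a PV guest the GFN programmed here is the MFN backing each PFN. */
>     void setup_identity_bfn_map(unsigned long max_pfn)
>     {
>         static struct pv_iommu_op ops[BATCH];
>         unsigned long pfn = 0;
> 
>         while ( pfn < max_pfn )
>         {
>             unsigned int i, n = 0;
> 
>             for ( ; n < BATCH && pfn < max_pfn; n++, pfn++ )
>             {
>                 ops[n].subop_id       = IOMMUOP_map_page;
>                 ops[n].flags          = IOMMU_OP_readable |
>                                         IOMMU_OP_writeable;
>                 ops[n].u.map_page.bfn = pfn;             /* BFN == PFN */
>                 ops[n].u.map_page.gfn = pfn_to_gfn(pfn);
>             }
> 
>             HYPERVISOR_iommu_op(ops, n);
> 
>             for ( i = 0; i < n; i++ )
>                 BUG_ON(ops[i].status != 0);  /* sketch: no fallback path */
>         }
>     }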
> 
> The native SWIOTLB will be used to handle devices which cannot DMA to all of
> the kernel's PFN address space.
> 
> An interface shall be provided for emulator usage of IOMMUOP_*_foreign_page
> subops which will allow the Linux kernel to centrally manage that domain's BFN
> resource and ensure there are no unexpected conflicts.
> 
> Kernel Map Foreign GFN to BFN interface
> ---------------------------------------
> 
> An array of 'count' `struct pv_iommu_op` elements will be passed to the
> mapping function.
> 
>     int map_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)
> 
> The calling function will use the `struct map_foreign_page` inside the
> `struct pv_iommu_op` and will fill in the domid, gfn and ioserver fields.
> 
> The kernel function will reuse the passed in `struct pv_iommu_op` for the
> hypercall and will set the subop_id field based on the IOMMU_QUERY_map_cap
> capability.
> 
> If IOMMU_QUERY_map_cap is set then the kernel will allocate a suitable BFN
> address, set the bfn field in the op to this address and set the subop_id to
> IOMMUOP_map_foreign_page. It will do this for all 'ops' and then issue the
> hypercall.
> 
> If IOMMU_QUERY_map_cap is NOT set then the kernel will set the subop_id
> to IOMMUOP_lookup_foreign_page in all 'ops' and then issue the hypercall.
> 
> The calling function should check the status field in each op and if the
> status field is 0 then it can use the returned bfn address in each op.
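> 
> A minimal sketch of this function is shown below; `alloc_bfn()` is a
> hypothetical allocator for the domain's BFN address space and `have_map_cap`
> caches a previous IOMMUOP_query_caps result:
> 
>     extern int have_map_cap;          /* cached IOMMU_QUERY_map_cap bit */
>     extern uint64_t alloc_bfn(void);  /* hypothetical BFN allocator */
> 
>     int map_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)
>     {
>         int i;
> 
>         for ( i = 0; i < count; i++ )
>         {
>             if ( have_map_cap )
>             {
>                 /* We control the BFN space: pick an address ourselves. */
>                 ops[i].subop_id = IOMMUOP_map_foreign_page;
>                 ops[i].u.map_foreign_page.bfn = alloc_bfn();
>             }
>             else
>             {
>                 /* Xen supplies the BFN in the op on return. */
>                 ops[i].subop_id = IOMMUOP_lookup_foreign_page;
>             }
>         }
> 
>         return HYPERVISOR_iommu_op(ops, count);
>     }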
> 
> 
> Kernel Unmap Foreign GFN to BFN interface
> -----------------------------------------
> 
> An array of 'count' `struct pv_iommu_op` elements will be passed to the
> unmapping function.
> 
>     int unmap_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)
> 
> The calling function will use the `struct unmap_foreign_page` inside the
> `struct pv_iommu_op` and will fill in the bfn field.
> 
> The kernel function will set the subop_id field to IOMMUOP_unmap_foreign_page
> in each op and then issue the hypercall.
> 
> The calling function should check the status field in each op and if the
> status field is 0 then the BFN has been successfully unmapped.
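> 
> A correspondingly minimal sketch, under the same assumptions as the mapping
> function above:
> 
>     int unmap_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)
>     {
>         int i;
> 
>         /* The caller has already filled in each op's bfn (and ioserver). */
>         for ( i = 0; i < count; i++ )
>             ops[i].subop_id = IOMMUOP_unmap_foreign_page;
> 
>         return HYPERVISOR_iommu_op(ops, count);
>     }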
> 
> 


