[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC] Xen PV IOMMU interface draft B



On 16/06/15 14:19, Jan Beulich wrote:
>>>> On 12.06.15 at 18:43, <malcolm.crossley@xxxxxxxxxx> wrote:
>> IOMMUOP_query_caps
>> ------------------
>>
>> This subop queries the runtime capabilities of the PV-IOMMU interface for 
>> the
>> specific called domain. This subop uses `struct pv_iommu_op` directly.
> 
> "calling domain" perhaps?
> 
>> ----------------------------------------------------------------------------
>> --
>> Field          Purpose
>> -----          
>> ---------------------------------------------------------------
>> `flags`        [out] This field details the IOMMUOP capabilities.
>>
>> `status`       [out] Status of this op, op specific values listed below
>> ----------------------------------------------------------------------------
>> --
>>
>> Defined bits for flags field:
>>
>> ----------------------------------------------------------------------------
>> --
>> Name                        Bit                Definition
>> ----                       ------     ----------------------------------
>> IOMMU_QUERY_map_cap          0        IOMMUOP_map_page or IOMMUOP_map_foreign
>>                                       can be used for this domain
> 
> "this" (see also above) perhaps being the calling domain? In which
> case I wonder how the "for" and IOMMUOP_map_foreign are
> meant to fit together: I assume the flag to indicate that mapping into
> the (calling) domain is possible. Which then makes me wonder - what
> use if the new hypercall when this flag isn't set?

"This" is the calling domain. The IOMMU_lookup_foreign should continue to work 
if
this flag is not set.

> 
>> IOMMU_QUERY_map_all_gfns     1        IOMMUOP_map_page subop can map any MFN
>>                                       not used by Xen
> 
> "gfns" or "MFN"?

gfns . This is meant to apply to the hardware domain only, it's to allow the
same access control as dom0-relaxed mode allows currently.

> 
>> Defined values for map_page subop status field:
>>
>> Value   Reason
>> ------  
>> ----------------------------------------------------------------------
>> 0       subop successfully returned
>> -EIO    IOMMU unit returned error when attempting to map BFN to GFN.
>> -EPERM  GFN could not be mapped because the GFN belongs to Xen.
>> -EPERM  Domain is not a  domain and GFN does not belong to domain
> 
> "is not a hardware domain"? Also, I think we're pretty determined
> for there to ever only be one, so perhaps it should be "the
> hardware domain" here and elsewhere.

That is a typo. It should say "is not the hardware domain"

I will correct the other occurrences of "a hardware domain" to "the
hardware domain".

> 
>> IOMMUOP_unmap_page
>> ------------------
>> This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.
>>
>> The subop usage of the "struct pv_iommu_op" and ``struct unmap_page` fields
>> are detailed below:
>>
>> --------------------------------------------------------------------
>> Field          Purpose
>> -----          -----------------------------------------------------
>> `bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF
> 
> Has it been determined that unmapping based on GFN is never
> going to be needed, and that unmapping by BFN is the more
> practical solution? The map counterpart doesn't seem to exclude
> establishing multiple mappings for the same BFN, and hence the
> inverse here would become kind of fuzzy in that case.

There will be only one BFN to MFN mapping per domain, the map hypercall will
fail any attempt to have map a BFN to more than one GFN. This is why the unmap
is based on the BFN. It is allowed to have multiple BFN mappings of the same
GFN however.

> 
>> IOMMUOP_map_foreign_page
>> ----------------
>> This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.
>>
>> It is not valid to use domid representing the calling domain.
> 
> And what's the point of that? Was it considered to have only one
> map/unmap pair, capable of mapping both local and foreign pages?
> If so, what speaks against that?

It was considered to have one map/unmap pair. The foreign map operation is the
more complex of the two types of mappings and so I thought it would make for a
cleaner API to have a separate subops for each type of mapping. The handling of
M2B in particular is what may make the internal implementation complex.

> 
>> The M2B mechanism is a MFN to (BFN,domid,ioserver) tuple.
> 
> I didn't see anything explaining the significance of this (namely
> the ioserver part, I think I can see the need for the domid), can
> you explain the backgound here please?

The ioserver part of the tuple is mainly for supporting a notification
mechanism when a guest balloons out a GFN. The

> 
>> Every new M2B entry will take a reference to the MFN backing the GFN.
> 
> What happens when that GFN->MFN mapping changes?

The IOREQ server will be notified so that is can ensure any mediated device
is not using the now invalid BFN mapping. Once all BFN mappings are removed
by affected IOREQ servers (decrementing reference count each time) then the
MFN will be released back to Xen.

> 
>> All the following conditions are required to be true for PV IOMMU 
>> map_foreign
>> subop to succeed:
>>
>> 1. IOMMU detected and supported by Xen
>> 2. The domain has IOMMU controlled hardware allocated to it
>> 3. The domain is a hardware_domain and the following Xen IOMMU options are
>>    NOT enabled: dom0-passthrough
> 
> Is there a way for the hardware domain to know it is running in
> pass-through mode? Also, "the domain" is ambiguous here; I'm
> sure you mean the invoking domain, not the one owning the page.

The IOMMU_QUERY_map_cap flag will not be set. I will change "the domain" to
"the calling domain".

> 
>> This subop usage of the "struct pv_iommu_op" and ``struct map_foreign_page`
>> fields are detailed below:
>>
>> --------------------------------------------------------------------
>> Field          Purpose
>> -----          -----------------------------------------------------
>> `domid`        [in] The domain ID for which the gfn field applies
>>
>> `ioserver`     [in] IOREQ server id associated with mapping
>>
>> `bfn`          [in] Bus address frame number for gfn address
> 
> In the description above you speak of returning data in this field. Is
> [in] really correct?
> 
The description is wrong, the map_foreign_page will allow the bfn to be
specified by the calling domain.

>> Defined bits for flags field:
>>
>> Name                         Bit                Definition
>> ----                        -----      ----------------------------------
>> IOMMUOP_readable              0        BFN IOMMU mapping is readable
>> IOMMUOP_writeable             1        BFN IOMMU mapping is writeable
>> IOMMUOP_swap_mfn              2        BFN IOMMU mapping can be safely
>>                                        swapped to scratch page
> 
> Scratch page? This needs some explanation.

Hopefully this is explained further down. It is expected that certain BFN 
mappings
are mapping data for the mediated device and not control information. By 
allowing
a scratch page to be used Xen can immediately swap the BFN mappings and remove 
the
references to the affected MFN and free the MFN back to Xen. This avoids the 
delay
for the ioreq servers to unmap the BFNs explicitly and hopefully speeds up 
ballooning.

> 
>> Reserved for future use      3-9       Reserved flag bits should be 0
>> IOMMU_page_order            10-15      Returns maximum possible page order 
>> for
>>                                        all other IOMMUOP subops
> 
> "Returns"? "other"?
Copy paste error, this should be the input page order not the returned page 
order.

> 
>> IOMMU_lookup_foreign_page
>> ----------------
>> This subop uses `struct lookup_foreign_page` part of the `struct 
>> pv_iommu_op`.
>>
>> If the BFN is specified as an input and parameter and there is no IOMMU 
>> support
>> for the calling domain then an error will be returned.
> 
> "input and parameter"? The field is marked [out] below, and I also
> can't see a flag allowing this to be optional (the more that flags is
> also [out]).

Another mistake by me. This is old text from a draft where map and lookup were
the same subop.

The BFN is always an output parameter for the lookup_foreign_page subop.

> 
>> It is the calling domain responsibility to ensure there are no conflicts
> 
> No races you mean?

Race's are also the calling domain responsibility but I'm referring to BFN
address space conflicts here.

> 
>> Each successful subop will add to the M2B if there was not an existing 
>> identical
>> M2B entry.
>>
>> Every new M2B entry will take a reference to the MFN backing the GFN.
> 
> This is a lookup - why would it add something somewhere? Perhaps
> this is just a copy-and-paste mistake? Or is the use of "lookup" here
> misleading (as the last section of the document seems to suggest)?

I can see how the term "lookup" is misleading. This subop really does a
lookup and take reference. Can you suggest an alternative name? I'm wary of
allowing

> 
>> Defined bits for flags field:
>>
>> Name                         Bit                Definition
>> ----                        -----      ----------------------------------
>> IOMMUOP_readable              0        Returned BFN IOMMU mapping is 
>> readable
>> IOMMUOP_writeable             1        Returned BFN IOMMU mapping is 
>> writeable
>> Reserved for future use      2-9       Reserved flag bits should be 0
> 
> Reserved flags will be returned as 0.

Agreed, I will make the change.

> 
>> Defined values for lookup_foreign_page subop status field:
>>
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> 0            subop successfully returned
>> -EPERM       Calling domain does not have sufficient privilege over domid
>> -ENOENT      There is no available BFN for provided GFN + domid combination
> 
> Throughout here there seems to be a mixture of references to
> (GFN, domid) pairs anmd (GFN,domid,ioserver) triplets. Along
> with there being some explanation missing, these need to be
> made consistent to avoid confusion.

I'll adjust them to all be (GFN,domid,ioserver) triples.

A particular BFN will map to a (GFN, domid) set so a (GFN,domid,ioserver) triple
can be determined via a (BFN, ioserver).


> 
>> IOMMUOP_unmap_foreign_page
>> ----------------
>> This subop uses `struct unmap_foreign_page` part of the `struct 
>> pv_iommu_op`.
>>
>> If there is no IOMMU support then the MFN is returned in the BFN field (that 
>> is
>> the only valid bus address for the GFN + domid combination).
> 
> Copy-and-paste mistake again (doesn't seem to apply to unmap)?

Yes, it's a copy paste mistake.

> 
>> If there is IOMMU support then the specified BFN is returned for the GFN + 
>> domid
>> combination
>>
>> Each successful subop will add to the M2B if there was not an existing 
>> identical
>> M2B entry. The
>>
>> Every new M2B entry will take a reference to the MFN backing the GFN.
> 
> And again?

Yes, it's a copy paste mistake.

> 
>> This subop usage of the "struct pv_iommu_op" and ``struct 
>> unmap_foreign_page` fields
>> are detailed below:
>>
>> -----------------------------------------------------------------------
>> Field          Purpose
>> -----          --------------------------------------------------------
>> `ioserver`      [in] IOREQ server id associated with mapping
>>
>> `bfn`          [in] Bus address frame number for gfn address
>>
>> `flags`        [out] Flags for signalling page order of unmap operation
> 
> [out]?

Yes, it's a copy paste mistake. It should be [in]

> 
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> 0            subop successfully returned
>> -ENOENT      There is no mapped BFN + ioserver id combination to unmap
> 
> Yet another pair, the unique mapping properties of which don't
> follow from anything said earlier.

A particular BFN will map to a (GFN, domid) set so a (GFN,domid,ioserver) triple
will be determined via a (BFN, ioserver) pair.

The ioserver may have lost track of the (GFN, domid) pair used for a particular
BFN mapping but may have just been notified of it's BFN mapping becoming 
invalid.
So it's more efficient to unmap based on BFN and ioserver ID.

> 
>> PV IOMMU interactions with self ballooning
>> ==========================================
>>
>> The guest should clear any IOMMU mappings it has of it's own pages before
>> releasing a page back to Xen. It will need to add IOMMU mappings after
>> repopulating a page with the populate_physmap hypercall.
>>
>> This requires that IOMMU mappings get a writeable page type reference count 
>> and
>> that guests clear any IOMMU mappings before pinning page table pages.
> 
> I suppose this is only for aware PV guests. If so, perhaps this should
> be made explicit.

This is only for PV guests. I will make the correction.

> 
>> PV IOMMU interactions with grant map/unmap operations
>> =====================================================
>>
>> Grant map operations return a Physical device accessible address (BFN) if 
>> the
>> GNTMAP_device_map flag is set.  This operation currently returns the MFN for 
>> PV
>> guests which may conflict with the BFN address space the guest uses if PV 
>> IOMMU
>> map support is available to the guest.
>>
>> This design proposes to allow the calling domain to control the BFN address 
>> that
>> a grant map operation uses.
>>
>> This can be achieved by specifying that the dev_bus_addr in the
>> gnttab_map_grant_ref structure is used an input parameter instead of the
>> output parameter it is currently.
>>
>> Only PAGE_SIZE aligned addresses are allowed for dev_bus_addr input 
>> parameter.
>>
>> The revised structure is shown below for convenience.
>>
>>     struct gnttab_map_grant_ref {
>>         /* IN parameters. */
>>         uint64_t host_addr;
>>         uint32_t flags;               /* GNTMAP_* */
>>         grant_ref_t ref;
>>         domid_t  dom;
>>         /* OUT parameters. */
>>         int16_t  status;              /* => enum grant_status */
>>         grant_handle_t handle;
>>         /* IN/OUT parameters */
>>         uint64_t dev_bus_addr;
>>     };
>>
>>
>> The grant map operation would then behave similarly to the IOMMUOP_map_page
>> subop for the creation of the IOMMU mapping.
>>
>> The grant unmap operation would then behave similarly to the 
>> IOMMUOP_unmap_page
>> subop for the removal of the IOMMU mapping.
> 
> We're talking about mappings of foreign pages here - aren't these the
> wrong IOMMUOPs then? And if so, where would the ioserver id come
> from?
> 

I don't expected grant mapped pages to be ballooned out or to be directly used 
by
ioservers so I believe the grant mapped pages match more closely to the standard
map_page than the foreign_map_page.


Generally I think I need to rework the document to introduce the some concepts
before the actual interface itself.

Thank you for the comments and sorry for the document having quite a few typos 
and
copy paste errors. It's been through quite a few internal updates.

I think I should add a section to explain that there are two type's of BFN 
address
space users.

The first type (map_page) is more simple and just allows the calling domain to
control the BFN mappings for GFNs it owns and for any grant mapped pages of 
other
domains. There is no need to deal with other domains ballooning their pages will
this type of mapping.

The second type (map_foreign_page) is more complex and allows the calling 
domain to
control the BFN mappings for GFNs of other domains that the  calling domain has 
privilege
over. This type of mapping is designed for mediated passthrough ioservers. It 
has to
deal with other domains ballooning their pages.

The interface is designed so that you cannot "remap" an existing BFN mapping. 
This
allows the different types of BFN mappings to be tracked separately by Xen. 
There
is no need for the M2B for the first type of mapping.

Thanks for the comments. I can see how busy it is on the list and I wasn't 
expecting a
response this quickly :)

Malcolm

> Jan
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.