
Re: [Xen-devel] [RFC] Dom0 PV IOMMU control design (draft A)



On Mon, Apr 14, 2014 at 04:03:22PM +0100, Malcolm Crossley wrote:
> On 14/04/14 13:51, Konrad Rzeszutek Wilk wrote:
> >On Mon, Apr 14, 2014 at 01:12:07PM +0100, Malcolm Crossley wrote:
> >>On 11/04/14 18:50, Konrad Rzeszutek Wilk wrote:
> >>>On Fri, Apr 11, 2014 at 06:28:43PM +0100, Malcolm Crossley wrote:
> >>>>Hi,
> >>>>
> >>>>Here is a design for allowing Dom0 PV guests to control the IOMMU.
> >With the device driver domains I think you should also rename the
> >'dom0' to device driver or 'hardware domain' - as this functionality
> >should be possible within a PV guest with PCI passthrough, for example.
> Currently Xen only allows Dom0 IOMMU access to all (except Xen)
> MFNs. To avoid changing the current security implications of the
> feature, I would prefer that 'hardware domain' support be added as a
> separate design.
> >
> >>>>This allows for the Dom0 GPFN mapping to be programmed into the
> >>>>IOMMU, avoiding the SWIOTLB bounce buffer technique in the
> >>>>Linux kernel (except for legacy 32-bit DMA IO devices).
> >>>>
> >>>>...
> >>>>
> >>>>Design considerations for hypercall subops
> >>>>-------------------------------------------
> >>>>IOMMU map/unmap operations can be slow and can involve flushing the
> >>>>IOMMU TLB to ensure the IO device uses the updated mappings.
> >>>>
> >>>>The subops have been designed to take an array of operations and a
> >>>>count as parameters. This allows easily implemented hypercall
> >>>>continuations to be used, and allows batches of IOMMU operations to
> >>>>be submitted before flushing the IOMMU TLB.
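
For illustration, the batching model might look like this from the
caller's side (a sketch only; the HYPERVISOR_iommu_op() wrapper and the
subop encoding are assumptions, not part of this draft):

    /* Sketch: batch many IOMMU mappings into one hypercall so the
     * IOMMU TLB is flushed once per call/continuation, not per page. */
    static int map_batch(uint64_t first_bfn, uint64_t first_mfn,
                         unsigned int nr)
    {
        struct iommu_map_op ops[64];
        unsigned int i;

        if (nr > ARRAY_SIZE(ops))
            return -EINVAL;
        for (i = 0; i < nr; i++) {
            ops[i].bfn    = first_bfn + i;   /* caller-chosen bus frames */
            ops[i].mfn    = first_mfn + i;   /* machine frames to map */
            ops[i].flags  = IOMMU_MAP_OP_readable |
                            IOMMU_MAP_OP_writeable;
            ops[i].status = 0;
        }
        return HYPERVISOR_iommu_op(IOMMUOP_map_page, ops, nr);
    }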
> >>>>
> >>>>
> >>>>
> >>>>IOMMUOP_map_page
> >>>>----------------
> >>>>First argument, pointer to array of `struct iommu_map_op`
> >>>>Second argument, integer count of `struct iommu_map_op` elements in array
> >>>Could this be 'unsigned integer' count?
> >>Yes, will change for draft B.
> >>>Is there a limit? Can I do 31415 of them? Can I do it for the whole
> >>>memory space of the guest?
> >>There is no current limit; the hypercall will be implemented with
> >>continuations to prevent denial-of-service attacks.
> >>>>This subop will attempt to IOMMU map each element in the
> >>>>`struct iommu_map_op` array and record the mapping status back into
> >>>>the array itself. If a mapping fault occurs then the hypercall will
> >>>>return with -EFAULT.
> >>>>This subop will inspect the MFN address being mapped in each
> >>>>iommu_map_op to ensure it does not belong to the Xen hypervisor
> >>>>itself. If the MFN does belong to the Xen hypervisor the subop will
> >>>>return -EPERM in the status field for that particular iommu_map_op.
> >>>Is it OK if the MFN belongs to another guest?
> >>>
> >>It is OK for the MFN to belong to another guest because only Dom0 is
> >>performing the mapping. This is to allow grant-mapped pages to have
> >>a Dom0 BFN mapping.
> >That should be reflected in the spec: that the MFN has to pass the
> >grant check.
> Again, if the design is restricted to domain 0 only then there is no
> need for a grant check, because previously domain 0 had a
> bus-addressable mapping for all MFNs.
> >>>>The IOMMU TLB will only be flushed when the hypercall completes or
> >>>>a hypercall continuation is created.
> >>>>
> >>>>     struct iommu_map_op {
> >>>>         uint64_t bfn;
> >>>bus_frame ?
> >>Yes, or you could say, a bus address with 4k page-size granularity.
> >
> >The 'bfn' sounds like a new word. I was hoping you could use something
> >more in line with the nomenclature of the I/O world, such as
> >'bus address', 'bus physical frame', etc. The 'bfn' sounds just
> >off. The best I could come up with was 'bus_frame', but there has to
> >be something that sounds better?
> I agree that bfn is not ideal. Using bus_address is dangerous
> because the mapping can only be done at 4k granularity (MFNs are also
> 4k granularity).
> "iofn" is an alternative, but potentially misleading because IOMMUs
> may not be able to translate all IO devices accessible to dom0
> (particularly on ARM).

Then who translates the rest of those I/O devices? Other IOMMUs?

> >
> >>>>         uint64_t mfn;
> >>>>         uint32_t flags;
> >>>>         int32_t status;
> >>>>     };
> >>>>
> >>>>------------------------------------------------------------------------------
> >>>>Field          Purpose
> >>>>----- ---------------------------------------------------------------
> >>>>`bfn`          [in] Bus address frame number to be mapped to the specified mfn below
> >>>Huh? Isn't this out? If not, isn't bfn == mfn for dom0?
> >>>How would dom0 know the bus address? That usually is something only the
> >>>IOMMU knows.
> >>The idea is that Domain 0 sets up BFN to MFN mappings which are the
> >>same as the GPFN to MFN mappings. This cannot be done from Xen's M2P
> >>mappings because they are not up to date for PV guests.
> >OK, but if GPFN == MFN, then this is not a bus frame number. You might
> >as well call it 'gmfn'?
> The idea is not to force the particular mappings that domain 0 is
> allowed to create, but the point I was trying to make above is that
> domain 0 is responsible for tracking the BFN mappings it has set up.

OK, just mention that in the spec :-)
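
To make that concrete, here is a sketch of how dom0 might mirror its
GPFN->MFN view into the IOMMU by choosing BFN == GPFN. pfn_to_mfn()
stands in for the kernel's P2M lookup; the hypercall wrapper is an
assumption, as above:

    /* Sketch: dom0 chooses BFN == GPFN, so the bus addresses it
     * later programs into devices match its own pseudo-physical
     * address space. */
    unsigned int i;
    for (i = 0; i < nr; i++) {
        unsigned long gpfn = start_gpfn + i;

        ops[i].bfn    = gpfn;               /* BFN deliberately == GPFN */
        ops[i].mfn    = pfn_to_mfn(gpfn);   /* from the guest P2M */
        ops[i].flags  = IOMMU_MAP_OP_readable | IOMMU_MAP_OP_writeable;
        ops[i].status = 0;
    }
    rc = HYPERVISOR_iommu_op(IOMMUOP_map_page, ops, nr);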

> >>>>`mfn`          [in] Machine address frame number
> >>>>
> >>>We still need to do a bit of PFN -> MFN -> hypercall -> GFN and program
> >>>that into the PCIe devices, right?
> >>Yes, a GPFN to MFN lookup will be required, but this is simply
> >>consulting the guest P2M. BTW, we are programming the IOMMU, not the
> >>PCIe devices themselves.
> >We do have to program some value into the PCIe device so that it can
> >do the DMA operation. Said value right now is MFN << PAGE_SHIFT (aka
> >the physical address). But as the spec says, the 'physical address'
> >might not be equal to the 'bus address'. Hence the value we would be
> >programming in might very well be different from the MFN.
> Yes, but domain 0 will track the BFN to MFN mappings it has set up
> (if it set up BFN == GPFN then this is easy).
> So domain 0 is responsible for programming the correct BFN address
> into the PCIe device.
> >
> >Or in other words - the IOMMU could be non-coherent and it would use
> >its own addresses in place of physical addresses. As in, different
> >from the physical ones.
> I don't understand what you are trying to say here. When you say
> physical address, do you mean MFN (memory-based addresses) or BFN
> (bus addresses from the device's point of view)?

physical address = CPU viewpoint, aka MFN.
bus addresses = IOMMU or I/O viewpoint, aka BFN.

And BFN != MFN.
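
Put as code, the distinction means a dom0 driver derives the
device-visible address from the BFN, not the MFN (a sketch; the helper
name is made up for illustration):

    /* Sketch: derive the DMA address the device sees from the BFN
     * set up via IOMMUOP_map_page, not from the MFN. */
    static dma_addr_t bfn_to_dma(uint64_t bfn)
    {
        return (dma_addr_t)bfn << PAGE_SHIFT;   /* bus address */
    }
    /* Program bfn_to_dma(bfn) into the device's descriptor ring;
     * mfn << PAGE_SHIFT would only be correct if BFN == MFN. */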

> >>>
> >>>>`flags`        [in] Flags for signalling type of IOMMU mapping to be created
> >>>>
> >>>>`status`       [out] Mapping status of this map operation, 0 indicates success
> >>>>------------------------------------------------------------------------------
> >>>>
> >>>>
> >>>>Defined bits for flags field
> >>>>------------------------------------------------------------------------
> >>>>Name                        Bit                Definition
> >>>>----                       ----- ----------------------------------
> >>>>IOMMU_MAP_OP_readable        0        Create readable IOMMU mapping
> >>>>IOMMU_MAP_OP_writeable       1        Create writeable IOMMU mapping
> >>>And is it OK to use both?
> >>Yes, and typically both would be used. I'm just allowing for
> >>read-only mappings and write-only mappings to be created.
> >You need to mention that in the spec that it is OK to combine them.
> >Or mention which ones are not OK to combine.
> Different IOMMU implementations may vary in what types of mapping can
> be set up. I would expect the guest to handle the error if a
> particular type of mapping it requires cannot be set up.

Ok, that needs to be mentioned in the spec too.
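
Something like this on the caller side, perhaps (a sketch; the wrapper
and the fallback are hypothetical, not proposed ABI):

    /* Sketch: request a read-only mapping and cope with an IOMMU
     * implementation that cannot create that mapping type. */
    struct iommu_map_op op = {
        .bfn    = bfn,
        .mfn    = mfn,
        .flags  = IOMMU_MAP_OP_readable,   /* read-only mapping */
        .status = 0,
    };
    int rc = HYPERVISOR_iommu_op(IOMMUOP_map_page, &op, 1);

    if (rc || op.status)
        fallback_to_swiotlb();   /* hypothetical: e.g. bounce buffering */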
> >>>>Reserved for future use     2-31                   n/a
> >>>>------------------------------------------------------------------------
> >>>>
> >>>>Additional error codes specific to this hypercall:
> >>>>
> >>>>Error code  Reason
> >>>>---------- ------------------------------------------------------------
> >>>>EPERM       PV IOMMU mode not enabled or calling domain is not domain 0
> >>>And -EFAULT
> >>I was considering that EFAULT would be a standard error code; do you
> >>want it to be explicitly listed?
> >Yes. This is a spec and if you don't spell it out folks might miss this.
> >
> >>>and what about success? Do you return 0 or the number of ops that were
> >>>successfull?
> >>Return 0 if all ops were successful; otherwise return the number of
> >>failed operations as a positive number. I leave it up to the caller
> >>to determine which operations failed by iterating the input array. I
> >>will add this to draft B.
> >Thank you.
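
So the caller-side convention for draft B would look roughly like this
(a sketch of the semantics described above; handler names are made up):

    /* Sketch: rc == 0 means all ops succeeded; rc > 0 is the count
     * of failed ops, located by scanning the per-op status fields;
     * rc < 0 (e.g. -EFAULT, -EPERM) is a hypercall-level failure. */
    int rc = HYPERVISOR_iommu_op(IOMMUOP_map_page, ops, nr);

    if (rc > 0) {
        unsigned int i;
        for (i = 0; i < nr; i++)
            if (ops[i].status)              /* e.g. -EPERM for a Xen MFN */
                handle_failed_op(&ops[i]);  /* hypothetical handler */
    } else if (rc < 0) {
        /* whole call failed, e.g. a bad guest pointer (-EFAULT) */
    }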
> >>>>------------------------------------------------------------------------
> >>>>
> >>>>IOMMUOP_unmap_page
> >>>>------------------
> >>>>First argument, pointer to array of `struct iommu_map_op`
> >>>>Second argument, integer count of `struct iommu_map_op` elements in array
> >>>Um, 'unsigned integer' count?
> >>Yes, will change for draft B.
> >>>>This subop will attempt to unmap each element in the
> >>>>`struct iommu_map_op` array and record the unmapping status back
> >>>>into the array itself. If an unmapping fault occurs then the
> >>>>hypercall stops processing the array and returns with -EFAULT.
> >>>>
> >>>>The IOMMU TLB will only be flushed when the hypercall completes or
> >>>>a hypercall continuation is created.
> >>>>
> >>>>     struct iommu_map_op {
> >>>>         uint64_t bfn;
> >>>>         uint64_t mfn;
> >>>>         uint32_t flags;
> >>>>         int32_t status;
> >>>>     };
> >>>>
> >>>>--------------------------------------------------------------------
> >>>>Field          Purpose
> >>>>-----          -----------------------------------------------------
> >>>>`bfn`          [in] Bus address frame number to be unmapped
> >>>I presume this is gathered from the 'map' call?
> >>It does not need to be gathered; Domain 0 is responsible for its
> >>own BFN mappings, and there is no auditing of the BFN addresses
> >>themselves.
> >I can see this being the case when BFN == MFN. But if that is not the
> >case, I cannot see how the domain would obtain the BFNs, unless it
> >gets them from the 'map' call.
> Domain 0 issues the original PV IOMMU map hypercall that causes the
> BFN mapping to be created, so it should be able to track what needs
> to be unmapped.

Earlier you stated "It does not need to be gathered", but it does, via
the 'map' call, as you confirmed here. Please just note in the spec
that these values should come from prior 'map' calls.
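
A sketch of what that caller-side bookkeeping implies (the tracking
array and wrapper are hypothetical, not part of the proposed ABI):

    /* Sketch: dom0 replays at unmap time the BFNs it chose at map
     * time; 'mapped_bfn[]' is driver-side bookkeeping. */
    unsigned int i;
    for (i = 0; i < nr; i++) {
        ops[i].bfn    = mapped_bfn[i];  /* as passed to IOMMUOP_map_page */
        ops[i].mfn    = 0;              /* ignored for unmap */
        ops[i].flags  = 0;              /* ignored for unmap */
        ops[i].status = 0;
    }
    rc = HYPERVISOR_iommu_op(IOMMUOP_unmap_page, ops, nr);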


> >
> >>>>`mfn`          [in] This field is ignored for unmap subop
> >>>>
> >>>>`flags`        [in] This field is ignored for unmap subop
> >>>>
> >>>>`status`       [out] Mapping status of this unmap operation, 0 indicates success
> >>>>--------------------------------------------------------------------
> >>>>
> >>>>Additional error codes specific to this hypercall:
> >>>>
> >>>>Error code  Reason
> >>>>---------- ------------------------------------------------------------
> >>>>EPERM       PV IOMMU mode not enabled or calling domain is not domain 0
> >>>EFAULT too
> >>I was considering that EFAULT would be a standard error code; do you
> >>want it to be explicitly listed?
> >Yes.
> >>
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel