|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 02/18] VT-d: have callers specify the target level for page table walks
On Fri, Sep 24, 2021 at 11:42:13AM +0200, Jan Beulich wrote:
> In order to be able to insert/remove super-pages we need to allow
> callers of the walking function to specify at which point to stop the
> walk.
>
> For intel_iommu_lookup_page() integrate the last level access into
> the main walking function.
>
> dma_pte_clear_one() gets only partly adjusted for now: Error handling
> and order parameter get put in place, but the order parameter remains
> ignored (just like intel_iommu_map_page()'s order part of the flags).
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> I have to admit that I don't understand why domain_pgd_maddr() wants to
> populate all page table levels for DFN 0.
I think it would be enough to create up to the level requested by the
caller?
Seems like a lazy way to always assert that the level requested by the
caller would be present.
>
> I was actually wondering whether it wouldn't make sense to integrate
> dma_pte_clear_one() into its only caller intel_iommu_unmap_page(), for
> better symmetry with intel_iommu_map_page().
> ---
> v2: Fix build.
>
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -264,63 +264,116 @@ static u64 bus_to_context_maddr(struct v
> return maddr;
> }
>
> -static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
> +/*
> + * This function walks (and if requested allocates) page tables to the
> + * designated target level. It returns
> + * - 0 when a non-present entry was encountered and no allocation was
> + * requested,
> + * - a small positive value (the level, i.e. below PAGE_SIZE) upon allocation
> + * failure,
> + * - for target > 0 the address of the page table holding the leaf PTE for
^ physical
I think it's clearer, as the return type could be ambiguous.
> + * the requested address,
> + * - for target == 0 the full PTE.
Could this create confusion if for example one PTE maps physical page
0? As the caller when getting back a full PTE with address 0 and some of
the low bits set could interpret the result as an error.
I think we already had this discussion on other occasions, but I would
rather add a parameter to be used as a return placeholder (ie: a
*dma_pte maybe?) and use the function return value just for errors
because IMO it's clearer, but I know you don't usually like this
approach, so I'm not going to insist.
> + */
> +static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
> + unsigned int target,
> + unsigned int *flush_flags, bool alloc)
> {
> struct domain_iommu *hd = dom_iommu(domain);
> int addr_width = agaw_to_width(hd->arch.vtd.agaw);
> struct dma_pte *parent, *pte = NULL;
> - int level = agaw_to_level(hd->arch.vtd.agaw);
> - int offset;
> + unsigned int level = agaw_to_level(hd->arch.vtd.agaw), offset;
> u64 pte_maddr = 0;
>
> addr &= (((u64)1) << addr_width) - 1;
> ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> + ASSERT(target || !alloc);
Might be better to use an if with ASSERT_UNREACHABLE and return an
error? (ie: level itself?)
> +
> if ( !hd->arch.vtd.pgd_maddr )
> {
> struct page_info *pg;
>
> - if ( !alloc || !(pg = iommu_alloc_pgtable(domain)) )
> + if ( !alloc )
> + goto out;
> +
> + pte_maddr = level;
> + if ( !(pg = iommu_alloc_pgtable(domain)) )
> goto out;
>
> hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
> }
>
> - parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
> - while ( level > 1 )
> + pte_maddr = hd->arch.vtd.pgd_maddr;
> + parent = map_vtd_domain_page(pte_maddr);
> + while ( level > target )
> {
> offset = address_level_offset(addr, level);
> pte = &parent[offset];
>
> pte_maddr = dma_pte_addr(*pte);
> - if ( !pte_maddr )
> + if ( !dma_pte_present(*pte) || (level > 1 &&
> dma_pte_superpage(*pte)) )
> {
> struct page_info *pg;
> + /*
> + * Higher level tables always set r/w, last level page table
> + * controls read/write.
> + */
> + struct dma_pte new_pte = { DMA_PTE_PROT };
>
> if ( !alloc )
> - break;
> + {
> + pte_maddr = 0;
> + if ( !dma_pte_present(*pte) )
> + break;
> +
> + /*
> + * When the leaf entry was requested, pass back the full PTE,
> + * with the address adjusted to account for the residual of
> + * the walk.
> + */
> + pte_maddr = pte->val +
Wouldn't it be better to use dma_pte_addr(*pte) rather than accessing
pte->val, and then you could drop the PAGE_MASK?
Or is the addr parameter not guaranteed to be page aligned?
> + (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> + PAGE_MASK);
> + if ( !target )
> + break;
I'm confused by the conditional break here, why would you calculate
pte_maddr unconditionally to get overwritten just the line below if
target != 0?
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |