[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v4 16/21] VT-d: free all-empty page tables


  • To: "Beulich, Jan" <JBeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Wed, 27 Apr 2022 04:09:24 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5CEBpfM0nnYYOZ8kAid6Ce67P6Tfk0FK38+Fe9mQ86Y=; b=eYlFX7p1qR5kBjP5+QosPLxCKg1Kz+3lFrJY6eT6Nqs8OnqAA7yikLTD4OrbS/Y3Rcb6+k12C+3uciTDyA7pyoEPJi96VAQ52lp6ElbBEIUNyeU++nhMeRR9bBuPA40hIYPzdcV4ADDsXtk/v5ROg6JB2dhD/ywScmd3+F8PKRnXGg7iCEIRn7GUeHvRM24hYhbNWYjSLbEYQ/XTKybOSVbOQLlgZQFAXDjPi+JDViGW3I0iQ5PYmlRZtEGEhlnpNTQWSzkkDaezmCbe2LYmPRcXd3SFiZYRUQnCz1leBhQTtOnnm0b4m3X8WQBLqPt/NMZeTWFTtbBQWhLbPl9cuA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VsTVSQC8BJBsI0SOwqOFaGe17iQvNwXboEB0tK4RoWvhgx7uIRNSDeVwImFecr3D3apZlSLJ2/9uKmZBhZibAE/PHIvi2H41eAniiDtxrsLzyEWG9/WjTIZaFgdEaZ7Zs/XtcoMrhj3mW9LTv8YvU2GDU79kDRsDxyp2xWvd6OyYgHLUhRxN45NQPiNY/kSv/yY5McVmEbO6U1aIOpVBs9qjHkJo7G/JPDgBqgkIjHvbF33FBwwdEwNzzbBYc2eKgS25LWGNbyKrTCBqaf8Dt13yPcIJn0042CYClbm74rTf6ps+dh7RdKgToH4/L6TJcsqd5rWpbCR681zSVneVww==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com;
  • Cc: "Cooper, Andrew" <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Pau Monné, Roger <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 27 Apr 2022 04:09:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYWIB55LiJis4LlU+Of8jQSBwnua0DJPWw
  • Thread-topic: [PATCH v4 16/21] VT-d: free all-empty page tables

> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Monday, April 25, 2022 4:43 PM
> 
> When a page table ends up with no present entries left, it can be
> replaced by a non-present entry at the next higher level. The page table
> itself can then be scheduled for freeing.
> 
> Note that while its output isn't used there yet,
> pt_update_contig_markers() right away needs to be called in all places
> where entries get updated, not just the one where entries get cleared.
> 
> Note further that while pt_update_contig_markers() updates perhaps
> several PTEs within the table, since these are changes to "avail" bits
> only I do not think that cache flushing would be needed afterwards. Such
> cache flushing (of entire pages, unless adding yet more logic to me more
> selective) would be quite noticable performance-wise (very prominent
> during Dom0 boot).
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>

> ---
> v4: Re-base over changes earlier in the series.
> v3: Properly bound loop. Re-base over changes earlier in the series.
> v2: New.
> ---
> The hang during boot on my Latitude E6410 (see the respective code
> comment) was pretty close after iommu_enable_translation(). No errors,
> no watchdog would kick in, just sometimes the first few pixel lines of
> the next log message's (XEN) prefix would have made it out to the screen
> (and there's no serial there). It's been a lot of experimenting until I
> figured the workaround (which I consider ugly, but halfway acceptable).
> I've been trying hard to make sure the workaround wouldn't be masking a
> real issue, yet I'm still wary of it possibly doing so ... My best guess
> at this point is that on these old IOMMUs the ignored bits 52...61
> aren't really ignored for present entries, but also aren't "reserved"
> enough to trigger faults. This guess is from having tried to set other
> bits in this range (unconditionally, and with the workaround here in
> place), which yielded the same behavior.
> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -43,6 +43,9 @@
>  #include "vtd.h"
>  #include "../ats.h"
> 
> +#define CONTIG_MASK DMA_PTE_CONTIG_MASK
> +#include <asm/pt-contig-markers.h>
> +
>  /* dom_io is used as a sentinel for quarantined devices */
>  #define QUARANTINE_SKIP(d, pgd_maddr) ((d) == dom_io && !(pgd_maddr))
>  #define DEVICE_DOMID(d, pdev) ((d) != dom_io ? (d)->domain_id \
> @@ -405,6 +408,9 @@ static uint64_t addr_to_dma_page_maddr(s
> 
>              write_atomic(&pte->val, new_pte.val);
>              iommu_sync_cache(pte, sizeof(struct dma_pte));
> +            pt_update_contig_markers(&parent->val,
> +                                     address_level_offset(addr, level),
> +                                     level, PTE_kind_table);
>          }
> 
>          if ( --level == target )
> @@ -837,9 +843,31 @@ static int dma_pte_clear_one(struct doma
> 
>      old = *pte;
>      dma_clear_pte(*pte);
> +    iommu_sync_cache(pte, sizeof(*pte));
> +
> +    while ( pt_update_contig_markers(&page->val,
> +                                     address_level_offset(addr, level),
> +                                     level, PTE_kind_null) &&
> +            ++level < min_pt_levels )
> +    {
> +        struct page_info *pg = maddr_to_page(pg_maddr);
> +
> +        unmap_vtd_domain_page(page);
> +
> +        pg_maddr = addr_to_dma_page_maddr(domain, addr, level,
> flush_flags,
> +                                          false);
> +        BUG_ON(pg_maddr < PAGE_SIZE);
> +
> +        page = map_vtd_domain_page(pg_maddr);
> +        pte = &page[address_level_offset(addr, level)];
> +        dma_clear_pte(*pte);
> +        iommu_sync_cache(pte, sizeof(*pte));
> +
> +        *flush_flags |= IOMMU_FLUSHF_all;
> +        iommu_queue_free_pgtable(hd, pg);
> +    }
> 
>      spin_unlock(&hd->arch.mapping_lock);
> -    iommu_sync_cache(pte, sizeof(struct dma_pte));
> 
>      unmap_vtd_domain_page(page);
> 
> @@ -2182,8 +2210,21 @@ static int __must_check cf_check intel_i
>      }
> 
>      *pte = new;
> -
>      iommu_sync_cache(pte, sizeof(struct dma_pte));
> +
> +    /*
> +     * While the (ab)use of PTE_kind_table here allows to save some work in
> +     * the function, the main motivation for it is that it avoids a so far
> +     * unexplained hang during boot (while preparing Dom0) on a Westmere
> +     * based laptop.
> +     */
> +    pt_update_contig_markers(&page->val,
> +                             address_level_offset(dfn_to_daddr(dfn), level),
> +                             level,
> +                             (hd->platform_ops->page_sizes &
> +                              (1UL << level_to_offset_bits(level + 1))
> +                              ? PTE_kind_leaf : PTE_kind_table));
> +
>      spin_unlock(&hd->arch.mapping_lock);
>      unmap_vtd_domain_page(page);
> 


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.