
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 6 May 2022 13:16:09 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Fri, 06 May 2022 11:16:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> Page tables are used for two purposes after allocation: They either
> start out all empty, or they get filled to replace a superpage.
> Subsequently, to replace all empty or fully contiguous page tables,
> contiguous sub-regions will be recorded within individual page tables.
> Install the initial set of markers immediately after allocation. Make
> sure to retain these markers when further populating a page table in
> preparation for it to replace a superpage.
> 
> The markers are simply 4-bit fields holding the order value of
> contiguous entries. To demonstrate this, if a page table had just 16
> entries, this would be the initial (fully contiguous) set of markers:
> 
> index  0 1 2 3 4 5 6 7 8 9 A B C D E F
> marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
> 
> "Contiguous" here means not only present entries with successively
> increasing MFNs, each one suitably aligned for its slot, but also a
> respective number of all non-present entries.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> ---
> An alternative to the ASSERT()s added to set_iommu_ptes_present() would
> be to make the function less general-purpose; it's used in a single
> place only after all (i.e. it might as well be folded into its only
> caller).

I think it would be good to add a comment noting that the function
requires the PDEs to be empty.  Also, given the current usage, we
could drop the nr_ptes parameter and just name the function
fill_pde() or similar.
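
Something along these lines maybe (just a sketch; the name and exact
parameter list are hypothetical - they'd mirror the current
set_iommu_ptes_present() minus nr_ptes):

    /*
     * Populate all entries of the page table at pt_mfn, which is about
     * to replace a superpage.  Entries must be empty apart from their
     * contiguity markers.
     */
    static void fill_pde(unsigned long pt_mfn, unsigned long dfn,
                         unsigned long next_mfn, unsigned int pde_level,
                         bool iw, bool ir);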

> 
> While in VT-d's comment ahead of struct dma_pte I'm adjusting the
> description of the high bits, I'd like to note that the description of
> some of the lower bits isn't correct either. Yet I don't think adjusting
> that belongs here.
> ---
> v4: Add another comment referring to pt-contig-markers.h. Re-base.
> v3: Add comments. Re-base.
> v2: New.
> 
> --- a/xen/arch/x86/include/asm/iommu.h
> +++ b/xen/arch/x86/include/asm/iommu.h
> @@ -146,7 +146,8 @@ void iommu_free_domid(domid_t domid, uns
>  
>  int __must_check iommu_free_pgtables(struct domain *d);
>  struct domain_iommu;
> -struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd);
> +struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd,
> +                                                   uint64_t contig_mask);
>  void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
>  
>  #endif /* !__ARCH_X86_IOMMU_H__ */
> --- a/xen/drivers/passthrough/amd/iommu-defs.h
> +++ b/xen/drivers/passthrough/amd/iommu-defs.h
> @@ -446,11 +446,13 @@ union amd_iommu_x2apic_control {
>  #define IOMMU_PAGE_TABLE_U32_PER_ENTRY       (IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
>  #define IOMMU_PAGE_TABLE_ALIGNMENT   4096
>  
> +#define IOMMU_PTE_CONTIG_MASK           0x1e /* The ign0 field below. */
> +
>  union amd_iommu_pte {
>      uint64_t raw;
>      struct {
>          bool pr:1;
> -        unsigned int ign0:4;
> +        unsigned int ign0:4; /* Covered by IOMMU_PTE_CONTIG_MASK. */
>          bool a:1;
>          bool d:1;
>          unsigned int ign1:2;
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
>  
>      while ( nr_ptes-- )
>      {
> -        set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> +        ASSERT(!pde->next_level);
> +        ASSERT(!pde->u);
> +
> +        if ( pde > table )
> +            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> +        else
> +            ASSERT(pde->ign0 == PAGE_SHIFT - 3);

I think PAGETABLE_ORDER would be clearer here.
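
I.e. simply:

    ASSERT(pde->ign0 == PAGETABLE_ORDER);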

While here, could you also assert that next_mfn matches the contiguous
order currently set in the PTE?
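
Something along the lines of the below, asserting that next_mfn is
suitably aligned for the order recorded in the marker (sketch only):

    ASSERT(!(next_mfn & ((1UL << pde->ign0) - 1)));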

> +
> +        pde->iw = iw;
> +        pde->ir = ir;
> +        pde->fc = true; /* See set_iommu_pde_present(). */
> +        pde->mfn = next_mfn;
> +        pde->pr = true;
>  
>          ++pde;
>          next_mfn += page_sz;
> @@ -295,7 +307,7 @@ static int iommu_pde_from_dfn(struct dom
>              mfn = next_table_mfn;
>  
>              /* allocate lower level page table */
> -            table = iommu_alloc_pgtable(hd);
> +            table = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);
>              if ( table == NULL )
>              {
>                  AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
> @@ -325,7 +337,7 @@ static int iommu_pde_from_dfn(struct dom
>  
>              if ( next_table_mfn == 0 )
>              {
> -                table = iommu_alloc_pgtable(hd);
> +                table = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);
>                  if ( table == NULL )
>                  {
>                      AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
> @@ -717,7 +729,7 @@ static int fill_qpt(union amd_iommu_pte
>                   * page table pages, and the resulting allocations are always
>                   * zeroed.
>                   */
> -                pgs[level] = iommu_alloc_pgtable(hd);
> +                pgs[level] = iommu_alloc_pgtable(hd, 0);

Is it worth not setting up the contiguous data for quarantine page
tables?

I think it's fine now given the current code, but your having added
ASSERTs in set_iommu_ptes_present() that the contig data is correct
makes me wonder whether we could trigger them in the future.

I understand that the contig data is not helpful for quarantine page
tables, but it still doesn't seem bad to have it, just for coherency.
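
I.e. just passing the mask here as well (AMD instance shown; the VT-d
one would be analogous):

    pgs[level] = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);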

>                  if ( !pgs[level] )
>                  {
>                      rc = -ENOMEM;
> @@ -775,7 +787,7 @@ int cf_check amd_iommu_quarantine_init(s
>          return 0;
>      }
>  
> -    pdev->arch.amd.root_table = iommu_alloc_pgtable(hd);
> +    pdev->arch.amd.root_table = iommu_alloc_pgtable(hd, 0);
>      if ( !pdev->arch.amd.root_table )
>          return -ENOMEM;
>  
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -342,7 +342,7 @@ int amd_iommu_alloc_root(struct domain *
>  
>      if ( unlikely(!hd->arch.amd.root_table) && d != dom_io )
>      {
> -        hd->arch.amd.root_table = iommu_alloc_pgtable(hd);
> +        hd->arch.amd.root_table = iommu_alloc_pgtable(hd, 0);
>          if ( !hd->arch.amd.root_table )
>              return -ENOMEM;
>      }
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -334,7 +334,7 @@ static uint64_t addr_to_dma_page_maddr(s
>              goto out;
>  
>          pte_maddr = level;
> -        if ( !(pg = iommu_alloc_pgtable(hd)) )
> +        if ( !(pg = iommu_alloc_pgtable(hd, 0)) )
>              goto out;
>  
>          hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
> @@ -376,7 +376,7 @@ static uint64_t addr_to_dma_page_maddr(s
>              }
>  
>              pte_maddr = level - 1;
> -            pg = iommu_alloc_pgtable(hd);
> +            pg = iommu_alloc_pgtable(hd, DMA_PTE_CONTIG_MASK);
>              if ( !pg )
>                  break;
>  
> @@ -388,12 +388,13 @@ static uint64_t addr_to_dma_page_maddr(s
>                  struct dma_pte *split = map_vtd_domain_page(pte_maddr);
>                  unsigned long inc = 1UL << level_to_offset_bits(level - 1);
>  
> -                split[0].val = pte->val;
> +                split[0].val |= pte->val & ~DMA_PTE_CONTIG_MASK;
>                  if ( inc == PAGE_SIZE )
>                      split[0].val &= ~DMA_PTE_SP;
>  
>                  for ( offset = 1; offset < PTE_NUM; ++offset )
> -                    split[offset].val = split[offset - 1].val + inc;
> +                    split[offset].val |=
> +                        (split[offset - 1].val & ~DMA_PTE_CONTIG_MASK) + inc;
>  
>                  iommu_sync_cache(split, PAGE_SIZE);
>                  unmap_vtd_domain_page(split);
> @@ -2173,7 +2174,7 @@ static int __must_check cf_check intel_i
>      if ( iommu_snoop )
>          dma_set_pte_snp(new);
>  
> -    if ( old.val == new.val )
> +    if ( !((old.val ^ new.val) & ~DMA_PTE_CONTIG_MASK) )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
>          unmap_vtd_domain_page(page);
> @@ -3052,7 +3053,7 @@ static int fill_qpt(struct dma_pte *this
>                   * page table pages, and the resulting allocations are always
>                   * zeroed.
>                   */
> -                pgs[level] = iommu_alloc_pgtable(hd);
> +                pgs[level] = iommu_alloc_pgtable(hd, 0);
>                  if ( !pgs[level] )
>                  {
>                      rc = -ENOMEM;
> @@ -3109,7 +3110,7 @@ static int cf_check intel_iommu_quaranti
>      if ( !drhd )
>          return -ENODEV;
>  
> -    pg = iommu_alloc_pgtable(hd);
> +    pg = iommu_alloc_pgtable(hd, 0);
>      if ( !pg )
>          return -ENOMEM;
>  
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -253,7 +253,10 @@ struct context_entry {
>   * 2-6: reserved
>   * 7: super page
>   * 8-11: available
> - * 12-63: Host physcial address
> + * 12-51: Host physcial address
> + * 52-61: available (52-55 used for DMA_PTE_CONTIG_MASK)
> + * 62: reserved
> + * 63: available
>   */
>  struct dma_pte {
>      u64 val;
> @@ -263,6 +266,7 @@ struct dma_pte {
>  #define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
>  #define DMA_PTE_SP   (1 << 7)
>  #define DMA_PTE_SNP  (1 << 11)
> +#define DMA_PTE_CONTIG_MASK  (0xfull << PADDR_BITS)
>  #define dma_clear_pte(p)    do {(p).val = 0;} while(0)
>  #define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
>  #define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
> @@ -276,7 +280,7 @@ struct dma_pte {
>  #define dma_pte_write(p) (dma_pte_prot(p) & DMA_PTE_WRITE)
>  #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
>  #define dma_set_pte_addr(p, addr) do {\
> -            (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
> +            (p).val |= ((addr) & PADDR_MASK & PAGE_MASK_4K); } while (0)

While I'm not opposed to this, I would assume that addr is not
expected to contain bits cleared by PADDR_MASK (or by PAGE_MASK_4K,
FWIW)?

Or else callers are really messed up.
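
If so, maybe an ASSERT() would capture that expectation better; just
a sketch of what I mean:

    #define dma_set_pte_addr(p, addr) do {\
                ASSERT(!((addr) & ~(PADDR_MASK & PAGE_MASK_4K))); \
                (p).val |= ((addr) & PAGE_MASK_4K); } while (0)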

>  #define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
>  #define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
>  
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -522,11 +522,12 @@ int iommu_free_pgtables(struct domain *d
>      return 0;
>  }
>  
> -struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd)
> +struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
> +                                      uint64_t contig_mask)
>  {
>      unsigned int memflags = 0;
>      struct page_info *pg;
> -    void *p;
> +    uint64_t *p;
>  
>  #ifdef CONFIG_NUMA
>      if ( hd->node != NUMA_NO_NODE )
> @@ -538,7 +539,29 @@ struct page_info *iommu_alloc_pgtable(st
>          return NULL;
>  
>      p = __map_domain_page(pg);
> -    clear_page(p);
> +
> +    if ( contig_mask )
> +    {
> +        /* See pt-contig-markers.h for a description of the marker scheme. */
> +        unsigned int i, shift = find_first_set_bit(contig_mask);
> +
> +    ASSERT(((PAGE_SHIFT - 3) & (contig_mask >> shift)) == PAGE_SHIFT - 3);

I think it might be clearer to use PAGETABLE_ORDER rather than
PAGE_SHIFT - 3.
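
I.e.:

    ASSERT((PAGETABLE_ORDER & (contig_mask >> shift)) == PAGETABLE_ORDER);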

Thanks, Roger.
