
Re: [PATCH v2 15/18] IOMMU/x86: prefill newly allocate page tables


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 14 Dec 2021 15:50:56 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Tue, 14 Dec 2021 14:51:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Sep 24, 2021 at 11:54:58AM +0200, Jan Beulich wrote:
> Page table are used for two purposes after allocation: They either start
> out all empty, or they get filled to replace a superpage. Subsequently,
> to replace all empty or fully contiguous page tables, contiguous sub-
> regions will be recorded within individual page tables. Install the
> initial set of markers immediately after allocation. Make sure to retain
> these markers when further populating a page table in preparation for it
> to replace a superpage.
> 
> The markers are simply 4-bit fields holding the order value of
> contiguous entries. To demonstrate this, if a page table had just 16
> entries, this would be the initial (fully contiguous) set of markers:
> 
> index  0 1 2 3 4 5 6 7 8 9 A B C D E F
> marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
> 
> "Contiguous" here means not only present entries with successively
> increasing MFNs, each one suitably aligned for its slot, but also a
> respective number of all non-present entries.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Obviously these markers only work for newly created page tables right
now; the moment we start poking holes or replacing entries the markers
are no longer updated. I expect further patches will expand on this.
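
For reference, a standalone sketch (not the patch's code, just an
illustration) that reproduces the 16-entry marker table from the
description, using __builtin_ctz() in place of Xen's
find_first_set_bit():

#include <stdio.h>

int main(void)
{
    unsigned int i, marker;

    for ( i = 0; i < 16; ++i )
    {
        /*
         * Index 0 carries the order of the whole (fully contiguous)
         * table; every other index carries its number of trailing
         * zero bits, i.e. the order of the contiguous run it starts.
         */
        marker = i ? __builtin_ctz(i) : 4;
        printf("index %X marker %u\n", i, marker);
    }

    return 0;
}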

> ---
> An alternative to the ASSERT()s added to set_iommu_ptes_present() would
> be to make the function less general-purpose; it's used in a single
> place only after all (i.e. it might as well be folded into its only
> caller).
> ---
> v2: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu-defs.h
> +++ b/xen/drivers/passthrough/amd/iommu-defs.h
> @@ -445,6 +445,8 @@ union amd_iommu_x2apic_control {
>  #define IOMMU_PAGE_TABLE_U32_PER_ENTRY       (IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
>  #define IOMMU_PAGE_TABLE_ALIGNMENT   4096
>  
> +#define IOMMU_PTE_CONTIG_MASK           0x1e /* The ign0 field below. */

Should you rename ign0 to contig_mask or some such now?

The same would apply to the comment next to dma_pte for VT-d, where
bits 52:62 are ignored (the comment seems to be missing this already)
and we will be using bits 52:55 to store the contiguous mask for the
entry.

> +
>  union amd_iommu_pte {
>      uint64_t raw;
>      struct {
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -116,7 +116,19 @@ static void set_iommu_ptes_present(unsig
>  
>      while ( nr_ptes-- )
>      {
> -        set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> +        ASSERT(!pde->next_level);
> +        ASSERT(!pde->u);
> +
> +        if ( pde > table )
> +            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> +        else
> +            ASSERT(pde->ign0 == PAGE_SHIFT - 3);

You could even special-case (pde - table) % 2 != 0, but this is
debug-only code, and it's possible a mod is more costly than
find_first_set_bit().
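
I.e. something along the lines of (just a sketch, not sure it's worth
it):

        if ( (pde - table) & 1 )
            /* Odd slots always start an order-0 run. */
            ASSERT(!pde->ign0);
        else if ( pde > table )
            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
        else
            ASSERT(pde->ign0 == PAGE_SHIFT - 3);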

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -433,12 +433,12 @@ int iommu_free_pgtables(struct domain *d
>      return 0;
>  }
>  
> -struct page_info *iommu_alloc_pgtable(struct domain *d)
> +struct page_info *iommu_alloc_pgtable(struct domain *d, uint64_t contig_mask)
>  {
>      struct domain_iommu *hd = dom_iommu(d);
>      unsigned int memflags = 0;
>      struct page_info *pg;
> -    void *p;
> +    uint64_t *p;
>  
>  #ifdef CONFIG_NUMA
>      if ( hd->node != NUMA_NO_NODE )
> @@ -450,7 +450,28 @@ struct page_info *iommu_alloc_pgtable(st
>          return NULL;
>  
>      p = __map_domain_page(pg);
> -    clear_page(p);
> +
> +    if ( contig_mask )
> +    {
> +        unsigned int i, shift = find_first_set_bit(contig_mask);
> +
> +    ASSERT(((PAGE_SHIFT - 3) & (contig_mask >> shift)) == PAGE_SHIFT - 3);
> +
> +        p[0] = (PAGE_SHIFT - 3ull) << shift;
> +        p[1] = 0;
> +        p[2] = 1ull << shift;
> +        p[3] = 0;
> +
> +        for ( i = 4; i < PAGE_SIZE / 8; i += 4 )
> +        {
> +            p[i + 0] = (find_first_set_bit(i) + 0ull) << shift;
> +            p[i + 1] = 0;
> +            p[i + 2] = 1ull << shift;
> +            p[i + 3] = 0;
> +        }

You could likely do:

for ( i = 0; i < PAGE_SIZE / 8; i += 4 )
{
    p[i + 0] = i ? ((find_first_set_bit(i) + 0ull) << shift)
                 : ((PAGE_SHIFT - 3ull) << shift);
    p[i + 1] = 0;
    p[i + 2] = 1ull << shift;
    p[i + 3] = 0;
}

This avoids having to open-code the first loop iteration. The ternary
operator could also be nested before the shift, but I find that harder
to read.
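
I.e. the nested variant would be something like:

    p[i + 0] = (i ? find_first_set_bit(i) + 0ull : PAGE_SHIFT - 3ull) << shift;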

> +    }
> +    else
> +        clear_page(p);
>  
>      if ( hd->platform_ops->sync_cache )
>          iommu_vcall(hd->platform_ops, sync_cache, p, PAGE_SIZE);
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -142,7 +142,8 @@ int pi_update_irte(const struct pi_desc
>  })
>  
>  int __must_check iommu_free_pgtables(struct domain *d);
> -struct page_info *__must_check iommu_alloc_pgtable(struct domain *d);
> +struct page_info *__must_check iommu_alloc_pgtable(struct domain *d,
> +                                                   uint64_t contig_mask);
>  void iommu_queue_free_pgtable(struct domain *d, struct page_info *pg);
>  
>  #endif /* !__ARCH_X86_IOMMU_H__ */
> 
