[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 20/20] xen/riscv: introduce metadata table to store P2M type


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 11 Aug 2025 17:44:24 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 11 Aug 2025 15:44:45 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 31.07.2025 17:58, Oleksii Kurochko wrote:
> RISC-V's PTE has only two available bits that can be used to store the P2M
> type. This is insufficient to represent all the current RISC-V P2M types.
> Therefore, some P2M types must be stored outside the PTE bits.
> 
> To address this, a metadata table is introduced to store P2M types that
> cannot fit in the PTE itself. Not all P2M types are stored in the
> metadata table—only those that require it.
> 
> The metadata table is linked to the intermediate page table via the
> `struct page_info`'s list field of the corresponding intermediate page.
> 
> To simplify the allocation and linking of intermediate and metadata page
> tables, `p2m_{alloc,free}_table()` functions are implemented.
> 
> These changes impact `p2m_split_superpage()`, since when a superpage is
> split, it is necessary to update the metadata table of the new
> intermediate page table — if the entry being split has its P2M type set
> to `p2m_ext_storage` in its `P2M_TYPES` bits.

Oh, this was an aspect I didn't realize when commenting on the name of
the enumerator. I think you want to keep the name for the purpose here,
but you better wouldn't apply relational operators to it (and hence
have a second value to serve that purpose).

> In addition to updating
> the metadata of the new intermediate page table, the corresponding entry
> in the metadata for the original superpage is invalidated.
> 
> Also, update p2m_{get,set}_type to work with P2M types which don't fit
> into PTE bits.
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

No Suggested-by: or anything?

> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -150,6 +150,15 @@ struct page_info
>              /* Order-size of the free chunk this page is the head of. */
>              unsigned int order;
>          } free;
> +
> +        /* Page is used to store metadata: p2m type. */

That's not correct. The page thus described is what the pointer below
points to. Here it's more like "Page is used as an intermediate P2M
page table".

> +        struct {
> +            /*
> +             * Pointer to a page which store metadata for an intermediate 
> page
> +             * table.
> +             */
> +            struct page_info *metadata;
> +        } md;

In the description you say you would re-use the list field.

> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -101,7 +101,16 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
>  {
>      struct domain *d = p2m->domain;
>      struct page_info *page;
> -    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
> +    /*
> +     * If the root page table starts at Level <= 2, and since only 1GB, 2MB,
> +     * and 4KB mappings are supported (as enforced by the ASSERT() in
> +     * p2m_set_entry()), it is necessary to allocate P2M_ROOT_PAGES for
> +     * the root page table itself, plus an additional P2M_ROOT_PAGES for
> +     * metadata storage. This is because only two free bits are available in
> +     * the PTE, which are not sufficient to represent all possible P2M types.
> +     */
> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES *
> +                                       ((P2M_ROOT_LEVEL <= 2) ? 2 : 1);
>  
>      /*
>       * Return back nr_root_pages to assure the root table memory is also
> @@ -114,6 +123,23 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
>      if ( !page )
>          return -ENOMEM;
>  
> +    if ( P2M_ROOT_LEVEL <= 2 )
> +    {
> +        /*
> +         * In the case where P2M_ROOT_LEVEL <= 2, it is necessary to allocate
> +         * a page of the same size as that used for the root page table.
> +         * Therefore, p2m_allocate_root() can be safely reused.
> +         */
> +        struct page_info *metadata = p2m_allocate_root(d);
> +        if ( !metadata )
> +        {
> +            free_domheap_pages(page, P2M_ROOT_ORDER);
> +            return -ENOMEM;
> +        }
> +
> +        page->v.md.metadata = metadata;

Don't you need to install such a link for every one of the 4 pages?

> @@ -198,24 +224,25 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain 
> *p2m, gfn_t gfn)
>      return __map_domain_page(p2m->root + root_table_indx);
>  }
>  
> -static int p2m_set_type(pte_t *pte, p2m_type_t t)
> +static void p2m_set_type(pte_t *pte, const p2m_type_t t, const unsigned int 
> i)
>  {
> -    int rc = 0;
> -
>      if ( t > p2m_ext_storage )
> -        panic("unimplemeted\n");
> +    {
> +        ASSERT(pte);
> +
> +        pte[i].pte = t;

What does i identify here?

> +    }
>      else
>          pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
> -
> -    return rc;
>  }
>  
> -static p2m_type_t p2m_get_type(const pte_t pte)
> +static p2m_type_t p2m_get_type(const pte_t pte, const pte_t *metadata,
> +                               const unsigned int i)
>  {
>      p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
>  
>      if ( type == p2m_ext_storage )
> -        panic("unimplemented\n");
> +        type = metadata[i].pte;
>  
>      return type;
>  }

Overall this feels pretty fragile, as the caller has to pass several values
which all need to be in sync with one another. If you ...

> @@ -265,7 +292,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
>      }
>  }
>  
> -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
> +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t,
> +                              struct page_info *metadata_pg,
> +                              const unsigned int indx,
> +                              bool is_table)
>  {
>      pte_t e = (pte_t) { PTE_VALID };
>  
> @@ -285,12 +315,21 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, 
> bool is_table)
>  
>      if ( !is_table )
>      {
> +        pte_t *metadata = __map_domain_page(metadata_pg);

... map the page anyway, no matter whether ...

>          p2m_set_permission(&e, t);
>  
> +        metadata[indx].pte = p2m_invalid;
> +
>          if ( t < p2m_ext_storage )
> -            p2m_set_type(&e, t);
> +            p2m_set_type(&e, t, indx);
>          else
> -            panic("unimplemeted\n");
> +        {
> +            e.pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
> +            p2m_set_type(metadata, t, indx);
> +        }

... you'll actually use it, maybe best to map both pages at the same point?

And as said elsewhere, no, I don't think you want to use p2m_set_type() for
two entirely different purposes.

> @@ -323,22 +364,71 @@ static struct page_info *p2m_alloc_page(struct 
> p2m_domain *p2m)
>      return pg;
>  }
>  
> +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
> +
> +/*
> + * Allocate a page table with an additional extra page to store
> + * metadata for each entry of the page table.
> + * Link this metadata page to page table page's list field.
> + */
> +static struct page_info * p2m_alloc_table(struct p2m_domain *p2m)

Nit: Stray blank after * again.

> +{
> +    enum table_type
> +    {
> +        INTERMEDIATE_TABLE=0,

If you really think you need the "= 0", then please with blanks around '='.

> +        /*
> +         * At the moment, metadata is going to store P2M type
> +         * for each PTE of page table.
> +         */
> +        METADATA_TABLE,
> +        TABLE_MAX
> +    };
> +
> +    struct page_info *tables[TABLE_MAX];
> +
> +    for ( unsigned int i = 0; i < TABLE_MAX; i++ )
> +    {
> +        tables[i] = p2m_alloc_page(p2m);
> +
> +        if ( !tables[i] )
> +            goto out;
> +
> +        clear_and_clean_page(tables[i]);
> +    }
> +
> +    tables[INTERMEDIATE_TABLE]->v.md.metadata = tables[METADATA_TABLE];
> +
> +    return tables[INTERMEDIATE_TABLE];
> +
> + out:
> +    for ( unsigned int i = 0; i < TABLE_MAX; i++ )
> +        if ( tables[i] )

You didn't clear all of tables[] first, though. This kind of cleanup is
often better done as

    while ( i-- > 0 )
        ...

You don't even need an if() then, as you know allocations succeeded for all
earlier array slots.

> +            p2m_free_page(p2m, tables[i]);
> +
> +    return NULL;
> +}

I'm also surprised you allocate the metadata table no matter whether you'll
actually need it. That'll double your average paging pool usage, when in a
typical case only very few entries would actually require this extra
storage.

> @@ -453,10 +543,9 @@ static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t 
> type)
>  }
>  
>  /* Put any references on the page referenced by pte. */
> -static void p2m_put_page(const pte_t pte, unsigned int level)
> +static void p2m_put_page(const pte_t pte, unsigned int level, p2m_type_t 
> p2mt)
>  {
>      mfn_t mfn = pte_get_mfn(pte);
> -    p2m_type_t p2m_type = p2m_get_type(pte);
>  
>      ASSERT(pte_is_valid(pte));
>  
> @@ -470,10 +559,10 @@ static void p2m_put_page(const pte_t pte, unsigned int 
> level)
>      switch ( level )
>      {
>      case 1:
> -        return p2m_put_2m_superpage(mfn, p2m_type);
> +        return p2m_put_2m_superpage(mfn, p2mt);
>  
>      case 0:
> -        return p2m_put_4k_page(mfn, p2m_type);
> +        return p2m_put_4k_page(mfn, p2mt);
>      }
>  }

Might it be better to introduce this function in this shape right away, in
the earlier patch?

> @@ -690,18 +791,23 @@ static int p2m_set_entry(struct p2m_domain *p2m,
>      {
>          /* We need to split the original page. */
>          pte_t split_pte = *entry;
> +        struct page_info *metadata = virt_to_page(table)->v.md.metadata;

This (or along these lines) is how I would have expected things to be done
elsewhere as well, limiting the amount of arguments you need to pass
around.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.