
Re: [PATCH v3 12/20] xen/riscv: implement p2m_set_range()


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 5 Aug 2025 18:04:59 +0200
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 05 Aug 2025 16:05:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 31.07.2025 17:58, Oleksii Kurochko wrote:
> This patch introduces p2m_set_range() and its core helper p2m_set_entry() for

Nit: This patch doesn't introduce p2m_set_range(); it merely fleshes it out.

> --- a/xen/arch/riscv/include/asm/p2m.h
> +++ b/xen/arch/riscv/include/asm/p2m.h
> @@ -7,11 +7,13 @@
>  #include <xen/rwlock.h>
>  #include <xen/types.h>
>  
> +#include <asm/page.h>
>  #include <asm/page-bits.h>
>  
>  extern unsigned int p2m_root_order;
>  #define P2M_ROOT_ORDER  p2m_root_order
>  #define P2M_ROOT_PAGES  BIT(P2M_ROOT_ORDER, U)
> +#define P2M_ROOT_LEVEL  HYP_PT_ROOT_LEVEL

I think I commented on this before, and I would have hoped for at least a
remark in the description to appear (perhaps even a comment here): It's
okay(ish) to tie these together for now, but in the longer run I don't
expect this is going to be wanted. If e.g. we ran Xen in Sv57 mode, there
would be no reason at all to force all P2Ms to use 5 levels of page tables.
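
For instance (wording entirely up to you), the comment could read:

    /*
     * For now the P2M simply re-uses Xen's own page table geometry.
     * That's a simplification, not a requirement: e.g. with Xen run
     * in Sv57 mode there would be no reason to force all P2Ms to use
     * 5 levels of page tables.
     */
    #define P2M_ROOT_LEVEL  HYP_PT_ROOT_LEVEL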

> @@ -175,13 +179,257 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>      return 0;
>  }
>  
> +/*
> + * Find and map the root page table. The caller is responsible for
> + * unmapping the table.
> + *
> + * The function will return NULL if the offset into the root table is
> + * invalid.
> + */
> +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
> +{
> +    unsigned long root_table_indx;
> +
> +    root_table_indx = gfn_x(gfn) >> XEN_PT_LEVEL_ORDER(P2M_ROOT_LEVEL);

Right now page table layouts / arrangements are indeed similar enough to
share accessor constructs. Nevertheless I find it problematic (doc-wise
at the very least) that a Xen page table construct is used to access a
P2M page table. If and when these needed to be decoupled, it would likely
help if the distinction was already made, by - for now - simply
introducing aliases (here e.g. P2M_LEVEL_ORDER(), expanding to
XEN_PT_LEVEL_ORDER() for the time being).
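
Something as simple as

    /*
     * P2M accessors; for the time being the P2M shares the layout of
     * Xen's own page tables, hence the aliasing.
     */
    #define P2M_LEVEL_ORDER(lvl)  XEN_PT_LEVEL_ORDER(lvl)

would do, with the code here then using P2M_LEVEL_ORDER(P2M_ROOT_LEVEL).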

> +    if ( root_table_indx >= P2M_ROOT_PAGES )
> +        return NULL;
> +
> +    return __map_domain_page(p2m->root + root_table_indx);
> +}
> +
> +static inline void p2m_write_pte(pte_t *p, pte_t pte, bool clean_pte)
> +{
> +    write_pte(p, pte);
> +    if ( clean_pte )
> +        clean_dcache_va_range(p, sizeof(*p));

Not necessarily for right away, but if multiple adjacent PTEs are
written without releasing the lock, this then redundant cache flushing
can be a performance issue.
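
Just as a sketch of what I mean, writes of adjacent PTEs could then be
folded along these lines:

    /* Write a batch of adjacent PTEs, cleaning the range just once. */
    static inline void p2m_write_ptes(pte_t *p, const pte_t *ptes,
                                      unsigned int nr, bool clean_pte)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            write_pte(&p[i], ptes[i]);

        if ( clean_pte )
            clean_dcache_va_range(p, nr * sizeof(*p));
    }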

> +}
> +
> +static inline void p2m_clean_pte(pte_t *p, bool clean_pte)
> +{
> +    pte_t pte;
> +
> +    memset(&pte, 0, sizeof(pte));

Why memset()? Why not simply give the variable an appropriate initializer?
Or use ...

> +    p2m_write_pte(p, pte, clean_pte);

... a compound literal here, like you do ...

> +}
> +
> +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t)
> +{
> +    panic("%s: hasn't been implemented yet\n", __func__);
> +
> +    return (pte_t) { .pte = 0 };

... here? (Just {} would also do, if I'm not mistaken.)
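
I.e. p2m_clean_pte() could then shrink to

    static inline void p2m_clean_pte(pte_t *p, bool clean_pte)
    {
        p2m_write_pte(p, (pte_t){}, clean_pte);
    }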

> +}
> +
> +#define P2M_TABLE_MAP_NONE 0
> +#define P2M_TABLE_MAP_NOMEM 1
> +#define P2M_TABLE_SUPER_PAGE 2
> +#define P2M_TABLE_NORMAL 3
> +
> +/*
> + * Take the currently mapped table, find the corresponding the entry
> + * corresponding to the GFN, and map the next table, if available.

Nit: Double "corresponding".

> + * The previous table will be unmapped if the next level was mapped
> + * (e.g P2M_TABLE_NORMAL returned).
> + *
> + * `alloc_tbl` parameter indicates whether intermediate tables should
> + * be allocated when not present.
> + *
> + * Return values:
> + *  P2M_TABLE_MAP_NONE: a table allocation isn't permitted.
> + *  P2M_TABLE_MAP_NOMEM: allocating a new page failed.
> + *  P2M_TABLE_SUPER_PAGE: next level or leaf mapped normally.
> + *  P2M_TABLE_NORMAL: The next entry points to a superpage.
> + */
> +static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
> +                          unsigned int level, pte_t **table,
> +                          unsigned int offset)
> +{
> +    panic("%s: hasn't been implemented yet\n", __func__);
> +
> +    return P2M_TABLE_MAP_NONE;
> +}
> +
> +/* Free pte sub-tree behind an entry */
> +static void p2m_free_subtree(struct p2m_domain *p2m,
> +                             pte_t entry, unsigned int level)
> +{
> +    panic("%s: hasn't been implemented yet\n", __func__);
> +}
> +
> +/*
> + * Insert an entry in the p2m. This should be called with a mapping
> + * equal to a page/superpage.
> + */
> +static int p2m_set_entry(struct p2m_domain *p2m,
> +                           gfn_t gfn,
> +                           unsigned long page_order,
> +                           mfn_t mfn,
> +                           p2m_type_t t)
> +{
> +    unsigned int level;
> +    unsigned int target = page_order / PAGETABLE_ORDER;
> +    pte_t *entry, *table, orig_pte;
> +    int rc;
> +    /* A mapping is removed if the MFN is invalid. */
> +    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);

Comment and code don't fit together. Many MFNs are invalid (any for which
mfn_valid() returns false), yet you only check for INVALID_MFN here.
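
I.e. either the comment wants tightening, e.g.

    /* A mapping is removed iff the MFN is INVALID_MFN. */
    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);

or the check wants widening to match the comment's current wording.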

> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
> +
> +    ASSERT(p2m_is_write_locked(p2m));
> +
> +    /*
> +     * Check if the level target is valid: we only support
> +     * 4K - 2M - 1G mapping.
> +     */
> +    ASSERT((target <= 2) && !(page_order % PAGETABLE_ORDER));

If you think you need to check this, don't you also want to check that
GFN and MFN (the latter if it isn't INVALID_MFN) fit the requested order?
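
E.g. (merely a sketch)

    ASSERT(!(gfn_x(gfn) & (BIT(page_order, UL) - 1)));
    ASSERT(mfn_eq(mfn, INVALID_MFN) ||
           !(mfn_x(mfn) & (BIT(page_order, UL) - 1)));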

> +    table = p2m_get_root_pointer(p2m, gfn);
> +    if ( !table )
> +        return -EINVAL;
> +
> +    for ( level = P2M_ROOT_LEVEL; level > target; level-- )
> +    {
> +        /*
> +         * Don't try to allocate intermediate page table if the mapping
> +         * is about to be removed.
> +         */
> +        rc = p2m_next_level(p2m, !removing_mapping,
> +                            level, &table, offsets[level]);
> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
> +        {
> +            rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
> +            /*
> +             * We are here because p2m_next_level has failed to map
> +             * the intermediate page table (e.g the table does not exist
> +             * and they p2m tree is read-only). It is a valid case
> +             * when removing a mapping as it may not exist in the
> +             * page table. In this case, just ignore it.
> +             */
> +            rc = removing_mapping ?  0 : rc;

Nit: Stray blank.

> +            goto out;
> +        }
> +
> +        if ( rc != P2M_TABLE_NORMAL )
> +            break;
> +    }
> +
> +    entry = table + offsets[level];
> +
> +    /*
> +     * If we are here with level > target, we must be at a leaf node,
> +     * and we need to break up the superpage.
> +     */
> +    if ( level > target )
> +    {
> +        panic("Shattering isn't implemented\n");
> +    }
> +
> +    /*
> +     * We should always be there with the correct level because all the
> +     * intermediate tables have been installed if necessary.
> +     */
> +    ASSERT(level == target);
> +
> +    orig_pte = *entry;
> +
> +    if ( removing_mapping )
> +        p2m_clean_pte(entry, p2m->clean_pte);
> +    else
> +    {
> +        pte_t pte = p2m_pte_from_mfn(mfn, t);
> +
> +        p2m_write_pte(entry, pte, p2m->clean_pte);
> +
> +        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
> +                                      gfn_add(gfn, BIT(page_order, UL) - 1));
> +        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, gfn);
> +    }
> +
> +    p2m->need_flush = true;
> +
> +    /*
> + * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
> +     * is not ready for RISC-V support.
> +     *
> +     * When CONFIG_HAS_PASSTHROUGH=y, iommu_iotlb_flush() should be done
> +     * here.
> +     */
> +#ifdef CONFIG_HAS_PASSTHROUGH
> +#   error "add code to flush IOMMU TLB"
> +#endif
> +
> +    rc = 0;
> +
> +    /*
> +     * Free the entry only if the original pte was valid and the base
> +     * is different (to avoid freeing when permission is changed).
> +     */
> +    if ( pte_is_valid(orig_pte) &&
> +         !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) )

I'm puzzled by this 2nd check: A permission change would - I expect - only
occur to a leaf entry. If the new entry is a super-page one and the old
wasn't, don't you still need to free the sub-tree, no matter whether the
MFNs are the same? Plus consider the special case of MFN 0: If you clear
an entry using MFN 0, you will find old and new PTEs' both having the same
MFN.
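
The MFN 0 aspect could be dealt with along these lines (again only a
sketch; the table-vs-leaf aspect would want an extra check on top):

    if ( pte_is_valid(orig_pte) &&
         (!pte_is_valid(*entry) ||
          !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte))) )
        p2m_free_subtree(p2m, orig_pte, level);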

>  static int p2m_set_range(struct p2m_domain *p2m,
>                           gfn_t sgfn,
>                           unsigned long nr,
>                           mfn_t smfn,
>                           p2m_type_t t)
>  {
> -    return -EOPNOTSUPP;
> +    int rc = 0;
> +    unsigned long left = nr;
> +
> +    /*
> +     * Any reference taken by the P2M mappings (e.g. foreign mapping) will
> +     * be dropped in relinquish_p2m_mapping(). As the P2M will still
> +     * be accessible after, we need to prevent mapping to be added when the
> +     * domain is dying.
> +     */
> +    if ( unlikely(p2m->domain->is_dying) )
> +        return -EACCES;
> +
> +    while ( left )
> +    {
> +        unsigned long order = p2m_mapping_order(sgfn, smfn, left);
> +
> +        rc = p2m_set_entry(p2m, sgfn, order, smfn, t);
> +        if ( rc )
> +            break;
> +
> +        sgfn = gfn_add(sgfn, BIT(order, UL));
> +        if ( !mfn_eq(smfn, INVALID_MFN) )
> +           smfn = mfn_add(smfn, BIT(order, UL));
> +
> +        left -= BIT(order, UL);
> +    }
> +
> +    return !left ? 0 : left == nr ? rc : (nr - left);

With the function returning "int", you may be truncating the return value
here. In the worst case that could indicate success (0) or an error
(negative) when some of the upper bits of nr - left were set.

Also looks like you could get away with a single conditional operator here:

    return !left || left == nr ? rc : (nr - left);
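
And if the potential truncation was to be dealt with right here rather
than by capping nr in the callers, the partial count might want clamping
(once more only a sketch):

    return !left || left == nr ? rc
                               : (nr - left <= INT_MAX ? (int)(nr - left)
                                                       : INT_MAX);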

Jan
