Re: [PATCH v4 10/18] xen/riscv: implement p2m_set_range()
- To: Jan Beulich <jbeulich@xxxxxxxx>
- From: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
- Date: Thu, 25 Sep 2025 22:08:52 +0200
- Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
- Delivery-date: Thu, 25 Sep 2025 20:09:04 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On 9/20/25 1:36 AM, Jan Beulich wrote:
On 17.09.2025 23:55, Oleksii Kurochko wrote:
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -16,6 +16,7 @@
#include <asm/riscv_encoding.h>
unsigned long __ro_after_init gstage_mode;
+unsigned int __ro_after_init gstage_root_level;
void __init gstage_mode_detect(void)
{
@@ -53,6 +54,7 @@ void __init gstage_mode_detect(void)
if ( MASK_EXTR(csr_read(CSR_HGATP), HGATP_MODE_MASK) == mode )
{
gstage_mode = mode;
+ gstage_root_level = modes[mode_idx].paging_levels - 1;
break;
}
}
@@ -210,6 +212,9 @@ int p2m_init(struct domain *d)
rwlock_init(&p2m->lock);
INIT_PAGE_LIST_HEAD(&p2m->pages);
+ p2m->max_mapped_gfn = _gfn(0);
+ p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
+
/*
* Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
* is not ready for RISC-V support.
@@ -251,13 +256,287 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
return rc;
}
+/*
+ * Find and map the root page table. The caller is responsible for
+ * unmapping the table.
With the root table being 4 pages, "the root table" is slightly misleading
here: You never map the entire table.
I will then update the comment to:
/*
* Map one of the four root pages of the P2M root page table.
*
* The P2M root page table is larger than normal (16KB instead of 4KB),
* so it is allocated as four consecutive 4KB pages. This function selects
* the appropriate 4KB page based on the given GFN and returns a mapping
* to it.
*
* The caller is responsible for unmapping the page after use.
*
* Returns NULL if the calculated offset into the root table is invalid.
*/
+ * The function will return NULL if the offset into the root table is
+ * invalid.
+ */
+static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
+{
+ unsigned long root_table_indx;
+
+ root_table_indx = gfn_x(gfn) >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);
+ if ( root_table_indx >= P2M_ROOT_PAGES )
+ return NULL;
+
+ /*
+ * The P2M root page table is extended by 2 bits, making its size 16KB
+ * (instead of 4KB for non-root page tables). Therefore, p2m->root is
+ * allocated as four consecutive 4KB pages (since alloc_domheap_pages()
+ * only allocates 4KB pages).
+ *
+ * To determine which of these four 4KB pages the root_table_indx falls
+ * into, we divide root_table_indx by
+ * P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1).
+ */
+ root_table_indx /= P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1);
The subtraction of 1 here feels odd: You're after the root table's
number of entries, i.e. I'd expect you to pass just P2M_ROOT_LEVEL.
And the way P2M_PAGETABLE_ENTRIES() works also suggests so.
The purpose of this line is to select the page within the root table, which
consists of 4 consecutive pages. However, P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL)
returns 2048, so root_table_indx would always be 0 after the division, which is
not what we want.
As an alternative, P2M_PAGETABLE_ENTRIES(0) could be used, since it always
returns 512. Dividing root_table_indx by 512 then yields the index of the page
within the root table, which is made up of 4 consecutive pages.
Does it make sense now?
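To make the arithmetic concrete, here is a minimal standalone sketch
(illustrative values only; the constants assume Sv39x4, where the root table
has 2048 entries split over four 4KB pages of 512 entries each):

#include <assert.h>

/* Illustrative constants, assuming Sv39x4; not the Xen macros themselves. */
#define ENTRIES_PER_PAGE  512U   /* P2M_PAGETABLE_ENTRIES(0) */
#define ROOT_ENTRIES     2048U   /* 512 << 2: root table extended by 2 bits */
#define ROOT_PAGES       (ROOT_ENTRIES / ENTRIES_PER_PAGE)   /* 4 pages */

int main(void)
{
    /*
     * root_table_indx is in [0, 2047]; dividing it by 512 selects which
     * of the four consecutive 4KB root pages the index falls into.
     */
    assert(  42U / ENTRIES_PER_PAGE == 0);   /* 1st root page */
    assert( 600U / ENTRIES_PER_PAGE == 1);   /* 2nd root page */
    assert(2047U / ENTRIES_PER_PAGE == 3);   /* 4th root page */

    return 0;
}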
The problem may occur with DECLARE_OFFSETS(), which can produce an incorrect
index within the root page table. Since the root index is in the range [0, 2047],
it becomes an issue if the value is greater than 511, because DECLARE_OFFSETS()
does not account for the larger range of the root index.
I am not sure whether it is better to make DECLARE_OFFSETS() generic enough
for both P2M and Xen page tables, or to provide a separate P2M_DECLARE_OFFSETS()
and use it only in P2M-related code.
Another option would be to move DECLARE_OFFSETS() from the asm/page.h header
to riscv/pt.c and define a separate one in riscv/p2m.c.
Do you have a preference?
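For illustration, a separate P2M-only macro (hypothetical name and shape, not
part of the patch) could look along these lines:

/*
 * Hypothetical sketch, not from the patch: a P2M-specific offsets macro
 * whose root-level index keeps the 2 extra bits of the extended root
 * table, i.e. it ranges over [0, 2047] instead of [0, 511]. The shifts
 * assume Sv39x4: a 12-bit page offset and 9 index bits per non-root level.
 */
#define P2M_DECLARE_OFFSETS(var, gaddr)                          \
    const unsigned int var[] = {                                 \
        ((gaddr) >> 12) & 0x1ff,   /* level 0: 9 bits */         \
        ((gaddr) >> 21) & 0x1ff,   /* level 1: 9 bits */         \
        ((gaddr) >> 30) & 0x7ff,   /* level 2 (root): 11 bits */ \
    }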
+/*
+ * Insert an entry in the p2m. This should be called with a mapping
+ * equal to a page/superpage.
+ */
I don't follow this comment: There isn't any mapping being passed in, is there?
I think this comment should be dropped; it was about the requested mapping
having to be exactly a page or superpage (4K, 2M, 1G), and the correct order
is always guaranteed by p2m_mapping_order().
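For reference, the order-selection logic being referred to typically looks as
follows (a sketch modeled on the Arm p2m code; assumption: the RISC-V helper
follows the same pattern, with 0 = 4K, 9 = 2M, 18 = 1G):

static unsigned int p2m_mapping_order(gfn_t gfn, mfn_t mfn, unsigned long nr)
{
    /*
     * Any set low bit in the GFN, the MFN or the number of remaining
     * pages rules out the corresponding superpage order.
     */
    unsigned long mask = gfn_x(gfn) | nr;

    /* An invalid MFN (i.e. a removal) doesn't constrain alignment. */
    if ( !mfn_eq(mfn, INVALID_MFN) )
        mask |= mfn_x(mfn);

    if ( !(mask & ((1UL << (2 * PAGETABLE_ORDER)) - 1)) )
        return 2 * PAGETABLE_ORDER;   /* 1G */
    if ( !(mask & ((1UL << PAGETABLE_ORDER) - 1)) )
        return PAGETABLE_ORDER;       /* 2M */

    return 0;                         /* 4K */
}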
+static int p2m_set_entry(struct p2m_domain *p2m,
+ gfn_t gfn,
+ unsigned long page_order,
+ mfn_t mfn,
+ p2m_type_t t)
Nit: Indentation.
+{
+ unsigned int level;
+ unsigned int target = page_order / PAGETABLE_ORDER;
+ pte_t *entry, *table, orig_pte;
+ int rc;
+ /*
+ * A mapping is removed only if the MFN is explicitly set to INVALID_MFN.
+ * Other MFNs that are considered invalid by mfn_valid() (e.g., MMIO)
+ * are still allowed.
+ */
+ bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
+ DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
+
+ ASSERT(p2m_is_write_locked(p2m));
+
+ /*
+ * Check if the level target is valid: we only support
+ * 4K - 2M - 1G mapping.
+ */
+ ASSERT(target <= 2);
+
+ table = p2m_get_root_pointer(p2m, gfn);
+ if ( !table )
+ return -EINVAL;
+
+ for ( level = P2M_ROOT_LEVEL; level > target; level-- )
+ {
+ /*
+ * Don't try to allocate intermediate page table if the mapping
+ * is about to be removed.
+ */
+ rc = p2m_next_level(p2m, !removing_mapping,
+ level, &table, offsets[level]);
+ if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
+ {
+ rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
+ /*
+ * We are here because p2m_next_level has failed to map
+ * the intermediate page table (e.g. the table does not exist
+ * and the p2m tree is read-only).
I thought I commented on this or something similar already: Calling the
p2m tree "read-only" is imo misleading.
I will then change "read-only" to "not allocatable".
+ * It is a valid case
+ * when removing a mapping as it may not exist in the
+ * page table. In this case, just ignore it.
I fear the "it" has no reference; aiui you mean "ignore the lookup failure",
but the comment isn't worded to refer to that by "it".
I will update the comment correspondingly.
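For instance (a suggested wording, to make the referent explicit):

 * It is a valid case when removing a mapping, as the mapping may not
 * exist in the page table. In this case, just ignore the lookup failure.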
+ */
+ rc = removing_mapping ? 0 : rc;
+ goto out;
+ }
+
+ if ( rc != P2M_TABLE_NORMAL )
+ break;
+ }
+
+ entry = table + offsets[level];
+
+ /*
+ * If we are here with level > target, we must be at a leaf node,
+ * and we need to break up the superpage.
+ */
+ if ( level > target )
+ {
+ panic("Shattering isn't implemented\n");
+ }
+
+ /*
+ * We should always be here with the correct level because all the
+ * intermediate tables have been installed if necessary.
+ */
+ ASSERT(level == target);
+
+ orig_pte = *entry;
+
+ if ( removing_mapping )
+ p2m_clean_pte(entry, p2m->clean_dcache);
+ else
+ {
+ pte_t pte = p2m_pte_from_mfn(mfn, t);
+
+ p2m_write_pte(entry, pte, p2m->clean_dcache);
+
+ p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
+ gfn_add(gfn, BIT(page_order, UL) - 1));
+ p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, gfn);
+ }
+
+ p2m->need_flush = true;
+
+ /*
+ * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
+ * is not ready for RISC-V support.
+ *
+ * When CONFIG_HAS_PASSTHROUGH=y, iommu_iotlb_flush() should be done
+ * here.
+ */
+#ifdef CONFIG_HAS_PASSTHROUGH
+# error "add code to flush IOMMU TLB"
+#endif
+
+ rc = 0;
+
+ /*
+ * Free the entry only if the original pte was valid and the base
+ * is different (to avoid freeing when permission is changed).
+ *
+ * If MFN 0 was previously mapped and is now being removed, and
+ * considering that MFN 0 is used during removal, then `entry` and
+ * `new_entry` will be the same and p2m_free_subtree() won't be
+ * called. This case is handled explicitly.
+ */
+ if ( pte_is_valid(orig_pte) &&
+ (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
+ (removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
+ p2m_free_subtree(p2m, orig_pte, level);
I continue to fail to understand why the MFN would matter here.
My understanding is that if, for the same GFN, the MFN changes from MFN_1 to
MFN_2, then we need to update the references on the page referenced by
orig_pte, so that the proper reference count is maintained for the page
pointed to by MFN_1.
Isn't the
need to free strictly tied to a VALID -> NOT VALID transition? A permission
change simply retains the VALID state of an entry.
It covers the case when a removal happens. In this case we probably don't need
to check specifically for the MFN 0 case "mfn_eq(pte_get_mfn(*entry), _mfn(0))";
it would be enough to check !pte_is_valid(*entry) instead:
...
(removing_mapping && !pte_is_valid(*entry)))) )
Or only check the removing_mapping variable, as `entry` would be invalidated
by the code above anyway. So we would get:
+ if ( pte_is_valid(orig_pte) &&
+ (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) || removing_mapping) )
+ p2m_free_subtree(p2m, orig_pte, level);
Does it make sense now?
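To spell out the cases the simplified check covers (an illustrative summary,
assuming the form proposed above):

/*
 * orig_pte valid?   MFN changed?             removing?   p2m_free_subtree()?
 * ----------------  -----------------------  ----------  -------------------
 * no                -                        -           no
 * yes               no (permission change)   no          no
 * yes               yes                      no          yes
 * yes               -  (entry was cleared)   yes         yes
 */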
~ Oleksii