Re: [PATCH v3 12/20] xen/riscv: implement p2m_set_range()
On 15.08.2025 11:52, Oleksii Kurochko wrote:
> On 8/5/25 6:04 PM, Jan Beulich wrote:
>> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>>> This patch introduces p2m_set_range() and its core helper p2m_set_entry() for
>> Nit: This patch doesn't introduce p2m_set_range(); it merely fleshes it out.
>>
>>> --- a/xen/arch/riscv/include/asm/p2m.h
>>> +++ b/xen/arch/riscv/include/asm/p2m.h
>>> @@ -7,11 +7,13 @@
>>>  #include <xen/rwlock.h>
>>>  #include <xen/types.h>
>>>
>>> +#include <asm/page.h>
>>>  #include <asm/page-bits.h>
>>>
>>>  extern unsigned int p2m_root_order;
>>>  #define P2M_ROOT_ORDER  p2m_root_order
>>>  #define P2M_ROOT_PAGES  BIT(P2M_ROOT_ORDER, U)
>>> +#define P2M_ROOT_LEVEL  HYP_PT_ROOT_LEVEL
>> I think I commented on this before, and I would have hoped for at least a remark
>> in the description to appear (perhaps even a comment here): It's okay(ish) to tie
>> these together for now, but in the longer run I don't expect this is going to be
>> wanted. If e.g. we ran Xen in Sv57 mode, there would be no reason at all to force
>> all P2Ms to use 5 levels of page tables.
>
> Do you mean that for G-stage it could be chosen any SvXX mode to limit an amount
> of page tables necessary for G-stage? If yes, then, at least, I agree that a
> comment should be added or, probably, "#warning optimize an amount of p2m root level
> for MMU mode > Sv48" (or maybe >=).

Yes.

> Or do you mean if we set hgatp.mode=Sv57 then it is possible to limit an amount of
> page table's levels to use? In this case I think hardware still will expect to see
> 5 levels of page tables.

No.

>>> +static inline void p2m_write_pte(pte_t *p, pte_t pte, bool clean_pte)
>>> +{
>>> +    write_pte(p, pte);
>>> +    if ( clean_pte )
>>> +        clean_dcache_va_range(p, sizeof(*p));
>> Not necessarily for right away, but if multiple adjacent PTEs are
>> written without releasing the lock, this then redundant cache flushing
>> can be a performance issue.
>
> Can't it be resolved on a caller side? Something like:
>     p2m_write_pte(p1, pte1, false);
>     p2m_write_pte(p2, pte2, false);
>     p2m_write_pte(p3, pte3, false);
>     p2m_write_pte(p4, pte4, true);
> where p1-p4 are adjacent.

No. You wouldn't know whether the last write flushes what the earlier three
have written. There may be a cacheline boundary in between. Plus I didn't
really think of back-to-back writes, but e.g. a loop doing many of them,
where a single wider flush may then be more efficient.
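Purely as an illustration (the helper name, its signature, and the 'ptes'
array below are made up for the example, not part of the patch), such a loop
could defer the cleaning and do it once for the whole run of entries:

    /*
     * Sketch only: write a run of adjacent PTEs without per-entry cleaning,
     * then clean the dcache once for the whole written range, so cacheline
     * boundaries within the range are covered by a single flush.
     */
    static void p2m_write_pte_range(struct p2m_domain *p2m, pte_t *table,
                                    unsigned int first, unsigned int nr,
                                    const pte_t *ptes)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            p2m_write_pte(&table[first + i], ptes[i], false);

        if ( p2m->clean_pte )
            clean_dcache_va_range(&table[first], nr * sizeof(*table));
    }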
>>> +#define P2M_TABLE_MAP_NONE   0
>>> +#define P2M_TABLE_MAP_NOMEM  1
>>> +#define P2M_TABLE_SUPER_PAGE 2
>>> +#define P2M_TABLE_NORMAL     3
>>> +
>>> +/*
>>> + * Take the currently mapped table, find the corresponding the entry
>>> + * corresponding to the GFN, and map the next table, if available.
>> Nit: Double "corresponding".
>>
>>> + * The previous table will be unmapped if the next level was mapped
>>> + * (e.g P2M_TABLE_NORMAL returned).
>>> + *
>>> + * `alloc_tbl` parameter indicates whether intermediate tables should
>>> + * be allocated when not present.
>>> + *
>>> + * Return values:
>>> + *  P2M_TABLE_MAP_NONE: a table allocation isn't permitted.
>>> + *  P2M_TABLE_MAP_NOMEM: allocating a new page failed.
>>> + *  P2M_TABLE_SUPER_PAGE: next level or leaf mapped normally.
>>> + *  P2M_TABLE_NORMAL: The next entry points to a superpage.
>>> + */
>>> +static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
>>> +                          unsigned int level, pte_t **table,
>>> +                          unsigned int offset)
>>> +{
>>> +    panic("%s: hasn't been implemented yet\n", __func__);
>>> +
>>> +    return P2M_TABLE_MAP_NONE;
>>> +}
>>> +
>>> +/* Free pte sub-tree behind an entry */
>>> +static void p2m_free_subtree(struct p2m_domain *p2m,
>>> +                             pte_t entry, unsigned int level)
>>> +{
>>> +    panic("%s: hasn't been implemented yet\n", __func__);
>>> +}
>>> +
>>> +/*
>>> + * Insert an entry in the p2m. This should be called with a mapping
>>> + * equal to a page/superpage.
>>> + */
>>> +static int p2m_set_entry(struct p2m_domain *p2m,
>>> +                         gfn_t gfn,
>>> +                         unsigned long page_order,
>>> +                         mfn_t mfn,
>>> +                         p2m_type_t t)
>>> +{
>>> +    unsigned int level;
>>> +    unsigned int target = page_order / PAGETABLE_ORDER;
>>> +    pte_t *entry, *table, orig_pte;
>>> +    int rc;
>>> +    /* A mapping is removed if the MFN is invalid. */
>>> +    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
>> Comment and code don't fit together. Many MFNs are invalid (any for which
>> mfn_valid() returns false), yet you only check for INVALID_MFN here.
>
> Probably, it makes sense to add an ASSERT() here for the case when
> mfn_valid(mfn) is false, but the MFN is not explicitly equal to INVALID_MFN.
> This would indicate that someone attempted to perform a mapping with an
> incorrect MFN, which, IMO, is entirely wrong.

No, and we've been there before. MMIO can live anywhere, and mappings for
such still will need to be permitted. It is correct to check only for
INVALID_MFN here imo; it's just the comment which also needs to reflect that.
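The adjusted comment could read along these lines (wording is only a
suggestion):

    /*
     * A request to remove a mapping is signalled by passing INVALID_MFN
     * exactly; MFNs for which mfn_valid() returns false (e.g. MMIO) are
     * still legitimate inputs for mapping requests.
     */
    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);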
>>> +    /*
>>> +     * If we are here with level > target, we must be at a leaf node,
>>> +     * and we need to break up the superpage.
>>> +     */
>>> +    if ( level > target )
>>> +    {
>>> +        panic("Shattering isn't implemented\n");
>>> +    }
>>> +
>>> +    /*
>>> +     * We should always be there with the correct level because all the
>>> +     * intermediate tables have been installed if necessary.
>>> +     */
>>> +    ASSERT(level == target);
>>> +
>>> +    orig_pte = *entry;
>>> +
>>> +    if ( removing_mapping )
>>> +        p2m_clean_pte(entry, p2m->clean_pte);
>>> +    else
>>> +    {
>>> +        pte_t pte = p2m_pte_from_mfn(mfn, t);
>>> +
>>> +        p2m_write_pte(entry, pte, p2m->clean_pte);
>>> +
>>> +        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
>>> +                                      gfn_add(gfn, BIT(page_order, UL) - 1));
>>> +        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, gfn);
>>> +    }
>>> +
>>> +    p2m->need_flush = true;
>>> +
>>> +    /*
>>> +     * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
>>> +     * is not ready for RISC-V support.
>>> +     *
>>> +     * When CONFIG_HAS_PASSTHROUGH=y, iommu_iotlb_flush() should be done
>>> +     * here.
>>> +     */
>>> +#ifdef CONFIG_HAS_PASSTHROUGH
>>> +# error "add code to flush IOMMU TLB"
>>> +#endif
>>> +
>>> +    rc = 0;
>>> +
>>> +    /*
>>> +     * Free the entry only if the original pte was valid and the base
>>> +     * is different (to avoid freeing when permission is changed).
>>> +     */
>>> +    if ( pte_is_valid(orig_pte) &&
>>> +         !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) )
>> I'm puzzled by this 2nd check: A permission change would - I expect - only
>> occur to a leaf entry. If the new entry is a super-page one and the old
>> wasn't, don't you still need to free the sub-tree, no matter whether the
>> MFNs are the same?
>
> I expect the MFNs to differ in this scenario, so the old sub-tree will be
> freed.

You expecting something isn't a good criterion. If it's possible, even if
unexpected (by you), it needs dealing with correctly.

> Based on your example (new entry is super-page and old entry isn't):
> For old mapping (lets say, 4 KiB leaf) p2m_set_entry() walks all levels down
> to L0, so we will have the following MMU page table walks:
>   L2 PTE -> L1 PTE (MFN of L0 page table) -> L0 PTE -> RAM
>
> When new mapping (lets say, 2 MiB superpage) will be requested, p2m_set_entry()
> will stop at L1 (the superpage level):
>   L2 PTE -> L1 PTE (at this moment, L1 PTE points to L0 page table, which
>   points to RAM)
> Then the old L1 PTE will be saved in 'orig_pte', then writes 'entry' with
> the RAM MFN for the 2 MiB mapping. The walk becomes:
>   L2 PTE -> L1 PTE -> RAM
>
> Therefore, 'entry' now holds an MFN pointing to RAM (superpage leaf). 'orig_pte'
> still holds an MFN pointing to the L0 table (the old sub-tree). Since these MFNs
> differ, the code calls p2m_free_subtree(p2m, orig_pte, …) and frees the old L0
> sub-tree.

A particular example doesn't help. All possible cases need handling correctly.

>> Plus consider the special case of MFN 0: If you clear
>> an entry using MFN 0, you will find old and new PTEs' both having the same
>> MFN.
>
> Isn't this happen only when a mapping removal is explicitly requested?
> In the case of a mapping removal it seems to me it is enough just to
> clear PTE with all zeroes.

Correct. Which means original MFN (PPN) and new MFN (PPN) would match.
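Just to illustrate the cases under discussion (pte_is_table() below is merely
a stand-in for whatever predicate the final code uses to tell a table entry
from a leaf; this is a sketch, not the patch), the freeing condition might
look roughly like:

    /*
     * Free the old entry whenever it was valid and is being torn down or
     * replaced, rather than relying solely on a PPN comparison (which
     * breaks when a table entry is replaced by a superpage with a
     * matching PPN, and when clearing an entry whose old PPN is 0).
     * A pure permission change (same PPN, still a leaf) stays exempt.
     */
    if ( pte_is_valid(orig_pte) &&
         (removing_mapping ||
          pte_is_table(orig_pte) ||
          !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte))) )
        p2m_free_subtree(p2m, orig_pte, level);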
Jan