Xen project Mailing List

Re: [PATCH v2 13/17] xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration

To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Date: Tue, 22 Jul 2025 14:00:24 +0200

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 22 Jul 2025 12:00:39 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.07.2025 13:34, Oleksii Kurochko wrote: > > On 7/22/25 12:41 PM, Oleksii Kurochko wrote: >> >> >> On 7/21/25 2:18 PM, Jan Beulich wrote: >>> On 18.07.2025 11:52, Oleksii Kurochko wrote: >>>> On 7/17/25 12:25 PM, Jan Beulich wrote: >>>>> On 17.07.2025 10:56, Oleksii Kurochko wrote: >>>>>> On 7/16/25 6:18 PM, Jan Beulich wrote: >>>>>>> On 16.07.2025 18:07, Oleksii Kurochko wrote: >>>>>>>> On 7/16/25 1:31 PM, Jan Beulich wrote: >>>>>>>>> On 15.07.2025 16:47, Oleksii Kurochko wrote: >>>>>>>>>> On 7/1/25 5:08 PM, Jan Beulich wrote: >>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote: >>>>>>>>>>>> --- a/xen/arch/riscv/p2m.c >>>>>>>>>>>> +++ b/xen/arch/riscv/p2m.c >>>>>>>>>>>> @@ -345,6 +345,26 @@ static pte_t *p2m_get_root_pointer(struct >>>>>>>>>>>> p2m_domain *p2m, gfn_t gfn) >>>>>>>>>>>> return __map_domain_page(p2m->root + root_table_indx); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, >>>>>>>>>>>> p2m_type_t t) >>>>>>>>>>> See comments on the earlier patch regarding naming. >>>>>>>>>>> >>>>>>>>>>>> +{ >>>>>>>>>>>> + int rc; >>>>>>>>>>>> + gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte)); >>>>>>>>>>> How does this work, when you record GFNs only for Xenheap pages? >>>>>>>>>> I think I don't understand what is an issue. Could you please provide >>>>>>>>>> some extra details? >>>>>>>>> Counter question: The mfn_to_gfn() you currently have is only a stub. >>>>>>>>> It only >>>>>>>>> works for 1:1 mapped domains. Can you show me the eventual final >>>>>>>>> implementation >>>>>>>>> of the function, making it possible to use it here? >>>>>>>> At the moment, I planned to support only 1:1 mapped domains, so it is >>>>>>>> final >>>>>>>> implementation. >>>>>>> Isn't that on overly severe limitation? >>>>>> I wouldn't say that it's a severe limitation, as it's just a matter of >>>>>> how >>>>>> |mfn_to_gfn()| is implemented. When non-1:1 mapped domains are supported, >>>>>> |mfn_to_gfn()| can be implemented differently, while the code where it’s >>>>>> called >>>>>> will likely remain unchanged. >>>>>> >>>>>> What I meant in my reply is that, for the current state and current >>>>>> limitations, >>>>>> this is the final implementation of|mfn_to_gfn()|. But that doesn't mean >>>>>> I don't >>>>>> see the value in, or the need for, non-1:1 mapped domains—it's just that >>>>>> this >>>>>> limitation simplifies development at the current stage of the RISC-V >>>>>> port. >>>>> Simplification is fine in some cases, but not supporting the "normal" way >>>>> of >>>>> domain construction looks like a pretty odd restriction. I'm also curious >>>>> how you envision to implement mfn_to_gfn() then, suitable for generic use >>>>> like >>>>> the one here. Imo, current limitation or not, you simply want to avoid >>>>> use of >>>>> that function outside of the special gnttab case. >>>>> >>>>>>>>>>> In this context (not sure if I asked before): With this use of a >>>>>>>>>>> radix tree, >>>>>>>>>>> how do you intend to bound the amount of memory that a domain can >>>>>>>>>>> use, by >>>>>>>>>>> making Xen insert very many entries? >>>>>>>>>> I didn’t think about that. I assumed it would be enough to set the >>>>>>>>>> amount of >>>>>>>>>> memory a guest domain can use by specifying|xen,domain-p2m-mem-mb| >>>>>>>>>> in the DTS, >>>>>>>>>> or using some predefined value if|xen,domain-p2m-mem-mb| isn’t >>>>>>>>>> explicitly set. >>>>>>>>> Which would require these allocations to come from that pool. >>>>>>>> Yes, and it is true only for non-hardware domains with the current >>>>>>>> implementation. >>>>>>> ??? >>>>>> I meant that pool is used now only for non-hardware domains at the >>>>>> moment. >>>>> And how does this matter here? The memory required for the radix tree >>>>> doesn't >>>>> come from that pool anyway. >>>> I thought that is possible to do that somehow, but looking at a code of >>>> radix-tree.c it seems like the only one way to allocate memroy for the >>>> radix >>>> tree isradix_tree_node_alloc() -> xzalloc(struct rcu_node). >>>> >>>> Then it is needed to introduce radix_tree_node_allocate(domain) >>> That would be a possibility, but you may have seen that less than half a >>> year ago we got rid of something along these lines. So it would require >>> some pretty good justification to re-introduce. >>> >>>> or radix tree >>>> can't be used at all for mentioned in the previous replies security >>>> reason, no? >>> (Very) careful use may still be possible. But the downside of using this >>> (potentially long lookup times) would always remain. >> Could you please clarify what do you mean here by "(Very) careful"? >> I thought about an introduction of an amount of possible keys in radix tree >> and if this amount >> is 0 then stop domain. And it is also unclear what should be a value for >> this amount. >> Probably, you have better idea. >> >> But generally your idea below ... >>>>>>>>>> Also, it seems this would just lead to the issue you mentioned >>>>>>>>>> earlier: when >>>>>>>>>> the memory runs out,|domain_crash()| will be called or PTE will be >>>>>>>>>> zapped. >>>>>>>>> Or one domain exhausting memory would cause another domain to fail. A >>>>>>>>> domain >>>>>>>>> impacting just itself may be tolerable. But a domain affecting other >>>>>>>>> domains >>>>>>>>> isn't. >>>>>>>> But it seems like this issue could happen in any implementation. It >>>>>>>> won't happen only >>>>>>>> if we will have only pre-populated pool for any domain type (hardware, >>>>>>>> control, guest >>>>>>>> domain) without ability to extend them or allocate extra pages from >>>>>>>> domheap in runtime. >>>>>>>> Otherwise, if extra pages allocation is allowed then we can't really >>>>>>>> do something >>>>>>>> with this issue. >>>>>>> But that's why I brought this up: You simply have to. Or, as indicated, >>>>>>> the >>>>>>> moment you mark Xen security-supported on RISC-V, there will be an XSA >>>>>>> needed. >>>>>> Why it isn't XSA for other architectures? At least, Arm then should have >>>>>> such >>>>>> XSA. >>>>> Does Arm use a radix tree for storing types? It uses one for mem-access, >>>>> but >>>>> it's not clear to me whether that's actually a supported feature. >>>>> >>>>>> I don't understand why x86 won't have the same issue. Memory is the >>>>>> limited >>>>>> and shared resource, so if one of the domain will use to much memory >>>>>> then it could >>>>>> happen that other domains won't have enough memory for its purpose... >>>>> The question is whether allocations are bounded. With this use of a radix >>>>> tree, >>>>> you give domains a way to have Xen allocate pretty much arbitrary amounts >>>>> of >>>>> memory to populate that tree. That unbounded-ness is the problem, not >>>>> memory >>>>> allocations in general. >>>> Isn't radix tree key bounded to an amount of GFNs given for a domain? We >>>> can't have >>>> more keys then a max GFN number for a domain. So a potential amount of >>>> necessary memory >>>> for radix tree is also bounded to an amount of GFNs. >>> To some degree yes, hence why I said "pretty much arbitrary amounts". >>> But recall that "amount of GFNs" is a fuzzy term; I think you mean to >>> use it to describe the amount of memory pages given to the guest. GFNs >>> can be used for other purposes, though. Guests could e.g. grant >>> themselves access to their own memory, then map those grants at >>> otherwise unused GFNs. >>> >>>> Anyway, IIUC I just can't use radix tree for p2m types at all, right? >>>> If yes, does it make sense to borrow 2 bits from struct >>>> page_info->type_info as now it >>>> is used 9-bits for count of a frame? >>> struct page_info describes MFNs, when you want to describe GFNs. As you >>> mentioned earlier, multiple GFNs can in principle map to the same MFN. >>> You would force them to all have the same properties, which would be in >>> direct conflict with e.g. the grant P2M types. >>> >>> Just to mention one possible alternative to using radix trees: You could >>> maintain a 2nd set of intermediate "page tables", just that leaf entries >>> would hold meta data for the respective GFN. The memory for those "page >>> tables" could come from the normal P2M pool (and allocation would thus >>> only consume domain-specific resources). Of course in any model like >>> this the question of lookup times (as mentioned above) would remain. >> ...looks like an optimal option. >> >> The only thing I worry about is that it will require some code duplication >> (I will think how to re-use the current one code), as for example, when >> setting/getting metadata, TLB flushing isn’t needed at all as we aren't >> working with with real P2M page tables. >> Agree that lookup won't be the best one, but nothing can be done with >> such models. > > Probably, instead of having a second set of intermediate "page tables", > we could just allocate two consecutive pages within the real P2M page > tables for the intermediate page table. The first page would serve as > the actual page table to which the intermediate page table points, > and the second page would store metadata for each entry of the page > table that the intermediate page table references. > > As we are supporting only 1gb, 2mb and 4kb mappings we could do a little > optimization and start allocate these consecutive pages only for PT levels > which corresponds to 1gb, 2mb, 4kb mappings. > > Does it make sense? I was indeed entertaining this idea, but I couldn't conclude for myself if that would indeed be without any rough edges. Hence I didn't want to suggest such. For example, the need to have adjacent pairs of pages could result in a higher rate of allocation failures (while populating or re-sizing the P2M pool). This would be possible to avoid by still using entirely separate pages, and then merely linking them together via some unused struct page_info fields (the "normal" linking fields can't be used, afaict). Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.