
Re: [PATCH v2 13/17] xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 22 Jul 2025 14:00:24 +0200
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 22 Jul 2025 12:00:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.07.2025 13:34, Oleksii Kurochko wrote:
> 
> On 7/22/25 12:41 PM, Oleksii Kurochko wrote:
>>
>>
>> On 7/21/25 2:18 PM, Jan Beulich wrote:
>>> On 18.07.2025 11:52, Oleksii Kurochko wrote:
>>>> On 7/17/25 12:25 PM, Jan Beulich wrote:
>>>>> On 17.07.2025 10:56, Oleksii Kurochko wrote:
>>>>>> On 7/16/25 6:18 PM, Jan Beulich wrote:
>>>>>>> On 16.07.2025 18:07, Oleksii Kurochko wrote:
>>>>>>>> On 7/16/25 1:31 PM, Jan Beulich wrote:
>>>>>>>>> On 15.07.2025 16:47, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/1/25 5:08 PM, Jan Beulich wrote:
>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>> --- a/xen/arch/riscv/p2m.c
>>>>>>>>>>>> +++ b/xen/arch/riscv/p2m.c
>>>>>>>>>>>> @@ -345,6 +345,26 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>>>>>>           return __map_domain_page(p2m->root + root_table_indx);
>>>>>>>>>>>>       }
>>>>>>>>>>>>       
>>>>>>>>>>>> +static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, p2m_type_t t)
>>>>>>>>>>> See comments on the earlier patch regarding naming.
>>>>>>>>>>>
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    int rc;
>>>>>>>>>>>> +    gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
>>>>>>>>>>> How does this work, when you record GFNs only for Xenheap pages?
>>>>>>>>>> I don't think I understand what the issue is. Could you please provide
>>>>>>>>>> some extra details?
>>>>>>>>> Counter question: The mfn_to_gfn() you currently have is only a stub. It only
>>>>>>>>> works for 1:1 mapped domains. Can you show me the eventual final implementation
>>>>>>>>> of the function, making it possible to use it here?
>>>>>>>> At the moment, I plan to support only 1:1 mapped domains, so this is the final
>>>>>>>> implementation.
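To make the limitation under discussion concrete, here is a minimal sketch, assuming hypothetical stand-ins (sketch_gfn_t, sketch_mfn_t and a direct_map flag) rather than the RISC-V port's real types. For a 1:1 mapped domain the GFN equals the MFN, so an identity stub is correct; for any other domain there is no reverse (MFN to GFN) map to consult, and several GFNs may even alias the same MFN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for Xen's typesafe gfn_t/mfn_t. */
typedef uint64_t sketch_gfn_t;
typedef uint64_t sketch_mfn_t;

#define SKETCH_INVALID_GFN UINT64_MAX

struct sketch_domain {
    bool direct_map;   /* hypothetical: true for 1:1 mapped domains */
};

/*
 * Identity translation: only meaningful when guest-physical and host-physical
 * frame numbers coincide. Without a reverse map, any other domain can only
 * get "unknown" back.
 */
static sketch_gfn_t sketch_mfn_to_gfn(const struct sketch_domain *d,
                                      sketch_mfn_t mfn)
{
    if ( d->direct_map )
        return mfn;
    return SKETCH_INVALID_GFN;
}
```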
>>>>>>> Isn't that an overly severe limitation?
>>>>>> I wouldn't say that it's a severe limitation, as it's just a matter of how
>>>>>> mfn_to_gfn() is implemented. When non-1:1 mapped domains are supported,
>>>>>> mfn_to_gfn() can be implemented differently, while the code where it's called
>>>>>> will likely remain unchanged.
>>>>>>
>>>>>> What I meant in my reply is that, for the current state and current limitations,
>>>>>> this is the final implementation of mfn_to_gfn(). But that doesn't mean I don't
>>>>>> see the value in, or the need for, non-1:1 mapped domains; it's just that this
>>>>>> limitation simplifies development at the current stage of the RISC-V port.
>>>>> Simplification is fine in some cases, but not supporting the "normal" way of
>>>>> domain construction looks like a pretty odd restriction. I'm also curious
>>>>> how you envision implementing mfn_to_gfn() then, in a way suitable for generic
>>>>> use like the one here. Imo, current limitation or not, you simply want to avoid
>>>>> use of that function outside of the special gnttab case.
>>>>>
>>>>>>>>>>> In this context (not sure if I asked before): With this use of a radix tree,
>>>>>>>>>>> how do you intend to bound the amount of memory that a domain can use, by
>>>>>>>>>>> making Xen insert very many entries?
>>>>>>>>>> I didn't think about that. I assumed it would be enough to set the amount of
>>>>>>>>>> memory a guest domain can use by specifying xen,domain-p2m-mem-mb in the DTS,
>>>>>>>>>> or using some predefined value if xen,domain-p2m-mem-mb isn't explicitly set.
>>>>>>>>> Which would require these allocations to come from that pool.
>>>>>>>> Yes, and this is true only for non-hardware domains with the current implementation.
>>>>>>> ???
>>>>>> I meant that the pool is currently used only for non-hardware domains.
>>>>> And how does this matter here? The memory required for the radix tree doesn't
>>>>> come from that pool anyway.
>>>> I thought it was possible to do that somehow, but looking at the code of
>>>> radix-tree.c, it seems the only way to allocate memory for the radix
>>>> tree is radix_tree_node_alloc() -> xzalloc(struct rcu_node).
>>>>
>>>> Then it would be necessary to introduce radix_tree_node_allocate(domain)
>>> That would be a possibility, but you may have seen that less than half a
>>> year ago we got rid of something along these lines. So it would require
>>> some pretty good justification to re-introduce.
>>>
>>>> or the radix tree can't be used at all, for the security reason mentioned in the
>>>> previous replies, no?
>>> (Very) careful use may still be possible. But the downside of using this
>>> (potentially long lookup times) would always remain.
>> Could you please clarify what you mean here by "(Very) careful"?
>> I thought about introducing a limit on the number of keys in the radix tree
>> and stopping the domain once that limit is exhausted. But it is also unclear
>> what the value of that limit should be. Perhaps you have a better idea.
>>
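The key-budget idea might look roughly like the following minimal sketch. It assumes a hypothetical per-domain counter (md_budget) charged on each new radix-tree key, with a flag standing in for domain_crash(); the limit value is arbitrary, since choosing it is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-domain accounting; not an existing Xen structure. */
struct sketch_domain {
    unsigned long md_budget;   /* remaining key insertions allowed */
    bool crashed;              /* stands in for domain_crash() */
};

/*
 * Charge one new key against the domain's own budget. Failing here affects
 * only this domain, rather than exhausting memory shared with other domains.
 */
static int sketch_charge_insert(struct sketch_domain *d)
{
    if ( d->md_budget == 0 )
    {
        d->crashed = true;
        return -ENOMEM;
    }
    d->md_budget--;
    return 0;
}
```

Note that this caps the number of keys, not the bytes consumed by interior radix-tree nodes, which vary with how sparse the keys are.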
>> But generally your idea below ...
>>>>>>>>>> Also, it seems this would just lead to the issue you mentioned earlier: when
>>>>>>>>>> the memory runs out, domain_crash() will be called or the PTE will be zapped.
>>>>>>>>> Or one domain exhausting memory would cause another domain to fail. A domain
>>>>>>>>> impacting just itself may be tolerable. But a domain affecting other domains
>>>>>>>>> isn't.
>>>>>>>> But it seems this issue could happen with any implementation. It could only be
>>>>>>>> avoided if every domain type (hardware, control, guest domain) had only a
>>>>>>>> pre-populated pool, without the ability to extend it or to allocate extra pages
>>>>>>>> from the domheap at runtime. Otherwise, if allocating extra pages is allowed,
>>>>>>>> we can't really do anything about this issue.
>>>>>>> But that's why I brought this up: You simply have to. Or, as indicated, the
>>>>>>> moment you mark Xen security-supported on RISC-V, there will be an XSA needed.
>>>>>> Why isn't it an XSA for other architectures? Arm, at least, should then have
>>>>>> such an XSA.
>>>>> Does Arm use a radix tree for storing types? It uses one for mem-access, but
>>>>> it's not clear to me whether that's actually a supported feature.
>>>>>
>>>>>> I don't understand why x86 wouldn't have the same issue. Memory is a limited
>>>>>> and shared resource, so if one domain uses too much memory, it could
>>>>>> happen that other domains don't have enough memory for their purposes...
>>>>> The question is whether allocations are bounded. With this use of a radix tree,
>>>>> you give domains a way to have Xen allocate pretty much arbitrary amounts of
>>>>> memory to populate that tree. That unbounded-ness is the problem, not memory
>>>>> allocations in general.
>>>> Isn't the number of radix tree keys bounded by the number of GFNs given to a
>>>> domain? We can't have more keys than the maximum GFN of a domain. So the
>>>> potential amount of memory needed for the radix tree is also bounded by the
>>>> number of GFNs.
>>> To some degree yes, hence why I said "pretty much arbitrary amounts".
>>> But recall that "amount of GFNs" is a fuzzy term; I think you mean to
>>> use it to describe the amount of memory pages given to the guest. GFNs
>>> can be used for other purposes, though. Guests could e.g. grant
>>> themselves access to their own memory, then map those grants at
>>> otherwise unused GFNs.
>>>
>>>> Anyway, IIUC I just can't use a radix tree for P2M types at all, right?
>>>> If so, does it make sense to borrow 2 bits from struct page_info->type_info,
>>>> as currently only 9 bits of it are used for the frame's count?
>>> struct page_info describes MFNs, when you want to describe GFNs. As you
>>> mentioned earlier, multiple GFNs can in principle map to the same MFN.
>>> You would force them to all have the same properties, which would be in
>>> direct conflict with e.g. the grant P2M types.
>>>
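A toy illustration of why per-MFN storage (which struct page_info fields would give) conflicts with per-GFN P2M types: one guest page mapped at one GFN as ordinary RAM and again at another GFN through a grant. The arrays, enum values and sketch_map() are hypothetical simplifications, not Xen's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified P2M types. */
enum sketch_p2m_type { SKETCH_p2m_ram_rw, SKETCH_p2m_grant_map_ro };

#define NR_FRAMES 16

/* One slot per MFN, mimicking a type kept in struct page_info. */
static enum sketch_p2m_type type_by_mfn[NR_FRAMES];
/* One slot per GFN, i.e. metadata attached to the P2M entry instead. */
static enum sketch_p2m_type type_by_gfn[NR_FRAMES];
/* GFN -> MFN translation (the P2M itself). */
static uint64_t gfn_to_mfn[NR_FRAMES];

static void sketch_map(uint64_t gfn, uint64_t mfn, enum sketch_p2m_type t)
{
    assert(gfn < NR_FRAMES && mfn < NR_FRAMES);
    gfn_to_mfn[gfn] = mfn;
    type_by_mfn[mfn] = t;   /* overwrites whatever an alias stored */
    type_by_gfn[gfn] = t;   /* independent per mapping */
}
```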
>>> Just to mention one possible alternative to using radix trees: You could
>>> maintain a 2nd set of intermediate "page tables", just that leaf entries
>>> would hold meta data for the respective GFN. The memory for those "page
>>> tables" could come from the normal P2M pool (and allocation would thus
>>> only consume domain-specific resources). Of course in any model like
>>> this the question of lookup times (as mentioned above) would remain.
>> ...looks like an optimal option.
>>
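A rough sketch of the "2nd set of intermediate page tables" approach, assuming a three-level, Sv39-like layout (9 index bits per level) and a small fixed-size pool standing in for the per-domain P2M pool. Intermediate nodes mirror the P2M's shape, while leaf nodes hold one metadata byte per GFN. All names (md_node, md_pool, md_set_type(), ...) are hypothetical, and nothing here touches the TLB because hardware never walks these tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_LEVELS   3      /* Sv39-like: 3 levels of 9 index bits */
#define MD_ENTRIES  512
#define MD_POOL_NR  8      /* hypothetical per-domain pool size, in nodes */

/* One node of the shadow tree (4 KiB on a 64-bit build). */
union md_node {
    union md_node *next[MD_ENTRIES];   /* intermediate levels */
    uint8_t type[MD_ENTRIES];          /* leaf level: per-GFN metadata */
};

struct md_pool {
    union md_node nodes[MD_POOL_NR];
    unsigned int used;
};

struct md_tree {
    struct md_pool *pool;   /* per-domain pool, like the P2M pool */
    union md_node *root;
};

static union md_node *md_alloc(struct md_pool *pool)
{
    union md_node *n;

    if ( pool->used == MD_POOL_NR )
        return NULL;                  /* only this domain's pool is exhausted */
    n = &pool->nodes[pool->used++];
    memset(n, 0, sizeof(*n));
    return n;
}

static unsigned int md_index(uint64_t gfn, unsigned int level)
{
    return (gfn >> (9 * level)) & (MD_ENTRIES - 1);
}

/* Walk from the root to the leaf covering @gfn, allocating if @alloc. */
static union md_node *md_walk(struct md_tree *t, uint64_t gfn, int alloc)
{
    union md_node *n;
    unsigned int level;

    if ( !t->root && (!alloc || !(t->root = md_alloc(t->pool))) )
        return NULL;
    n = t->root;
    for ( level = MD_LEVELS - 1; level > 0; level-- )
    {
        union md_node **slot = &n->next[md_index(gfn, level)];

        if ( !*slot && (!alloc || !(*slot = md_alloc(t->pool))) )
            return NULL;
        n = *slot;
    }
    return n;
}

static int md_set_type(struct md_tree *t, uint64_t gfn, uint8_t type)
{
    union md_node *leaf = md_walk(t, gfn, 1);

    if ( !leaf )
        return -1;
    leaf->type[md_index(gfn, 0)] = type;
    return 0;
}

/* Returns 0 (the default type) for GFNs never set. */
static int md_get_type(struct md_tree *t, uint64_t gfn)
{
    union md_node *leaf = md_walk(t, gfn, 0);

    return leaf ? leaf->type[md_index(gfn, 0)] : 0;
}
```

Every lookup costs a full walk, which is the lookup-time concern raised in the thread, but running out of nodes only fails the owning domain.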
>> The only thing I worry about is that it will require some code duplication
>> (I will think about how to re-use the existing code), since, for example,
>> TLB flushing isn't needed at all when setting/getting metadata, as we aren't
>> working with the real P2M page tables.
>> I agree that lookup won't be optimal, but nothing can be done about that with
>> such models.
> 
> Perhaps, instead of having a second set of intermediate "page tables",
> we could just allocate two consecutive pages within the real P2M page
> tables for the intermediate page table. The first page would serve as
> the actual page table to which the intermediate page table points,
> and the second page would store metadata for each entry of the page
> table that the intermediate page table references.
> 
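The two-consecutive-pages layout could be sketched as follows, assuming a hypothetical 8-byte metadata slot per PTE so the metadata page has exactly as many slots as the table page has entries. aligned_alloc() merely models obtaining two adjacent pages; a real implementation would take them from the P2M pool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PTRS      (SKETCH_PAGE_SIZE / sizeof(uint64_t))  /* 512 entries */

typedef uint64_t sketch_pte_t;

/* Allocate a page-table page immediately followed by its metadata page. */
static sketch_pte_t *sketch_alloc_table_pair(void)
{
    /* Needs two adjacent pages; the table page must be page-aligned. */
    void *p = aligned_alloc(SKETCH_PAGE_SIZE, 2 * SKETCH_PAGE_SIZE);

    if ( p )
        memset(p, 0, 2 * SKETCH_PAGE_SIZE);
    return p;
}

/* Given any entry of the table page, find its metadata slot in the next page. */
static uint64_t *sketch_md_slot(sketch_pte_t *entry)
{
    uintptr_t table = (uintptr_t)entry & ~(SKETCH_PAGE_SIZE - 1);
    size_t idx = ((uintptr_t)entry - table) / sizeof(sketch_pte_t);

    return (uint64_t *)(table + SKETCH_PAGE_SIZE) + idx;
}
```

The appeal is that no extra pointer is needed: the metadata location follows from the PTE's address alone.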
> As we only support 1GB, 2MB and 4KB mappings, we could do a small
> optimization and allocate these consecutive pages only for the PT levels
> which correspond to 1GB, 2MB and 4KB mappings.
> 
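For reference, with 4 KiB pages and 9 index bits per level, a leaf at a given level maps 4 KiB * 512^level: 4 KiB, 2 MiB, 1 GiB, then 512 GiB at level 3. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_LEVEL_BITS 9

/* Size mapped by a leaf PTE at @level: 4 KiB * 512^level. */
static uint64_t sketch_level_size(unsigned int level)
{
    return UINT64_C(1) << (SKETCH_PAGE_SHIFT + SKETCH_LEVEL_BITS * level);
}

/* Only levels whose leaves are 4 KiB, 2 MiB or 1 GiB would get a metadata page. */
static bool sketch_level_needs_md(unsigned int level)
{
    return level <= 2;
}
```

With Sv39x4 every level can hold a supported leaf, so the saving would only show up with deeper modes such as Sv48x4 or Sv57x4.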
> Does it make sense?

I was indeed entertaining this idea, but I couldn't conclude for myself if
that would indeed be without any rough edges. Hence I didn't want to
suggest such. For example, the need to have adjacent pairs of pages could
result in a higher rate of allocation failures (while populating or
re-sizing the P2M pool). This would be possible to avoid by still using
entirely separate pages, and then merely linking them together via some
unused struct page_info fields (the "normal" linking fields can't be used,
afaict).
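The alternative of keeping the metadata in an entirely separate page, linked from the table page's descriptor, might look like this. struct sketch_page_info and its md field are hypothetical stand-ins for struct page_info and whichever otherwise-unused field would carry the link; calloc() stands in for allocating from the P2M pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ENTRIES 512

/* Hypothetical, cut-down page descriptor; 'md' models an otherwise unused field. */
struct sketch_page_info {
    struct sketch_page_info *md;   /* metadata page paired with this table page */
    uint64_t *va;                  /* where the page's contents live */
};

/* Allocate one independent page; no adjacency with any other page is required. */
static struct sketch_page_info *sketch_alloc_page(void)
{
    struct sketch_page_info *pg = calloc(1, sizeof(*pg));

    if ( !pg )
        return NULL;
    pg->va = calloc(SKETCH_ENTRIES, sizeof(uint64_t));
    if ( !pg->va )
    {
        free(pg);
        return NULL;
    }
    return pg;
}

/* Pair a table page with a separately allocated metadata page. */
static int sketch_attach_md(struct sketch_page_info *table)
{
    struct sketch_page_info *md = sketch_alloc_page();

    if ( !md )
        return -1;
    table->md = md;
    return 0;
}

static uint64_t *sketch_md_slot(struct sketch_page_info *table, unsigned int idx)
{
    assert(table->md && idx < SKETCH_ENTRIES);
    return &table->md->va[idx];
}
```

Each page is allocated on its own, so the pool never has to find an adjacent pair; the cost is one extra pointer hop per metadata access.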

Jan



 

