[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 21 Jul 2025 15:34:34 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 21 Jul 2025 13:34:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.07.2025 18:37, Oleksii Kurochko wrote:
> On 7/2/25 11:25 AM, Jan Beulich wrote:
>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>> Add support for down large memory mappings ("superpages") in the RISC-V
>>> p2m mapping so that smaller, more precise mappings ("finer-grained entries")
>>> can be inserted into lower levels of the page table hierarchy.
>>>
>>> To implement that the following is done:
>>> - Introduce p2m_split_superpage(): Recursively shatters a superpage into
>>>    smaller page table entries down to the target level, preserving original
>>>    permissions and attributes.
>>> - __p2m_set_entry() updated to invoke superpage splitting when inserting
>>>    entries at lower levels within a superpage-mapped region.
>>>
>>> This implementation is based on the ARM code, with modifications to the part
>>> that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V does
>>> not require BBM, so there is no need to invalidate the PTE and flush the
>>> TLB before updating it with the newly created, split page table.
>> But some flushing is going to be necessary. As long as you only ever do
>> global flushes, the one after the individual PTE modification (within the
>> split table) will do (if BBM isn't required, see below), but once you move
>> to more fine-grained flushing, that's not going to be enough anymore. Not
>> sure it's a good idea to leave such a pitfall.
> 
> I think that I don't fully understand what is an issue.

Whether a flush is necessary after solely breaking up a superpage is arch-
defined. I don't know how much restrictions the spec on possible hardware
behavior for RISC-V. However, the eventual change of (at least) one entry
of fulfill the original request will surely require a flush. What I was
trying to say is that this required flush would better not also cover for
the flushes that may or may not be required by the spec. IOW if the spec
leaves any room for flushes to possibly be needed, those flushes would
better be explicit.

>> As to (no need for) BBM: I couldn't find anything to that effect in the
>> privileged spec. Can you provide some pointer? What I found instead is e.g.
>> this sentence: "To ensure that implicit reads observe writes to the same
>> memory locations, an SFENCE.VMA instruction must be executed after the
>> writes to flush the relevant cached translations." And this: "Accessing the
>> same location using different cacheability attributes may cause loss of
>> coherence." (This may not only occur when the same physical address is
>> mapped twice at different VAs, but also after the shattering of a superpage
>> when the new entry differs in cacheability.)
> 
> I also couldn't find that RISC-V spec mandates BBM explicitly as Arm does it.
> 
> What I meant that on RISC-V can do:
> - Write new PTE
> - Flush TLB
> 
> While on Arm it is almost always needed to do:
> - Write zero to PTE
> - Flush TLB
> - Write new PTE
> 
> For example, the common CoW code path where you copy from a read only page to
> a new page, then map that new page as writable just works on RISC-V without
> extra considerations and on Arm it requires BBM.

CoW is a specific sub-case with increasing privilege.

> It seems to me that BBM is mandated for Arm only because that TLB is shared
> among cores, so there is no any guarantee that no prior entry for same VA
> remains in TLB. In case of RISC-V's TLB isn't shared and after a flush it is
> guaranteed that no prior entry for the same VA remains in the TLB.

Not sure that's the sole reason. But again the question is: Is this written
down explicitly anywhere? Generally there can be multiple levels of TLBs, and
while some of them may be per-core, others may be shared.

>>> +    /*
>>> +     * Even if we failed, we should install the newly allocated PTE
>>> +     * entry. The caller will be in charge to free the sub-tree.
>>> +     */
>>> +    p2m_write_pte(entry, page_to_p2m_table(p2m, page), p2m->clean_pte);
>> Why would it be wrong to free the page right here, vacating the entry at
>> the same time (or leaving just that to the caller)? (IOW - if this is an
>> implementation decision of yours, I think the word "should" would want
>> dropping.) After all, the caller invoking p2m_free_entry() on the thus
>> split PTE is less efficient (needs to iterate over all entries) than on
>> the original one (where it's just a single superpage).
> 
> I think that I didn't get your idea.

Well, first and foremost this was a question to you, as it's not clear to
me whether "should" is the correct word to use here. It would be
appropriate if the spec mandated this behavior. It would seem less
appropriate if this arrangement was an implementation choice of yours.
And it looks to me as if the latter was the case here.

>>> @@ -806,7 +877,36 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
>>>        */
>>>       if ( level > target )
>> This condition is likely too strong, unless you actually mean to also
>> split a superpage if it really wouldn't need splitting (new entry written
>> still fitting with the superpage mapping, i.e. suitable MFN and same
>> attributes).
> 
> I am not really sure that I fully understand.
> My understanding is if level != target then the splitting is needed, I am
> not really get the part "split a superpage if it really wouldn't need 
> splitting".

Hmm, maybe I was wrong here. The caller determines at what level the
actual change needs to occur? In which case what you have may be right.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.