Xen project Mailing List

Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings

To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Date: Tue, 22 Jul 2025 18:02:52 +0200

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 22 Jul 2025 16:03:15 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.07.2025 16:57, Oleksii Kurochko wrote: > > On 7/21/25 3:34 PM, Jan Beulich wrote: >> On 17.07.2025 18:37, Oleksii Kurochko wrote: >>> On 7/2/25 11:25 AM, Jan Beulich wrote: >>>> On 10.06.2025 15:05, Oleksii Kurochko wrote: >>>>> Add support for down large memory mappings ("superpages") in the RISC-V >>>>> p2m mapping so that smaller, more precise mappings ("finer-grained >>>>> entries") >>>>> can be inserted into lower levels of the page table hierarchy. >>>>> >>>>> To implement that the following is done: >>>>> - Introduce p2m_split_superpage(): Recursively shatters a superpage into >>>>> smaller page table entries down to the target level, preserving >>>>> original >>>>> permissions and attributes. >>>>> - __p2m_set_entry() updated to invoke superpage splitting when inserting >>>>> entries at lower levels within a superpage-mapped region. >>>>> >>>>> This implementation is based on the ARM code, with modifications to the >>>>> part >>>>> that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V does >>>>> not require BBM, so there is no need to invalidate the PTE and flush the >>>>> TLB before updating it with the newly created, split page table. >>>> But some flushing is going to be necessary. As long as you only ever do >>>> global flushes, the one after the individual PTE modification (within the >>>> split table) will do (if BBM isn't required, see below), but once you move >>>> to more fine-grained flushing, that's not going to be enough anymore. Not >>>> sure it's a good idea to leave such a pitfall. >>> I think that I don't fully understand what is an issue. >> Whether a flush is necessary after solely breaking up a superpage is arch- >> defined. I don't know how much restrictions the spec on possible hardware >> behavior for RISC-V. However, the eventual change of (at least) one entry >> of fulfill the original request will surely require a flush. What I was >> trying to say is that this required flush would better not also cover for >> the flushes that may or may not be required by the spec. IOW if the spec >> leaves any room for flushes to possibly be needed, those flushes would >> better be explicit. > > I think that I still don't understand why what I wrote above will work as long > as global flushes is working and will stop to work when more fine-grained > flushing > is used. > > Inside p2m_split_superpage() we don't need any kind of TLB flush operation as > it is allocation a new page for a table and works with it, so no one could use > this page at the moment and cache it in TLB. > > The only question is that if it is needed BBM before staring using splitted > entry: > .... > if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) ) > { > /* Free the allocated sub-tree */ > p2m_free_subtree(p2m, split_pte, level); > > rc = -ENOMEM; > goto out; > } > > ------> /* Should be BBM used here ? */ > p2m_write_pte(entry, split_pte, p2m->clean_pte); > > And I can't find anything in the spec what tells me to use BBM here like Arm > does: But what you need is a statement in the spec that you can get away without. Imo at least. > /* > * Follow the break-before-sequence to update the entry. > * For more details see (D4.7.1 in ARM DDI 0487A.j). > */ > p2m_remove_pte(entry, p2m->clean_pte); > p2m_force_tlb_flush_sync(p2m); > > p2m_write_pte(entry, split_pte, p2m->clean_pte); > > I agree that even BBM isn't mandated explicitly sometime it could be useful, > but > here I am not really sure what is the point to do TLB flush before > p2m_write_pte() > as nothing serious will happen if and old mapping will be used for a some time > as we are keeping the same rights for smaller (splited) mapping. > The reason why Arm does p2m_remove_pte() & p2m_force_tlb_flush() before > updating > an entry here as it is doesn't guarantee that everything will be okay and they > explicitly tells: > This situation can possibly break coherency and ordering guarantees, > leading to > inconsistent unwanted behavior in our Processing Element (PE). > > >>>> As to (no need for) BBM: I couldn't find anything to that effect in the >>>> privileged spec. Can you provide some pointer? What I found instead is e.g. >>>> this sentence: "To ensure that implicit reads observe writes to the same >>>> memory locations, an SFENCE.VMA instruction must be executed after the >>>> writes to flush the relevant cached translations." And this: "Accessing the >>>> same location using different cacheability attributes may cause loss of >>>> coherence." (This may not only occur when the same physical address is >>>> mapped twice at different VAs, but also after the shattering of a superpage >>>> when the new entry differs in cacheability.) >>> I also couldn't find that RISC-V spec mandates BBM explicitly as Arm does >>> it. >>> >>> What I meant that on RISC-V can do: >>> - Write new PTE >>> - Flush TLB >>> >>> While on Arm it is almost always needed to do: >>> - Write zero to PTE >>> - Flush TLB >>> - Write new PTE >>> >>> For example, the common CoW code path where you copy from a read only page >>> to >>> a new page, then map that new page as writable just works on RISC-V without >>> extra considerations and on Arm it requires BBM. >> CoW is a specific sub-case with increasing privilege. > > Could you please explain why it matters increasing of privilege in terms of > TLB > flushing and PTE clearing before writing a new PTE? Some architectures automatically re-walk page tables when encountering a permission violation based on TLB contents. Hence increasing privilege can be a special case. >>> It seems to me that BBM is mandated for Arm only because that TLB is shared >>> among cores, so there is no any guarantee that no prior entry for same VA >>> remains in TLB. In case of RISC-V's TLB isn't shared and after a flush it is >>> guaranteed that no prior entry for the same VA remains in the TLB. >> Not sure that's the sole reason. But again the question is: Is this written >> down explicitly anywhere? Generally there can be multiple levels of TLBs, and >> while some of them may be per-core, others may be shared. > > Spec. mentions that: > If a hart employs an address-translation cache, that cache must appear to > be > private to that hart. Hmm, okay, that indeed pretty much excludes shared TLBs. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.