[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings
On 22.07.2025 16:57, Oleksii Kurochko wrote: > > On 7/21/25 3:34 PM, Jan Beulich wrote: >> On 17.07.2025 18:37, Oleksii Kurochko wrote: >>> On 7/2/25 11:25 AM, Jan Beulich wrote: >>>> On 10.06.2025 15:05, Oleksii Kurochko wrote: >>>>> Add support for down large memory mappings ("superpages") in the RISC-V >>>>> p2m mapping so that smaller, more precise mappings ("finer-grained >>>>> entries") >>>>> can be inserted into lower levels of the page table hierarchy. >>>>> >>>>> To implement that the following is done: >>>>> - Introduce p2m_split_superpage(): Recursively shatters a superpage into >>>>> smaller page table entries down to the target level, preserving >>>>> original >>>>> permissions and attributes. >>>>> - __p2m_set_entry() updated to invoke superpage splitting when inserting >>>>> entries at lower levels within a superpage-mapped region. >>>>> >>>>> This implementation is based on the ARM code, with modifications to the >>>>> part >>>>> that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V does >>>>> not require BBM, so there is no need to invalidate the PTE and flush the >>>>> TLB before updating it with the newly created, split page table. >>>> But some flushing is going to be necessary. As long as you only ever do >>>> global flushes, the one after the individual PTE modification (within the >>>> split table) will do (if BBM isn't required, see below), but once you move >>>> to more fine-grained flushing, that's not going to be enough anymore. Not >>>> sure it's a good idea to leave such a pitfall. >>> I think that I don't fully understand what is an issue. >> Whether a flush is necessary after solely breaking up a superpage is arch- >> defined. I don't know how much restrictions the spec on possible hardware >> behavior for RISC-V. However, the eventual change of (at least) one entry >> of fulfill the original request will surely require a flush. What I was >> trying to say is that this required flush would better not also cover for >> the flushes that may or may not be required by the spec. IOW if the spec >> leaves any room for flushes to possibly be needed, those flushes would >> better be explicit. > > I think that I still don't understand why what I wrote above will work as long > as global flushes is working and will stop to work when more fine-grained > flushing > is used. > > Inside p2m_split_superpage() we don't need any kind of TLB flush operation as > it is allocation a new page for a table and works with it, so no one could use > this page at the moment and cache it in TLB. > > The only question is that if it is needed BBM before staring using splitted > entry: > .... > if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) ) > { > /* Free the allocated sub-tree */ > p2m_free_subtree(p2m, split_pte, level); > > rc = -ENOMEM; > goto out; > } > > ------> /* Should be BBM used here ? */ > p2m_write_pte(entry, split_pte, p2m->clean_pte); > > And I can't find anything in the spec what tells me to use BBM here like Arm > does: But what you need is a statement in the spec that you can get away without. Imo at least. > /* > * Follow the break-before-sequence to update the entry. > * For more details see (D4.7.1 in ARM DDI 0487A.j). > */ > p2m_remove_pte(entry, p2m->clean_pte); > p2m_force_tlb_flush_sync(p2m); > > p2m_write_pte(entry, split_pte, p2m->clean_pte); > > I agree that even BBM isn't mandated explicitly sometime it could be useful, > but > here I am not really sure what is the point to do TLB flush before > p2m_write_pte() > as nothing serious will happen if and old mapping will be used for a some time > as we are keeping the same rights for smaller (splited) mapping. > The reason why Arm does p2m_remove_pte() & p2m_force_tlb_flush() before > updating > an entry here as it is doesn't guarantee that everything will be okay and they > explicitly tells: > This situation can possibly break coherency and ordering guarantees, > leading to > inconsistent unwanted behavior in our Processing Element (PE). > > >>>> As to (no need for) BBM: I couldn't find anything to that effect in the >>>> privileged spec. Can you provide some pointer? What I found instead is e.g. >>>> this sentence: "To ensure that implicit reads observe writes to the same >>>> memory locations, an SFENCE.VMA instruction must be executed after the >>>> writes to flush the relevant cached translations." And this: "Accessing the >>>> same location using different cacheability attributes may cause loss of >>>> coherence." (This may not only occur when the same physical address is >>>> mapped twice at different VAs, but also after the shattering of a superpage >>>> when the new entry differs in cacheability.) >>> I also couldn't find that RISC-V spec mandates BBM explicitly as Arm does >>> it. >>> >>> What I meant that on RISC-V can do: >>> - Write new PTE >>> - Flush TLB >>> >>> While on Arm it is almost always needed to do: >>> - Write zero to PTE >>> - Flush TLB >>> - Write new PTE >>> >>> For example, the common CoW code path where you copy from a read only page >>> to >>> a new page, then map that new page as writable just works on RISC-V without >>> extra considerations and on Arm it requires BBM. >> CoW is a specific sub-case with increasing privilege. > > Could you please explain why it matters increasing of privilege in terms of > TLB > flushing and PTE clearing before writing a new PTE? Some architectures automatically re-walk page tables when encountering a permission violation based on TLB contents. Hence increasing privilege can be a special case. >>> It seems to me that BBM is mandated for Arm only because that TLB is shared >>> among cores, so there is no any guarantee that no prior entry for same VA >>> remains in TLB. In case of RISC-V's TLB isn't shared and after a flush it is >>> guaranteed that no prior entry for the same VA remains in the TLB. >> Not sure that's the sole reason. But again the question is: Is this written >> down explicitly anywhere? Generally there can be multiple levels of TLBs, and >> while some of them may be per-core, others may be shared. > > Spec. mentions that: > If a hart employs an address-translation cache, that cache must appear to > be > private to that hart. Hmm, okay, that indeed pretty much excludes shared TLBs. Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |