[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 22 Jul 2025 18:02:52 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 22 Jul 2025 16:03:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.07.2025 16:57, Oleksii Kurochko wrote:
> 
> On 7/21/25 3:34 PM, Jan Beulich wrote:
>> On 17.07.2025 18:37, Oleksii Kurochko wrote:
>>> On 7/2/25 11:25 AM, Jan Beulich wrote:
>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>> Add support for down large memory mappings ("superpages") in the RISC-V
>>>>> p2m mapping so that smaller, more precise mappings ("finer-grained 
>>>>> entries")
>>>>> can be inserted into lower levels of the page table hierarchy.
>>>>>
>>>>> To implement that the following is done:
>>>>> - Introduce p2m_split_superpage(): Recursively shatters a superpage into
>>>>>     smaller page table entries down to the target level, preserving 
>>>>> original
>>>>>     permissions and attributes.
>>>>> - __p2m_set_entry() updated to invoke superpage splitting when inserting
>>>>>     entries at lower levels within a superpage-mapped region.
>>>>>
>>>>> This implementation is based on the ARM code, with modifications to the 
>>>>> part
>>>>> that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V does
>>>>> not require BBM, so there is no need to invalidate the PTE and flush the
>>>>> TLB before updating it with the newly created, split page table.
>>>> But some flushing is going to be necessary. As long as you only ever do
>>>> global flushes, the one after the individual PTE modification (within the
>>>> split table) will do (if BBM isn't required, see below), but once you move
>>>> to more fine-grained flushing, that's not going to be enough anymore. Not
>>>> sure it's a good idea to leave such a pitfall.
>>> I think that I don't fully understand what is an issue.
>> Whether a flush is necessary after solely breaking up a superpage is arch-
>> defined. I don't know how much restrictions the spec on possible hardware
>> behavior for RISC-V. However, the eventual change of (at least) one entry
>> of fulfill the original request will surely require a flush. What I was
>> trying to say is that this required flush would better not also cover for
>> the flushes that may or may not be required by the spec. IOW if the spec
>> leaves any room for flushes to possibly be needed, those flushes would
>> better be explicit.
> 
> I think that I still don't understand why what I wrote above will work as long
> as global flushes is working and will stop to work when more fine-grained 
> flushing
> is used.
> 
> Inside p2m_split_superpage() we don't need any kind of TLB flush operation as
> it is allocation a new page for a table and works with it, so no one could use
> this page at the moment and cache it in TLB.
> 
> The only question is that if it is needed BBM before staring using splitted 
> entry:
>          ....
>          if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
>          {
>              /* Free the allocated sub-tree */
>              p2m_free_subtree(p2m, split_pte, level);
> 
>              rc = -ENOMEM;
>              goto out;
>          }
> 
> ------> /* Should be BBM used here ? */
>          p2m_write_pte(entry, split_pte, p2m->clean_pte);
> 
> And I can't find anything in the spec what tells me to use BBM here like Arm
> does:

But what you need is a statement in the spec that you can get away without. Imo
at least.

>          /*
>           * Follow the break-before-sequence to update the entry.
>           * For more details see (D4.7.1 in ARM DDI 0487A.j).
>           */
>          p2m_remove_pte(entry, p2m->clean_pte);
>          p2m_force_tlb_flush_sync(p2m);
> 
>          p2m_write_pte(entry, split_pte, p2m->clean_pte);
> 
> I agree that even BBM isn't mandated explicitly sometime it could be useful, 
> but
> here I am not really sure what is the point to do TLB flush before 
> p2m_write_pte()
> as nothing serious will happen if and old mapping will be used for a some time
> as we are keeping the same rights for smaller (splited) mapping.
> The reason why Arm does p2m_remove_pte() & p2m_force_tlb_flush() before 
> updating
> an entry here as it is doesn't guarantee that everything will be okay and they
> explicitly tells:
>   This situation can possibly break coherency and ordering guarantees, 
> leading to
>   inconsistent unwanted behavior in our Processing Element (PE).
> 
> 
>>>> As to (no need for) BBM: I couldn't find anything to that effect in the
>>>> privileged spec. Can you provide some pointer? What I found instead is e.g.
>>>> this sentence: "To ensure that implicit reads observe writes to the same
>>>> memory locations, an SFENCE.VMA instruction must be executed after the
>>>> writes to flush the relevant cached translations." And this: "Accessing the
>>>> same location using different cacheability attributes may cause loss of
>>>> coherence." (This may not only occur when the same physical address is
>>>> mapped twice at different VAs, but also after the shattering of a superpage
>>>> when the new entry differs in cacheability.)
>>> I also couldn't find that RISC-V spec mandates BBM explicitly as Arm does 
>>> it.
>>>
>>> What I meant that on RISC-V can do:
>>> - Write new PTE
>>> - Flush TLB
>>>
>>> While on Arm it is almost always needed to do:
>>> - Write zero to PTE
>>> - Flush TLB
>>> - Write new PTE
>>>
>>> For example, the common CoW code path where you copy from a read only page 
>>> to
>>> a new page, then map that new page as writable just works on RISC-V without
>>> extra considerations and on Arm it requires BBM.
>> CoW is a specific sub-case with increasing privilege.
> 
> Could you please explain why it matters increasing of privilege in terms of 
> TLB
> flushing and PTE clearing before writing a new PTE?

Some architectures automatically re-walk page tables when encountering a
permission violation based on TLB contents. Hence increasing privilege
can be a special case.

>>> It seems to me that BBM is mandated for Arm only because that TLB is shared
>>> among cores, so there is no any guarantee that no prior entry for same VA
>>> remains in TLB. In case of RISC-V's TLB isn't shared and after a flush it is
>>> guaranteed that no prior entry for the same VA remains in the TLB.
>> Not sure that's the sole reason. But again the question is: Is this written
>> down explicitly anywhere? Generally there can be multiple levels of TLBs, and
>> while some of them may be per-core, others may be shared.
> 
> Spec. mentions that:
>    If a hart employs an address-translation cache, that cache must appear to 
> be
>    private to that hart.

Hmm, okay, that indeed pretty much excludes shared TLBs.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.