Re: [PATCH v2 1/3] xen/riscv: introduce setup_mm()
On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
> On 23.10.2024 17:50, Oleksii Kurochko wrote:
> > Introduce the implementation of setup_mm(), which includes:
> > 1. Adding all free regions to the boot allocator, as memory is
> > needed
> > to allocate page tables used for frame table mapping.
> > 2. Calculating RAM size and the RAM end address.
> > 3. Setting up direct map mappings from each RAM bank and initialize
> > directmap_virt_start (also introduce XENHEAP_VIRT_START which is
> > defined as directmap_virt_start) to be properly aligned with RAM
> > start to use more superpages to reduce pressure on the TLB.
> > 4. Setting up frame table mappings from physical address 0 to
> > ram_end
> > to simplify mfn_to_page() and page_to_mfn() conversions.
> > 5. Setting up total_pages and max_page.
> >
> > Update virt_to_maddr() to use introduced XENHEAP_VIRT_START.
> >
> > Implement maddr_to_virt() function to convert a machine address
> > to a virtual address. This function is specifically designed to be
> > used
> > only for the DIRECTMAP region, so a check has been added to ensure
> > that
> > the address does not exceed DIRECTMAP_SIZE.
>
> I'm unconvinced by this. Conceivably the function could be used on
> "imaginary" addresses, just to calculate abstract positions or e.g.
> deltas. At the same time I'm also not going to insist on the removal
> of
> that assertion, so long as it doesn't trigger.
>
> > After the introduction of maddr_to_virt() the following linkage
> > error starts
> > to occur and to avoid it share_xen_page_with_guest() stub is added:
> > riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill':
> > /build/xen/common/tasklet.c:176: undefined reference to
> > `share_xen_page_with_guest'
> > riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
> > `share_xen_page_with_guest'
> > isn't defined riscv64-linux-gnu-ld: final link failed: bad
> > value
> >
> > Despite the linker fingering tasklet.c, it's trace.o which has the
> > undefined
> > reference:
> > $ find . -name \*.o | while read F; do nm $F | grep
> > share_xen_page_with_guest &&
> > echo $F; done
> > U share_xen_page_with_guest
> > ./xen/common/built_in.o
> > U share_xen_page_with_guest
> > ./xen/common/trace.o
> > U share_xen_page_with_guest
> > ./xen/prelink.o
> >
> > Looking at trace.i, there is call of share_xen_page_with_guest()
> > but in case of
> > when maddr_to_virt() is defined as "return NULL" compiler optimizes
> > the part of
> > common/trace.c code where share_xen_page_with_priviliged_guest() is
> > called
> > ( there is no code in disassembled common/trace.o ) so there is
> > no real call
> > of share_xen_page_with_priviliged_guest().
>
> I don't think it's the "return NULL", but rather BUG_ON()'s (really
> BUG()'s)
> unreachable(). Not the least because the function can't validly
> return NULL,
> and hence callers have no need to check for NULL.
>
> > @@ -25,8 +27,11 @@
> >
> > static inline void *maddr_to_virt(paddr_t ma)
> > {
> > - BUG_ON("unimplemented");
> > - return NULL;
> > + unsigned long va_offset = maddr_to_directmapoff(ma);
> > +
> > + ASSERT(va_offset < DIRECTMAP_SIZE);
> > +
> > + return (void *)(XENHEAP_VIRT_START + va_offset);
> > }
>
> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
> in the first place: You don't have a separate "heap" virtual address
> range, do you?
The name may not be ideal for RISC-V. I borrowed it from Arm, intending
to account for cases where the directmap virtual start might not align
with DIRECTMAP_VIRT_START due to potential adjustments for superpage
mapping.
My understanding is that XENHEAP == DIRECTMAP in the case of Arm64.
Let's discuss below whether XENHEAP_VIRT_START is necessary, as there
are related questions connected to it.
>
> > @@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma)
> > */
> > static inline unsigned long virt_to_maddr(unsigned long va)
> > {
> > - if ((va >= DIRECTMAP_VIRT_START) &&
> > + if ((va >= XENHEAP_VIRT_START) &&
> > (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
> > - return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
> > + return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
>
> Same concern here then.
>
> > @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t
> > fdt_paddr)
> >
> > return fdt_virt;
> > }
> > +
> > +#ifndef CONFIG_RISCV_32
>
> I'd like to ask that you be more selective with this #ifdef (or omit
> it
> altogether here). setup_mm() itself, for example, looks good for any
> mode.
I kept setup_mm() under the #ifdef because the 32-bit and 64-bit
versions have pretty different implementations.
> Like does ...
>
> > +#define ROUNDDOWN(addr, size) ((addr) & ~((size) - 1))
>
> ... this #define. Then again this macro may better be placed in
> xen/macros.h anyway, next to ROUNDUP().
I will put it there. I kept it in arch-specific code because nobody had
introduced it during Xen's long existence, so I assumed this was a
one-off case with no real need for it in common code.
>
> > + frametable_size = ROUNDUP(frametable_size, MB(2));
> > + base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT,
> > PFN_DOWN(MB(2)));
>
> The 2Mb aspect wants a (brief) comment, imo.
>
> > + if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> > + PFN_DOWN(frametable_size),
> > + PAGE_HYPERVISOR_RW) )
> > + panic("Unable to setup the frametable mappings\n");
> > +
> > + memset(&frame_table[0], 0, nr_mfns * sizeof(struct
> > page_info));
> > + memset(&frame_table[nr_mfns], -1,
> > + frametable_size - (nr_mfns * sizeof(struct
> > page_info)));
>
> Here (see comments on v1) you're still assuming ps == 0.
Are you referring to the following comment on v1?
```
> +/* Map a frame table to cover physical addresses ps through pe */
> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> +{
> + unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
This looks to be accounting for a partial page at the end.
> + mfn_x(maddr_to_mfn(ps)) + 1;
Whereas this doesn't do the same at the start. The sole present caller
passes 0, so that's going to be fine for the time being. Yet it's a
latent pitfall. I'd recommend to either drop the function parameter, or
to deal with it correctly right away.
```
I've added aligned_ps to cover the case where ps is not page-aligned.
Or are you referring to the 0 in memset(&frame_table[0],...)?
>
> > +/* Map the region in the directmap area. */
> > +static void __init setup_directmap_mappings(unsigned long
> > base_mfn,
> > + unsigned long nr_mfns)
> > +{
> > + int rc;
> > +
> > + /* First call sets the directmap physical and virtual offset.
> > */
> > + if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> > + {
> > + directmap_mfn_start = _mfn(base_mfn);
> > +
> > + /*
> > + * The base address may not be aligned to the second level
> > + * size (e.g. 1GB when using 4KB pages). This would
> > prevent
> > + * superpage mappings for all the regions because the
> > virtual
> > + * address and machine address should both be suitably
> > aligned.
> > + *
> > + * Prevent that by offsetting the start of the directmap
> > virtual
> > + * address.
> > + */
> > + directmap_virt_start = DIRECTMAP_VIRT_START +
> > pfn_to_paddr(base_mfn);
>
> Don't you need to mask off top bits of the incoming MFN here, or else
> you
> may waste a huge part of direct map space?
Yes, it will result in a loss of direct map space, but we still have a
considerable amount available in Sv39 mode and higher modes. The
largest RAM_START I see currently is 0x1000000000, which means we would
lose 64 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
plenty of free space available, and we can always increase
DIRECTMAP_SIZE since we have a lot of free virtual address space in
Sv39.
That said, I’m not insisting on this approach.
My suggestion was to handle the addition and subtraction of
directmap_mfn_start in maddr_to_virt() and virt_to_maddr():
```
+extern mfn_t directmap_mfn_start;
extern vaddr_t directmap_virt_start;
#define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
@@ -31,7 +32,7 @@ static inline void *maddr_to_virt(paddr_t ma)
ASSERT(va_offset < DIRECTMAP_SIZE);
- return (void *)(XENHEAP_VIRT_START + va_offset);
+ return (void *)(XENHEAP_VIRT_START -
(mfn_to_maddr(directmap_mfn_start)) + va_offset);
}
/*
@@ -44,7 +45,7 @@ static inline unsigned long virt_to_maddr(unsigned
long va)
{
if ((va >= XENHEAP_VIRT_START) &&
(va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
- return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
+ return directmapoff_to_maddr(va - XENHEAP_VIRT_START +
mfn_to_maddr(directmap_mfn_start));
BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2));
ASSERT((va >> (PAGETABLE_ORDER + PAGE_SHIFT)) ==
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 262cec811e..7ef9db2363 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -450,7 +450,7 @@ static void __init
setup_frametable_mappings(paddr_t ps, paddr_t pe)
}
-static mfn_t __ro_after_init directmap_mfn_start =
INVALID_MFN_INITIALIZER;
+mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;
vaddr_t __ro_after_init directmap_virt_start;
/* Map the region in the directmap area. */
@@ -462,6 +462,8 @@ static void __init
setup_directmap_mappings(unsigned long base_mfn,
/* First call sets the directmap physical and virtual offset. */
if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
{
+ unsigned long mfn_gb = base_mfn & ~XEN_PT_LEVEL_SIZE(2);
+
directmap_mfn_start = _mfn(base_mfn);
/*
@@ -473,7 +475,8 @@ static void __init
setup_directmap_mappings(unsigned long base_mfn,
* Prevent that by offsetting the start of the directmap
virtual
* address.
*/
- directmap_virt_start = DIRECTMAP_VIRT_START +
pfn_to_paddr(base_mfn);
+ directmap_virt_start = DIRECTMAP_VIRT_START +
+ (base_mfn - mfn_gb) * PAGE_SIZE; /*+
pfn_to_paddr(base_mfn)*/;
```
Finally, regarding masking off the top bits of mfn, I'm not entirely
clear on how this should work. If I understand correctly, if I mask off
certain top bits in mfn, then I would need to unmask those same top
bits in maddr_to_virt() and virt_to_maddr(). Is that correct?
Another point I’m unclear on is which specific part of the top bits
should be masked.
If you could explain this to me, I would really appreciate it, and I'll
be happy to use the masking approach.
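As a minimal sketch of one way the masking could work (this is an
assumption about what is being suggested, not code from the patch):
only the offset of the RAM start within a 1GB superpage is added to
DIRECTMAP_VIRT_START, and the conversion helpers compensate by
subtracting the RAM start. The constants and the names
directmap_ram_start, directmap_init() and directmap_wasted() are
hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions only; not Xen's actual RISC-V layout. */
#define PAGE_SHIFT            12
#define GB(x)                 ((uint64_t)(x) << 30)
#define DIRECTMAP_VIRT_START  UINT64_C(0xffffffc000000000) /* assumed 1GB-aligned */

static uint64_t directmap_virt_start; /* VA that maps the first RAM byte */
static uint64_t directmap_ram_start;  /* hypothetical: PA of the first RAM byte */

/* Called once with the MFN of the lowest RAM bank. */
void directmap_init(uint64_t base_mfn)
{
    uint64_t base = base_mfn << PAGE_SHIFT;

    directmap_ram_start = base;
    /* Keep only the offset inside a 1GB superpage; the top bits are dropped. */
    directmap_virt_start = DIRECTMAP_VIRT_START + (base & (GB(1) - 1));
}

uint64_t maddr_to_virt(uint64_t ma)
{
    assert(ma >= directmap_ram_start);
    /* Subtracting the RAM start accounts for the dropped top bits. */
    return directmap_virt_start + (ma - directmap_ram_start);
}

uint64_t virt_to_maddr(uint64_t va)
{
    assert(va >= directmap_virt_start);
    return directmap_ram_start + (va - directmap_virt_start);
}

/* Direct map VA space left unused below the first mapped byte. */
uint64_t directmap_wasted(void)
{
    return directmap_virt_start - DIRECTMAP_VIRT_START;
}
```

With RAM starting at 0x1000200000, this leaves only 2MB of direct map
space unused instead of a bit more than 64GB, while virtual and machine
addresses stay congruent modulo 1GB, which is what superpage mappings
need.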
>
> > +}
> > +
> > +/*
> > + * Setup memory management
> > + *
> > + * RISC-V 64 has a large virtual address space (the minimum
> > supported
> > + * MMU mode is Sv39, which provides TBs of VA space).
>
> Is it really TBs? According to my math you'd need more than 40 bits
> to
> map a single Tb (alongside other stuff).
I accidentally counted 40 bits (bits 0 to 39) because of the "39" in
Sv39. In reality it is 39 bits (bits 0 to 38), so Sv39 provides 512 GB
of virtual address space, i.e. less than a TB.
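The arithmetic can be checked with a small sketch. The helper names
are hypothetical, and sizes are in binary units (1 GiB = 2^30 bytes):

```c
#include <stdint.h>

#define SV39_VA_BITS 39

/* Bytes addressable with the given number of virtual address bits. */
uint64_t va_space_bytes(unsigned int va_bits)
{
    return UINT64_C(1) << va_bits;
}

/* Same quantity expressed in GiB (1 GiB = 2^30 bytes). */
uint64_t va_space_gib(unsigned int va_bits)
{
    return va_space_bytes(va_bits) >> 30;
}
```

va_space_gib(SV39_VA_BITS) evaluates to 512, and a full TB would need
40 bits.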
>
> > + */
> > +void __init setup_mm(void)
> > +{
> > + const struct membanks *banks = bootinfo_get_mem();
> > + paddr_t ram_start = INVALID_PADDR;
> > + paddr_t ram_end = 0;
> > + paddr_t ram_size = 0;
> > + unsigned int i;
> > +
> > + /*
> > + * We need some memory to allocate the page-tables used for
> > the directmap
> > + * mappings. But some regions may contain memory already
> > allocated
> > + * for other uses (e.g. modules, reserved-memory...).
> > + *
> > + * For simplicity, add all the free regions in the boot
> > allocator.
> > + */
> > + populate_boot_allocator();
> > +
> > + total_pages = 0;
> > +
> > + for ( i = 0; i < banks->nr_banks; i++ )
> > + {
> > + const struct membank *bank = &banks->bank[i];
> > + paddr_t bank_end = bank->start + bank->size;
> > +
> > + ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>
> As before - if a bank doesn't cover full pages, this may give the
> impression
> of there being more "total pages" than there are.
Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
size of a bank is 11K, ram_size will end up being 8K, so the "total
pages" will cover less RAM than the actual size of the RAM bank.
>
> > + ram_start = min(ram_start, bank->start);
> > + ram_end = max(ram_end, bank_end);
> > +
> > + setup_directmap_mappings(PFN_DOWN(bank->start),
> > + PFN_DOWN(bank->size));
>
> Similarly I don't think this is right when both start and size aren't
> multiple of PAGE_SIZE. You may map an unusable partial page at the
> start,
> and then fail to map a fully usable page at the end.
ram_size should be a multiple of PAGE_SIZE because we have:
ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
Do you know of any examples where bank->start isn't aligned to
PAGE_SIZE? Should it be documented somewhere what a legal physical
address for the RAM start is? If it’s not PAGE_SIZE-aligned, then it
seems we have no choice but to use ALIGNUP(..., PAGE_SIZE), which would
mean losing part of the bank.
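For reference, a minimal sketch of trimming a bank to the pages it
fully covers, assuming 4KB pages. The start is rounded up and the end
is rounded down. struct page_range and bank_whole_pages() are
hypothetical names, and the macros are redefined locally only to keep
the example self-contained:

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (UINT64_C(1) << PAGE_SHIFT)
#define PFN_DOWN(x)  ((uint64_t)(x) >> PAGE_SHIFT)
#define PFN_UP(x)    (((uint64_t)(x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical helper type: the whole pages fully contained in a bank. */
struct page_range {
    uint64_t first_pfn; /* first fully covered page frame */
    uint64_t nr_pages;  /* number of fully covered page frames */
};

struct page_range bank_whole_pages(uint64_t start, uint64_t size)
{
    struct page_range r;
    uint64_t first = PFN_UP(start);        /* skip a partial first page */
    uint64_t end = PFN_DOWN(start + size); /* skip a partial last page */

    r.first_pfn = first;
    r.nr_pages = end > first ? end - first : 0;
    return r;
}
```

With start = 2K and size = 11K this yields 2 pages beginning at PFN 1
(4K to 12K), whereas PFN_DOWN(start) would begin the mapping at PFN 0,
which is only partially RAM. With start = 2K and size = 8K it yields a
single page, while ROUNDDOWN(size, PAGE_SIZE) still counts 8K.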
~ Oleksii