Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages
On Sat, 2023-02-25 at 17:53 +0000, Julien Grall wrote:
> Hi Oleksii,
>
> On 24/02/2023 15:06, Oleksii Kurochko wrote:
> > Mostly the code for setup_initial_pages was taken from Bobby's
> > repo except for the following changes:
> > * Use only a minimal part of the code enough to enable MMU
> > * rename {_}setup_initial_pagetables functions
> > * add writable argument for _setup_initial_pagetables to have
> > an opportunity to make some sections read-only
> > * update setup_initial_pagetables function to make some sections
> > read-only
> > * change the order of the _setup_initial_pagetables() calls
> > in setup_initial_pagetables():
> > * first it is called for text, init, rodata sections
> > * after that, call it for the ranges [link_addr_start : link_addr_end]
> > and [load_addr_start : load_addr_end]
> > Before, the ranges were mapped first and the sections after, but in
> > that case the sections end up writable: as the sections' addresses are
> > inside the ranges, their entries had already been set up, so the
> > read-only status is not applied to them.
> >
> > Origin:
> > https://gitlab.com/xen-on-risc-v/xen/-/tree/riscv-rebase 4af165b468af
> > Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
> > ---
> > xen/arch/riscv/Makefile | 1 +
> > xen/arch/riscv/include/asm/mm.h | 9 ++
> > xen/arch/riscv/include/asm/page.h | 90 ++++++++++++
> > xen/arch/riscv/mm.c | 223 ++++++++++++++++++++++++++++++
> > 4 files changed, 323 insertions(+)
> > create mode 100644 xen/arch/riscv/include/asm/mm.h
> > create mode 100644 xen/arch/riscv/include/asm/page.h
> > create mode 100644 xen/arch/riscv/mm.c
> >
> > diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
> > index 443f6bf15f..956ceb02df 100644
> > --- a/xen/arch/riscv/Makefile
> > +++ b/xen/arch/riscv/Makefile
> > @@ -1,5 +1,6 @@
> > obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> > obj-y += entry.o
> > +obj-y += mm.o
> > obj-$(CONFIG_RISCV_64) += riscv64/
> > obj-y += sbi.o
> > obj-y += setup.o
> > diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
> > new file mode 100644
> > index 0000000000..fc1866b1d8
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/mm.h
> > @@ -0,0 +1,9 @@
> > +#ifndef _ASM_RISCV_MM_H
> > +#define _ASM_RISCV_MM_H
> > +
> > +void setup_initial_pagetables(unsigned long load_addr_start,
> > + unsigned long load_addr_end,
> > + unsigned long linker_addr_start,
> > + unsigned long linker_addr_end);
> > +
> > +#endif /* _ASM_RISCV_MM_H */
> > diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
> > new file mode 100644
> > index 0000000000..fabbe1305f
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/page.h
> > @@ -0,0 +1,90 @@
> > +#ifndef _ASM_RISCV_PAGE_H
> > +#define _ASM_RISCV_PAGE_H
> > +
> > +#include <xen/const.h>
> > +#include <xen/types.h>
> > +
> > +#define PAGE_ENTRIES 512
>
> NIT: AFAIU, the number here is based on ...
>
> > +#define VPN_BITS (9)
>
> ... this. So I would suggest to define PAGE_ENTRIES using VPN_BITS.
Sure. It should be defined using VPN_BITS. Thanks.
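Something like this, I assume (exact formatting to be settled in v2):

    #define VPN_BITS     (9)
    #define PAGE_ENTRIES (1 << VPN_BITS)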
>
> > +#define VPN_MASK ((unsigned long)((1 << VPN_BITS) - 1))
> NIT: Use 1UL and you can avoid the cast.
Thanks. I'll update that in the next version of patch series.
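I.e. roughly:

    #define VPN_MASK ((1UL << VPN_BITS) - 1)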
>
> > +
> > +#ifdef CONFIG_RISCV_64
> > +/* L3 index Bit[47:39] */
> > +#define THIRD_SHIFT (39)
> > +#define THIRD_MASK (VPN_MASK << THIRD_SHIFT)
> > +/* L2 index Bit[38:30] */
> > +#define SECOND_SHIFT (30)
> > +#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
> > +/* L1 index Bit[29:21] */
> > +#define FIRST_SHIFT (21)
> > +#define FIRST_MASK (VPN_MASK << FIRST_SHIFT)
> > +/* L0 index Bit[20:12] */
> > +#define ZEROETH_SHIFT (12)
> > +#define ZEROETH_MASK (VPN_MASK << ZEROETH_SHIFT)
>
> On Arm, we are trying to phase out ZEROETH_* and co because the name is
> too generic. Instead, we now introduce a generic macro that takes a level
> and then computes the mask/shift (see XEN_PT_LEVEL_*).
>
> You should be able to do the same in RISC-V and reduce the number of
> defines introduced.
Thanks. I'll look at XEN_PT_LEVEL_*. I'll re-read Andrew's comment, but as
far as I understand from a quick reading, most of those defines can be
removed.
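Just to check I understand the idea, a rough sketch of what the RISC-V
counterparts could look like (the names mirror Arm's, nothing is final):

    /* With 4K pages and 9 bits of VPN per level */
    #define XEN_PT_LEVEL_SHIFT(lvl)     (PAGE_SHIFT + VPN_BITS * (lvl))
    #define XEN_PT_LEVEL_SIZE(lvl)      (1UL << XEN_PT_LEVEL_SHIFT(lvl))
    #define XEN_PT_LEVEL_MAP_MASK(lvl)  (~(XEN_PT_LEVEL_SIZE(lvl) - 1))
    #define XEN_PT_LEVEL_MASK(lvl)      (VPN_MASK << XEN_PT_LEVEL_SHIFT(lvl))
    #define XEN_PT_LEVEL_INDEX(lvl, va) \
        (((va) >> XEN_PT_LEVEL_SHIFT(lvl)) & VPN_MASK)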
>
> > +
> > +#else // CONFIG_RISCV_32
>
> Coding style: comments in Xen are using /* ... */
>
> > +
> > +/* L1 index Bit[31:22] */
> > +#define FIRST_SHIFT (22)
> > +#define FIRST_MASK (VPN_MASK << FIRST_SHIFT)
> > +
> > +/* L0 index Bit[21:12] */
> > +#define ZEROETH_SHIFT (12)
> > +#define ZEROETH_MASK (VPN_MASK << ZEROETH_SHIFT)
> > +#endif
> > +
> > +#define THIRD_SIZE (1 << THIRD_SHIFT)
> > +#define THIRD_MAP_MASK (~(THIRD_SIZE - 1))
> > +#define SECOND_SIZE (1 << SECOND_SHIFT)
> > +#define SECOND_MAP_MASK (~(SECOND_SIZE - 1))
> > +#define FIRST_SIZE (1 << FIRST_SHIFT)
> > +#define FIRST_MAP_MASK (~(FIRST_SIZE - 1))
> > +#define ZEROETH_SIZE (1 << ZEROETH_SHIFT)
> > +#define ZEROETH_MAP_MASK (~(ZEROETH_SIZE - 1))
> > +
> > +#define PTE_SHIFT 10
> > +
> > +#define PTE_VALID BIT(0, UL)
> > +#define PTE_READABLE BIT(1, UL)
> > +#define PTE_WRITABLE BIT(2, UL)
> > +#define PTE_EXECUTABLE BIT(3, UL)
> > +#define PTE_USER BIT(4, UL)
> > +#define PTE_GLOBAL BIT(5, UL)
> > +#define PTE_ACCESSED BIT(6, UL)
> > +#define PTE_DIRTY BIT(7, UL)
> > +#define PTE_RSW (BIT(8, UL) | BIT(9, UL))
> > +
> > +#define PTE_LEAF_DEFAULT (PTE_VALID | PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
>
> We should avoid vulnerable default flags. So this should either be RW
> or RX.
Thanks. I'll take it into account.
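For instance (just a sketch of the direction, names not final):

    /* Default to RW; executable mappings have to be requested explicitly */
    #define PTE_LEAF_DEFAULT (PTE_VALID | PTE_READABLE | PTE_WRITABLE)
    #define PTE_LEAF_RX      (PTE_VALID | PTE_READABLE | PTE_EXECUTABLE)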
>
> > +#define PTE_TABLE (PTE_VALID)
> > +
> > +/* Calculate the offsets into the pagetables for a given VA */
> > +#define zeroeth_linear_offset(va) ((va) >> ZEROETH_SHIFT)
> > +#define first_linear_offset(va) ((va) >> FIRST_SHIFT)
> > +#define second_linear_offset(va) ((va) >> SECOND_SHIFT)
> > +#define third_linear_offset(va) ((va) >> THIRD_SHIFT)
> > +
> > +#define pagetable_zeroeth_index(va) zeroeth_linear_offset((va) & ZEROETH_MASK)
> > +#define pagetable_first_index(va) first_linear_offset((va) & FIRST_MASK)
> > +#define pagetable_second_index(va) second_linear_offset((va) & SECOND_MASK)
> > +#define pagetable_third_index(va) third_linear_offset((va) & THIRD_MASK)
> > +
> > +/* Page Table entry */
> > +typedef struct {
> > + uint64_t pte;
> > +} pte_t;
> > +
> > +/* Shift the VPN[x] or PPN[x] fields of a virtual or physical address
> > + * to become the shifted PPN[x] fields of a page table entry */
> > +#define addr_to_ppn(x) (((x) >> PAGE_SHIFT) << PTE_SHIFT)
> > +
> > +static inline pte_t paddr_to_pte(unsigned long paddr)
> > +{
> > + return (pte_t) { .pte = addr_to_ppn(paddr) };
> > +}
> > +
> > +static inline bool pte_is_valid(pte_t *p)
> > +{
> > + return p->pte & PTE_VALID;
> > +}
> > +
> > +#endif /* _ASM_RISCV_PAGE_H */
> > diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
> > new file mode 100644
> > index 0000000000..6e172376eb
> > --- /dev/null
> > +++ b/xen/arch/riscv/mm.c
> > @@ -0,0 +1,223 @@
> > +#include <xen/init.h>
> > +#include <xen/lib.h>
> > +
> > +#include <asm/csr.h>
> > +#include <asm/mm.h>
> > +#include <asm/page.h>
> > +
> > +/*
> > + * xen_second_pagetable is indexed with the VPN[2] page table entry field
> > + * xen_first_pagetable is accessed from the VPN[1] page table entry field
> > + * xen_zeroeth_pagetable is accessed from the VPN[0] page table entry field
> > + */
> > +pte_t xen_second_pagetable[PAGE_ENTRIES] __attribute__((__aligned__(PAGE_SIZE)));
> > +static pte_t xen_first_pagetable[PAGE_ENTRIES]
> > + __attribute__((__aligned__(PAGE_SIZE)));
> > +static pte_t xen_zeroeth_pagetable[PAGE_ENTRIES]
> > + __attribute__((__aligned__(PAGE_SIZE)));
> > +
> > +extern unsigned long _stext;
> > +extern unsigned long _etext;
> > +extern unsigned long __init_begin;
> > +extern unsigned long __init_end;
> > +extern unsigned long _srodata;
> > +extern unsigned long _erodata;
> > +
> > +paddr_t phys_offset;
>
> This is defined, set but not used.
>
> > +
> > +#define resolve_early_addr(x) \
>
> This helper seems to behave the same way as linker_addr(). So any
> reason to not use it?
linker_addr() can be used instead. It looks like I missed something
before and it got split into two identical macros.
>
> I will make the assumption this can be used and not comment on the
> implementation of resolve_early_addr().
>
> > +    ({                                                                   \
> > +        unsigned long *__##x;                                            \
> > +        if ( load_addr_start <= x && x < load_addr_end )                 \
> > +            __##x = (unsigned long *)x;                                  \
> > +        else                                                             \
> > +            __##x = (unsigned long *)(x + load_addr_start - linker_addr_start); \
> > +        __##x;                                                           \
> > +    })
> > +
> > +static void __init clear_pagetables(unsigned long load_addr_start,
> > + unsigned long load_addr_end,
> > + unsigned long linker_addr_start,
> > + unsigned long linker_addr_end)
> > +{
> > + unsigned long *p;
> > + unsigned long page;
> > + unsigned long i;
> > +
> > + page = (unsigned long)&xen_second_pagetable[0];
> > +
> > + p = resolve_early_addr(page);
> > + for ( i = 0; i < ARRAY_SIZE(xen_second_pagetable); i++ )
>
> The entries in xen_second_pagetable are a pte_t (uint64_t). But ...
>
> > + {
> > + p[i] = 0ULL;
>
> ... the type here will be unsigned long. So you may not fully zero the
> page-table on a 32-bit architecture. Therefore you want to define it as
> pte_t.
>
> That said, given the page table will be part of BSS, you should not need
> to zero it again, assuming you clear BSS beforehand.
>
> If you clear afterwards, then you *must* move them out of BSS.
>
> The same applies for xen_{first, zeroeth}_pagetable below.
I didn't have initialized page tables, so that is why I needed
clear_pagetables(), but I think you are right and clear_pagetables() can
be removed entirely, as the page tables will be zeroed as part of BSS
initialization.
>
> > + }
> > +
> > + page = (unsigned long)&xen_first_pagetable[0];
> > + p = resolve_early_addr(page);
> > + for ( i = 0; i < ARRAY_SIZE(xen_first_pagetable); i++ )
> > + {
> > + p[i] = 0ULL;
> > + }
> > +
> > + page = (unsigned long)&xen_zeroeth_pagetable[0];
> > + p = resolve_early_addr(page);
> > + for ( i = 0; i < ARRAY_SIZE(xen_zeroeth_pagetable); i++ )
> > + {
> > + p[i] = 0ULL;
> > + }
> > +}
> > +
> > +/*
> > + * WARNING: load_addr() and linker_addr() are to be called only when the MMU is
> > + * disabled and only when executed by the primary CPU. They cannot refer to
> > + * any global variable or functions.
>
> I find it interesting you are saying this when _setup_initial_pagetables()
> is called from setup_initial_pagetables(). Would you be able to explain
> how this is different?
I am not sure that I understand your question correctly, but
_setup_initial_pagetables() was introduced to map some addresses with a
read/write flag. Probably I should rename it to something clearer.
>
> > + */
> > +
> > +/*
> > + * Convert an addressed layed out at link time to the address where it was loaded
>
> Typo: s/addressed/address/ ?
Yes, it should be 'address', and 'layed out' should be changed to 'laid
out'...
>
> > + * by the bootloader.
> > + */
>
> Looking at the implementation, you seem to consider that any address not
> in the range [linker_addr_start, linker_addr_end[ will have a 1:1
> mapping.
>
> I am not sure this is what you want. So I would consider throwing an
> error if such an address is passed.
I thought that at this stage, if no relocation was done, the mapping is
1:1, except in the case when load_addr_start != linker_addr_start.
>
> > +#define load_addr(linker_address)                                        \
> > +    ({                                                                   \
> > +        unsigned long __linker_address = (unsigned long)(linker_address); \
> > +        if ( linker_addr_start <= __linker_address &&                    \
> > +             __linker_address < linker_addr_end )                        \
> > +        {                                                                \
> > +            __linker_address =                                           \
> > +                __linker_address - linker_addr_start + load_addr_start;  \
> > +        }                                                                \
> > +        __linker_address;                                                \
> > +    })
> > +
> > +/* Convert boot-time Xen address from where it was loaded by the boot loader to the address it was layed out
> > + * at link-time.
> > + */
>
> Coding style: The first line is too long and multi-line comments look
> like:
>
> /*
>  * Foo
>  * Bar
>  */
>
> > +#define linker_addr(load_address)                                        \
>
> Same remark as for load_addr() above.
>
> > +    ({                                                                   \
> > +        unsigned long __load_address = (unsigned long)(load_address);    \
> > +        if ( load_addr_start <= __load_address &&                        \
> > +             __load_address < load_addr_end )                            \
> > +        {                                                                \
> > +            __load_address =                                             \
> > +                __load_address - load_addr_start + linker_addr_start;    \
> > +        }                                                                \
> > +        __load_address;                                                  \
> > +    })
> > +
> > +static void __attribute__((section(".entry")))
> > +_setup_initial_pagetables(pte_t *second, pte_t *first, pte_t *zeroeth,
> Can this be renamed to setup_initial_mapping() so this is clearer and
> avoids the single-'_' difference with the function below?
Sure. It will be better.
>
> > + unsigned long map_start,
> > + unsigned long map_end,
> > + unsigned long pa_start,
> > + bool writable)
>
> What about the executable bit?
It's always executable... But as you mentioned above, PTE_LEAF_DEFAULT
should be either RX or RW.
I think it makes sense to pass flags instead of the single 'writable'
argument.
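Roughly what I have in mind (illustrative only, the final naming may
differ):

    static void __init setup_initial_mapping(pte_t *second, pte_t *first,
                                             pte_t *zeroeth,
                                             unsigned long map_start,
                                             unsigned long map_end,
                                             unsigned long pa_start,
                                             unsigned long flags)
    {
        ...
        /* 'flags' carries the R/W/X permission bits for the leaf entries */
        zeroeth[index0] = paddr_to_pte((page_addr - map_start) + pa_start);
        zeroeth[index0].pte |= PTE_VALID | flags;
        ...
    }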
>
> > +{
> > + unsigned long page_addr;
> > + unsigned long index2;
> > + unsigned long index1;
> > + unsigned long index0;
>
> index* could be defined in the loop below.
It could. But I am curious why it is better?
>
> > +
> > + /* align start addresses */
> > + map_start &= ZEROETH_MAP_MASK;
> > + pa_start &= ZEROETH_MAP_MASK;
>
> Hmmm... I would actually expect the address to be properly aligned and
> therefore not require an alignment here.
>
> Otherwise, this raises the question of what happens if you have regions
> using the same page?
The map_start &= ZEROETH_MAP_MASK is needed to get the page number of the
address without the page offset.
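For example (picking an arbitrary address), with 4K pages
0x80201234 & ZEROETH_MAP_MASK gives 0x80201000, i.e. the page base without
the 12-bit offset.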
>
> > +
> > + page_addr = map_start;
> > + while ( page_addr < map_end )
>
> Looking at the loop, it looks like you are assuming that the region will
> never cross a boundary of a page-table (either L0, L1, L2). I am not
> convinced you can make such an assumption (see below).
>
> But if you really want to make such an assumption, then you should add
> some guard (either BUILD_BUG_ON(), ASSERT(), or a proper check) in your
> code to avoid any surprise in the future.
I am not sure that I fully understand what the problem is here.
The address is aligned on a (1<<12) boundary and each iteration maps a
(1<<12) page, so all looks fine, or I have misunderstood you.
>
> > + {
> > + index2 = pagetable_second_index(page_addr);
> > + index1 = pagetable_first_index(page_addr);
> > + index0 = pagetable_zeroeth_index(page_addr);
> > +
> > + /* Setup level2 table */
> > + second[index2] = paddr_to_pte((unsigned long)first);
> > + second[index2].pte |= PTE_TABLE;
> > +
> > + /* Setup level1 table */
> > + first[index1] = paddr_to_pte((unsigned long)zeroeth);
> > + first[index1].pte |= PTE_TABLE;
> > +
> > + /* Setup level0 table */
> > + if ( !pte_is_valid(&zeroeth[index0]) )
>
> Can you explain why you are checking !pte_is_valid() for the L0 entry
> but not the others?
My mistake; it should be checked for each level.
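I.e. something along these lines for every level (a sketch, not the final
patch):

    /* Only link in the next-level table if the entry is not set up yet */
    if ( !pte_is_valid(&second[index2]) )
    {
        second[index2] = paddr_to_pte((unsigned long)first);
        second[index2].pte |= PTE_TABLE;
    }

    if ( !pte_is_valid(&first[index1]) )
    {
        first[index1] = paddr_to_pte((unsigned long)zeroeth);
        first[index1].pte |= PTE_TABLE;
    }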
>
> > + {
> > + /* Update level0 table */
> > + zeroeth[index0] = paddr_to_pte((page_addr - map_start) + pa_start);
> > + zeroeth[index0].pte |= PTE_LEAF_DEFAULT;
> > + zeroeth[index0].pte &= ~((!writable) ? PTE_WRITABLE : 0);
>
> Looking at the default value, it would mean that a non-writable mapping
> is automatically executable. This seems wrong for sections that are not
> meant to be executable (like rodata).
Yes, you are right. I'll rework setup_initial_mapping() to pass flags
instead of a write/read-only flag.
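So the rodata call site could end up looking something like this
(illustrative, relying on the helper setting the valid bit itself):

    setup_initial_mapping(second, first, zeroeth,
                          linker_addr(&_srodata), linker_addr(&_erodata),
                          load_addr(&_srodata),
                          PTE_READABLE);    /* read-only, non-executable */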
>
> > + }
> > +
> > + /* Point to next page */
> > + page_addr += ZEROETH_SIZE;
> > + }
> > +}
> > +
> > +/*
> > + * setup_initial_pagetables:
> > + *
> > + * 1) Build the page tables for Xen that map the following:
> > + * 1.1) The physical location of Xen (where the bootloader loaded it)
> > + * 1.2) The link-time location of Xen (where the linker expected Xen's
> > + *      addresses to be)
> > + * 2) Load the page table into the SATP and enable the MMU
> > + */
> > +void __attribute__((section(".entry")))
>
> I couldn't find a section ".entry" in the linker.
>
> > +setup_initial_pagetables(unsigned long load_addr_start,
> > + unsigned long load_addr_end,
> > + unsigned long linker_addr_start,
> > + unsigned long linker_addr_end)
> > +{
> > + pte_t *second;
> > + pte_t *first;
> > + pte_t *zeroeth;
> > +
> > + clear_pagetables(load_addr_start, load_addr_end,
> > + linker_addr_start, linker_addr_end);
> > +
> > + /* Get the addresses where the page tables were loaded */
> > + second = (pte_t *)load_addr(&xen_second_pagetable);
> > + first = (pte_t *)load_addr(&xen_first_pagetable);
> > + zeroeth = (pte_t *)load_addr(&xen_zeroeth_pagetable);
>
> I would consider embedding the type cast in load_addr() so you are adding
> some type safety within your code.
Definitely it would be better, but setup_initial_mapping() uses 'unsigned
long' for passing the address that should be mapped.
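One option (hypothetical) would be a thin wrapper that only casts for the
page-table pointers and keeps the plain addresses as unsigned long:

    #define load_pt(pt) ((pte_t *)load_addr(pt))

    second  = load_pt(&xen_second_pagetable);
    first   = load_pt(&xen_first_pagetable);
    zeroeth = load_pt(&xen_zeroeth_pagetable);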
>
> > +
> > + /*
> > + * Create a mapping from Xen's link-time addresses to where they were actually loaded.
>
> This line is way longer than 80 characters. Please make sure to wrap it
> at 80 characters.
>
> > + */
> > + _setup_initial_pagetables(second, first, zeroeth,
> > + linker_addr(&_stext),
> > + linker_addr(&_etext),
> > + load_addr(&_stext),
> > + false);
> > + _setup_initial_pagetables(second, first, zeroeth,
> > + linker_addr(&__init_begin),
> > + linker_addr(&__init_end),
> > + load_addr(&__init_begin),
> > + true);
> > + _setup_initial_pagetables(second, first, zeroeth,
> > + linker_addr(&_srodata),
> > + linker_addr(&_erodata),
> > + load_addr(&_srodata),
> > + false);
> > + _setup_initial_pagetables(second, first, zeroeth,
> > + linker_addr_start,
> > + linker_addr_end,
> > + load_addr_start,
> > + true);
>
> Where do you guarantee that Xen will always fit in an L0 table and the
> start address is aligned to the size of an L0 table?
I don't guarantee that it fits in an L0 table, but the start address is
aligned to the size of an L0 table at the start.
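A possible guard for v2 (illustrative; there is no panic()/printk this
early, so the sketch just spins):

    /* Assumes Xen must fit in a single L0 table (FIRST_SIZE, i.e. 2M with
     * 4K pages) and must start on a 2M boundary. */
    if ( (linker_addr_start & ~FIRST_MAP_MASK) ||
         (linker_addr_end - linker_addr_start) > FIRST_SIZE )
    {
        while ( 1 )
            ;
    }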
>
> > +
> > + /*
> > + * Create a mapping of the load time address range to... the load time address range.
>
> Same about the line length here.
>
> > + * This mapping is used at boot time only.
> > + */
> > + _setup_initial_pagetables(second, first, zeroeth,
>
> This can only work if Xen is loaded at its linked address. So you need a
> separate set of L0, L1 tables for the identity mapping.
>
> That said, this would not be sufficient because:
> 1) Xen may not be loaded at a 2M boundary (you can control with
> U-boot, but not with EFI). So this may cross a boundary and therefore
> need multiple pages.
> 2) The load region may overlap the link address
>
> While I think it would be good to handle those cases from the start, I
> would understand why they are not easy to solve. So I think the minimum
> is to throw some errors if you are in a case you can't support.
Do you mean to throw some error in load_addr()/linker_addr()?
>
> > + load_addr_start,
> > + load_addr_end,
> > + load_addr_start,
> > + true);
> > +
> > + /* Ensure page table writes precede loading the SATP */
> > + asm volatile("sfence.vma");
> > +
> > + /* Enable the MMU and load the new pagetable for Xen */
> > + csr_write(CSR_SATP,
> > + (load_addr(xen_second_pagetable) >> PAGE_SHIFT) | SATP_MODE_SV39 << SATP_MODE_SHIFT);
>
> IMHO, it would make sense to introduce within the series the code to
> jump off the identity mapping and then remove it.
Probably I have to spend more time understanding the identity mapping,
but it looks like the current RISC-V port from Bobby's repo doesn't jump
off and remove the identity mapping.
>
> > +
> > + phys_offset = load_addr_start - linker_addr_start;
> > +
> > + return;
> > +}
>
> Cheers,
>
~ Oleksii