[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/PV: assert page state in mark_pv_pt_pages_rdonly()
On 17/08/2021 09:54, Jan Beulich wrote: > On 16.08.2021 21:25, Andrew Cooper wrote: >> On 16/08/2021 16:29, Jan Beulich wrote: >>> About every time I look at dom0_construct_pv()'s "calculation" of >>> nr_pt_pages I question (myself) whether the result is precise or merely >>> an upper bound. I think it is meant to be precise, but I think we would >>> be better off having some checking in place. Hence add ASSERT()s to >>> verify that >>> - all pages have a valid L1...Ln (currently L4) page table type and >>> - no other bits are set, in particular the type refcount is still zero. >>> >>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >>> --- >>> There are (at least) two factors supporting my uncertainty about the >>> "calculation" being precise: The loop starting from 2 (which clearly is >>> too small for a possible result) >> 2 was the correct absolute minimum for 2-level guests. > Which has been history for how many years? Yeah, but I'd expect to phrase that as "a remnant of 32bit non-PAE guests". > The minimum for the current implementation is 4 afaict, I'm not sure the monitor table for PV32 is intended to count. > and ... > >> XTF kernels don't exceed the 2M boundary (at least, not currently), so >> they can be mapped with only 3 or 4 pagetables, except: >> >> * 3-level guests are created with 4 L2's for no obvious reason. This is >> nothing to do with legacy PAE paging, nor with how a typical Linux/BSD >> kernel works. The requirement to make 3-level guests work (and even >> then, only under 32bit Xen) is to create a PGT_pae_xen_l2 if not already >> covered by the other mappings. Any non-toy kernel discards these >> pagetables in favour of its own idea of pagetables. > ... could be 3 for 32-bit Dom0. Right, and starting from (compat ? 6 : 4) is probably a good move, along with an explanation of these totally magic numbers. And now I've thought about this some more, I'm pretty certain we do better than an "start at 2, inc 1 and retry" loop. We can calculate the pagetables needed for the virtual layout statically, possibly even including the default 6/4, and use that for a lower bound. At most, we need to loop once per possibly-unpopulated pagetable level (so 1 for 32bit, 3 for 64bit) to cover the cases of extra pagetables tipping over a page size boundary. > >> * v_end is rounded up to 4MB. >> >> Most XTF guests will operate entirely happily in a few hundred kb of >> space, and the same will be true of other microservices. The rounding >> up of memory might be helpful for the traditional big VMs case, but it >> isn't correct or useful for other usecases. >> >>> and an apparently wrong comment stating >>> that not only v_end but also v_start would be superpage aligned >> Which comment? The only one I see about 4M has nothing to do with >> superpages. > The one immediately ahead of the related variable declarations: > > /* > * This fully describes the memory layout of the initial domain. All > * *_start address are page-aligned, except v_start (and v_end) which are > * superpage-aligned. > */ > > I see nothing forcing v_start to be superpage-aligned, while I > do suspect that the "calculation" of the number of page tables > will be wrong when it isn't. This is an XTF test booting as dom0 (d2) (XEN) *** Building a PV Dom0 *** (d2) (XEN) ELF: phdr: paddr=0x100000 memsz=0x12000 (d2) (XEN) ELF: memory: 0x100000 -> 0x112000 (d2) (XEN) ELF: note: GUEST_OS = "XTF" (d2) (XEN) ELF: note: GUEST_VERSION = "0" (d2) (XEN) ELF: note: LOADER = "generic" (d2) (XEN) ELF: note: HYPERCALL_PAGE = 0x106000 (d2) (XEN) ELF: note: XEN_VERSION = "xen-3.0" (d2) (XEN) ELF: note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb" (d2) (XEN) ELF: note: PAE_MODE = "yes" (d2) (XEN) ELF: using notes from SHT_NOTE section (d2) (XEN) ELF: VIRT_BASE unset, using 0 (d2) (XEN) ELF_PADDR_OFFSET unset, using 0 (d2) (XEN) ELF: addresses: (d2) (XEN) virt_base = 0x0 (d2) (XEN) elf_paddr_offset = 0x0 (d2) (XEN) virt_offset = 0x0 (d2) (XEN) virt_kstart = 0x100000 (d2) (XEN) virt_kend = 0x112000 (d2) (XEN) virt_entry = 0x100000 (d2) (XEN) p2m_base = 0xffffffffffffffff (d2) (XEN) Xen kernel: 64-bit, lsb, compat32 (d2) (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x100000 -> 0x112000 (d2) (XEN) PHYSICAL MEMORY ARRANGEMENT: (d2) (XEN) Dom0 alloc.: 000000003e800000->000000003ec00000 (240731 pages to be allocated) (d2) (XEN) VIRTUAL MEMORY ARRANGEMENT: (d2) (XEN) Loaded kernel: 0000000000100000->0000000000112000 (d2) (XEN) Init. ramdisk: 0000000000112000->0000000000112000 (d2) (XEN) Phys-Mach map: 0000000000112000->00000000002ea2d8 (d2) (XEN) Start info: 00000000002eb000->00000000002eb4b8 (d2) (XEN) Xenstore ring: 0000000000000000->0000000000000000 (d2) (XEN) Console ring: 0000000000000000->0000000000000000 (d2) (XEN) Page tables: 00000000002ec000->00000000002f1000 (d2) (XEN) Boot stack: 00000000002f1000->00000000002f2000 (d2) (XEN) TOTAL: 0000000000000000->0000000000400000 (d2) (XEN) ENTRY ADDRESS: 0000000000100000 It would appear that v_start comes directly and unmodified from the VIRT_BASE ELF note. Other observations: The ramdisk start/end aren't zero despite one being absent, and the M2P and start info ends aren't aligned. >>> (in fact >>> v_end is 4MiB aligned, which is the superpage size only on long >>> abandoned [by us] non-PAE x86-32). >> Tangentially, that code needs some serious work to use ROUNDUP/DOWN >> macros for clarity. > Agreed. > >>> --- a/xen/arch/x86/pv/dom0_build.c >>> +++ b/xen/arch/x86/pv/dom0_build.c >>> @@ -59,6 +59,10 @@ static __init void mark_pv_pt_pages_rdon >>> l1e_remove_flags(*pl1e, _PAGE_RW); >>> page = mfn_to_page(l1e_get_mfn(*pl1e)); >>> >>> + ASSERT(page->u.inuse.type_info & PGT_type_mask); >>> + ASSERT((page->u.inuse.type_info & PGT_type_mask) <= >>> PGT_root_page_table); >> This is an obfuscated >> >> ASSERT((page->u.inuse.type_info & PGT_type_mask) >= PGT_l1_page_table && >> (page->u.inuse.type_info & PGT_type_mask) <= PGT_root_page_table); > I can certainly switch to this yet longer piece of code, Improved clarity is substantially more important than conciseness. > and ... > >> and >> >>> + ASSERT(!(page->u.inuse.type_info & ~PGT_type_mask)); >> this has no context. >> >> At a bare minimum, you need a comment stating what properties we're >> looking for, so anyone suffering an assertion failure has some clue as >> to what may have gone wrong. > ... I can certainly transform the respective parts of the > description into a code comment. Thanks. It doesn't need to be much, but it does need to be something. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |