Re: [Xen-devel] [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
On 02/08/16 10:19, Roger Pau Monne wrote:
> On Fri, Jul 29, 2016 at 08:04:12PM +0100, Andrew Cooper wrote:
>> On 29/07/16 17:29, Roger Pau Monne wrote:
>>> Craft the Dom0 e820 memory map and populate it.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>> ---
>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> ---
>>>  xen/arch/x86/domain_build.c | 199 ++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 193 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
>>> index c0ef40f..cb8ecbd 100644
>>> --- a/xen/arch/x86/domain_build.c
>>> +++ b/xen/arch/x86/domain_build.c
>>> @@ -43,6 +43,11 @@ static long __initdata dom0_nrpages;
>>> static long __initdata dom0_min_nrpages;
>>> static long __initdata dom0_max_nrpages = LONG_MAX;
>>>
>>> +/* GFN of the identity map for EPT. */
>>> +#define HVM_IDENT_PT_GFN 0xfeffeu
>> The IDENT_PT is only needed on Gen1 VT-x hardware which doesn't support
>> unrestricted-guest mode. OTOH, it needs to be matched with VM86_TSS,
>> which is also needed if hardware doesn't support unrestricted-guest.
>> Also, the IDENT_PT can live anywhere in GFN space. 0xfeffe is just
>> convention for HVM guests as nothing else interesting lives there, but a
>> PVH dom0 will want to ensure it doesn't alias anything interesting it
>> might wish to map.
>>
>> Now that I look at it, there is a substantial WTF. The domain builder
>> makes the IDENT_PT, but HVMLoader makes the VM86_TSS. I presume this is
>> a historical side effect of the IDENT_PT being a prerequisite to even
>> executing hvmloader, while the VM86_TSS was only a prerequisite to
>> executing the rombios payload. Either way, eww :(
>>
>> I think the VM86_TSS setup needs to move to unconditionally being beside
>> IDENT_PT setup in the domain builder, and mirrored here in dom0_hvm()
>> creation. I don't think it is reasonable to expect an HVMLite payload
>> to set up its own VM86_TSS if it didn't set up IDENT_PT. (OTOH, the
>> chances of an HVMLite guest ever needing to return to real mode are slim,
>> but in the name of flexibility, it would be nice not to rule it out).
>>
>> Longterm, it would be nice to disentangle the unrestricted-guest support
>> from the main vmx code, and make it able to be compiled out. There is
>> lots of it, and it is all but dead on modern hardware.
> Thanks! I didn't know anything about the VM86 TSS stuff; the fact that the
> identity page tables and the VM86 TSS are set up at completely different
> points makes it quite hard to follow :/.
>
> I've now moved the setup of the VM86 TSS inside the domain builder for both
> DomU and Dom0.
Thanks. (Guess who has recently had to delve back into this code^W swamp.)
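For anyone who hasn't stared at this before: the IDENT_PT itself is nothing
exotic. The sketch below is modelled on what the HVM domain builder does
rather than copied from it, so treat the helper name and flag definitions as
illustrative only: it is a single page of 32-bit PSE entries identity-mapping
the low 4GB, for real-mode execution on pre-unrestricted-guest VT-x.

    #include <stdint.h>

    /* Standard x86 PTE flag values (illustrative definitions). */
    #define _PAGE_PRESENT  0x001u
    #define _PAGE_RW       0x002u
    #define _PAGE_USER     0x004u
    #define _PAGE_ACCESSED 0x020u
    #define _PAGE_DIRTY    0x040u
    #define _PAGE_PSE      0x080u

    /* Fill one page with 1024 PSE entries, each mapping a 4MB superpage,
     * so the whole of [0, 4GB) is identity-mapped. */
    static void build_ident_pt(uint32_t ident_pt[1024])
    {
        unsigned int i;

        for ( i = 0; i < 1024; i++ )
            ident_pt[i] = (i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
                          _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE;
    }

The chosen GFN is then reported to Xen via HVM_PARAM_IDENT_PT, and the
VM86_TSS wants the equivalent treatment via HVM_PARAM_VM86_TSS in the same
place.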
>
>>> +}
>>> +
>>> +/* Calculate the biggest usable order given a size in bytes. */
>>> +static inline unsigned int get_order(uint64_t size)
>>> +{
>>> +    unsigned int order;
>>> +    uint64_t pg;
>>> +
>>> +    ASSERT((size & ~PAGE_MASK) == 0);
>>> +    pg = PFN_DOWN(size);
>>> +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
>>> +
>>> +    return order;
>>> +}
>> We already have get_order_from_bytes() and get_order_from_pages(), the
>> latter of which looks like it will suit your usecase.
> Not really, or at least they don't do the same as get_order. This function
> calculates the maximum order you can use so that there are no pages left
> over (i.e. for a size of 3145728 bytes (3MiB), this function will return
> order 9 (2MiB), while the other ones will return order 10 (4MiB)). I don't
> really understand why other places in the code request bigger orders and
> then free the leftovers; isn't that also causing memory shattering?
Sounds like we want something like get_order_{floor,ceil}(), which would
make it obvious which way non-power-of-two sizes get rounded.
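As a rough sketch of the intended semantics (using the existing
PFN_DOWN/PFN_UP and flsl() helpers; illustrative only, names included):

    /* Largest order whose size does not exceed `size` bytes... */
    static inline unsigned int get_order_floor(uint64_t size)
    {
        unsigned long pg = PFN_DOWN(size);

        ASSERT(pg != 0);
        return flsl(pg) - 1;
    }

    /* ...and smallest order whose size covers `size` bytes. */
    static inline unsigned int get_order_ceil(uint64_t size)
    {
        unsigned long pg = PFN_UP(size);

        ASSERT(pg != 0);
        return (pg & (pg - 1)) ? flsl(pg) : flsl(pg) - 1;
    }

With those names, the 3MiB example above gives 9 from the floor variant and
10 from the ceil one, and the existing get_order_from_*() callers map onto
the ceil behaviour.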
>
>> As a TODO item, I really would like to borrow some of the Linux
>> infrastructure to calculate orders of constants at compile time, because
>> a large number of callers of the existing get_order_*() functions are
>> performing unnecessary runtime calculation. For those which need
>> runtime calculation, some careful use of ffs() would be preferable to a
>> loop.
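Purely as a hypothetical sketch of the constant-folding half (the macro name
and the use of __builtin_clzll are made up here, not existing Xen code): for
a compile-time-constant argument the compiler folds the whole expression,
while runtime callers would keep using an flsl()-based helper.

    /* Hypothetical: ceiling order for a byte count, written so that a
     * constant argument evaluates to a constant
     * (e.g. ORDER_CEIL_BYTES(3U << 20) folds to 10). */
    #define ORDER_CEIL_BYTES(n)                                        \
        ((n) <= PAGE_SIZE                                              \
         ? 0u                                                          \
         : 64u - __builtin_clzll(((uint64_t)(n) - 1) >> PAGE_SHIFT))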
>>
>>> +
>>> +/* Populate an HVM memory range using the biggest possible order. */
>>> +static void __init hvm_populate_memory_range(struct domain *d, uint64_t start,
>>> +                                              uint64_t size)
>>> +{
>>> +    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
>>> +    unsigned int order;
>>> +    struct page_info *page;
>>> +    int rc;
>>> +
>>> +    ASSERT((size & ~PAGE_MASK) == 0 && (start & ~PAGE_MASK) == 0);
>>> +
>>> +    order = MAX_ORDER;
>>> +    while ( size != 0 )
>>> +    {
>>> +        order = min(get_order(size), order);
>>> +        page = alloc_domheap_pages(d, order, memflags);
>>> +        if ( page == NULL )
>>> +        {
>>> +            if ( order == 0 && memflags )
>>> +            {
>>> +                /* Try again without any memflags. */
>>> +                memflags = 0;
>>> +                order = MAX_ORDER;
>>> +                continue;
>>> +            }
>>> +            if ( order == 0 )
>>> +                panic("Unable to allocate memory with order 0!\n");
>>> +            order--;
>>> +            continue;
>>> +        }
>> It would be far more efficient to try to allocate only 1G and 2M
>> blocks. Most memory is free at this point, and it would allow the
>> use of HAP superpage mappings, which will be a massive performance boost
>> if they aren't shattered.
> That's what I'm trying to do, but we might have to use pages of lower order
> when filling the smaller gaps.
As a general principle, we should try not to have any gaps. This also
extends to guests using more intelligence when deciding how to mutate
their physmap.
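Relatedly, one way to keep the HAP mappings intact would be to only ever
hand the allocator 1G/2M/4K-sized requests. A rough sketch on top of the
existing PAGE_ORDER_* constants (the helper name is made up):

    /* Illustrative: restrict an allocation order to one of the orders the
     * p2m can map directly (1G, 2M or 4K on x86), so nothing needs
     * shattering later. */
    static unsigned int __init clamp_to_superpage_order(unsigned int order)
    {
        if ( order >= PAGE_ORDER_1G )
            return PAGE_ORDER_1G;
        if ( order >= PAGE_ORDER_2M )
            return PAGE_ORDER_2M;
        return PAGE_ORDER_4K;
    }

i.e. the loop would apply it to min(get_order(size), order) before calling
alloc_domheap_pages().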
> As an example, these are the stats when
> building a domain with 6048M of RAM:
>
> (XEN) Memory allocation stats:
> (XEN) Order 18: 5GB
> (XEN) Order 17: 512MB
> (XEN) Order 15: 256MB
> (XEN) Order 14: 128MB
> (XEN) Order 12: 16MB
> (XEN) Order 10: 8MB
> (XEN) Order 9: 4MB
> (XEN) Order 8: 2MB
> (XEN) Order 7: 1MB
> (XEN) Order 6: 512KB
> (XEN) Order 5: 256KB
> (XEN) Order 4: 128KB
> (XEN) Order 3: 64KB
> (XEN) Order 2: 32KB
> (XEN) Order 1: 16KB
> (XEN) Order 0: 4KB
>
> IMHO, they are quite good.
What are the RAM characteristics of the host? Do you have any idea what
the HAP superpage characteristics are like after the guest has booted?
In a case like this, I think it would be entirely reasonable to round up
to the nearest 2MB, and avoid all of those small page mappings.
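The rounding itself would be trivial; a sketch (constant and helper names
are illustrative):

    /* Illustrative: round a RAM block up to the next 2MiB boundary, so the
     * populate loop never has to drop to sub-2MiB orders. */
    #define SZ_2M (1UL << 21)

    static inline uint64_t round_up_2mb(uint64_t size)
    {
        return (size + SZ_2M - 1) & ~(SZ_2M - 1);
    }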
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel