Re: [Xen-devel] [PATCH v5 15/16] x86: make Xen early boot code relocatable
On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
> >>> On 20.08.16 at 00:43, <daniel.kiper@xxxxxxxxxx> wrote:
> > Every multiboot protocol (regardless of version) compatible image must
> > specify its load address (in the ELF or multiboot header). A multiboot
> > protocol compatible loader has to load the image at the specified
> > address. However, there is no guarantee that the requested memory
> > region (in case of Xen it starts at 1 MiB and ends at 17 MiB) where
> > the image should initially be loaded is RAM and that it is free
> > (legacy BIOS platforms are merciful to Xen, but I found at least one
> > EFI platform on which the Xen load address conflicts with EFI boot
> > services; it is a Dell PowerEdge R820 with the latest firmware). To
> > cope with that problem we must make the Xen early boot code
> > relocatable and help the boot loader to relocate the image properly by
> > suggesting allowed address ranges instead of requesting specific load
> > addresses as it does right now. This patch does the former. It does
> > not add the multiboot2 protocol interface, which is done in the "x86:
> > add multiboot2 protocol support for relocatable images" patch.
> >
> > This patch changes the following things:
> >   - the default load address is changed from 1 MiB to 2 MiB; I did
> >     that because the initial page tables use 2 MiB huge pages, which
> >     makes the required updates for them quite easy; this means that,
> >     e.g., we avoid special cases for the start and end of the required
> >     memory region if it lives at an address not aligned to 2 MiB,
>
> Should this be a separate change then, to separate possible
> regressions due to that from such due to any other of the changes
> here?

Potentially yes. However, it should be done properly. Otherwise, in case
of a revert, we could break the Xen relocatable infrastructure and other
things. So, I am not sure it pays off. Anyway, I will check whether it is
possible.

> And then I don't see why, when making the image relocatable
> anyway, the link time load address actually still matters.

It matters for multiboot (v1) and multiboot2 compatible boot loaders
which do not support relocatable images.

> >   - the %esi and %r15d registers are used as storage for the Xen image
> >     load base address (%r15d briefly, because %rsi is used for the EFI
> >     SystemTable address in 64-bit code); both registers are (%esi is
> >     mostly) unused in early boot code and preserved across C function
> >     calls,
>
> In a context where you (also) talk about 64-bit code, %esi can't
> be called preserved unilaterally. You should make clear that this is
> for 32-bit function calls.

OK.

> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -12,13 +12,16 @@
> >          .text
> >          .code32
> >
> > -#define sym_phys(sym)    ((sym) - __XEN_VIRT_START)
> > +#define sym_offset(sym)  ((sym) - __XEN_VIRT_START)
>
> In a patch already large and hence hard to review, did you really
> need to do this rename?

I think that sym_offset() is a better term here. However, if you wish, I
can leave sym_phys() as is. Though, IMO, the sym_phys() name suggests
that we are playing with physical addresses, and that is not correct in
all contexts; e.g. look at fs_offset() (or sym_fs() if you wish).

> > @@ -126,26 +130,26 @@ vga_text_buffer:
> >          .section .init.text, "ax", @progbits
> >
> >  bad_cpu:
> > -        mov     $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> > +        lea     esi_offset(.Lbad_cpu_msg),%esi  # Error message
>
> Why not just
>
>         add     $sym_offset(.Lbad_cpu_msg),%esi  # Error message

OK.

[...]
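To make the above addressing scheme concrete, here is a minimal C model
of what sym_offset() buys us (a sketch only; the XEN_VIRT_START value and
the exact bias kept in the 32-bit base register are illustrative
assumptions, not values taken from the patch):

#include <assert.h>
#include <stdint.h>

#define XEN_VIRT_START  0xffff82d080000000ULL  /* illustrative value only */
#define XEN_IMG_OFFSET  0x200000ULL            /* new 2 MiB default load offset */

/* sym_offset(sym): link-time offset of a symbol from __XEN_VIRT_START. */
static uint64_t sym_offset(uint64_t sym)
{
    return sym - XEN_VIRT_START;
}

int main(void)
{
    /* Assume the boot loader placed the image at 64 MiB instead of 2 MiB. */
    uint64_t load_base = 0x4000000ULL;

    /* Base register value chosen so that base + sym_offset(sym) is a
     * physical address; the XEN_IMG_OFFSET bias here is my assumption. */
    uint64_t base = load_base - XEN_IMG_OFFSET;

    /* A symbol linked 0x1234 bytes into the image... */
    uint64_t sym = XEN_VIRT_START + XEN_IMG_OFFSET + 0x1234;

    /* ...resolves to load_base + 0x1234 at runtime, which is what the
     * "add $sym_offset(...),%esi" style of addressing computes. */
    assert(base + sym_offset(sym) == load_base + 0x1234);
    return 0;
}

The point is that sym_offset() is a pure link-time quantity, so a single
add of a runtime base register forms a physical address wherever the
loader put us.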
> > @@ -461,62 +545,88 @@ trampoline_setup:
> >
> >          /* Stash TSC to calculate a good approximation of time-since-boot */
> >          rdtsc
> > -        mov     %eax,sym_phys(boot_tsc_stamp)
> > -        mov     %edx,sym_phys(boot_tsc_stamp+4)
> > +        mov     %eax,fs_offset(boot_tsc_stamp)
> > +        mov     %edx,fs_offset(boot_tsc_stamp)+4
> > +
> > +        /* Update frame addresses in page tables. */
> > +        mov     $((__page_tables_end-__page_tables_start)/8),%ecx
> > +1:      testl   $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> > +        jz      2f
> > +        add     %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> > +2:      loop    1b
>
> This loop includes the mapping of the low Mb, which I don't think it
> should modify. Or wait - you move __page_tables_start, which by
> itself is fragile (but looks to be acceptable if accompanied by a
> comment explaining why it doesn't cover l1_identmap).

OK, I will add a relevant comment somewhere.

> Also, just to double check - all these page table setup adjustments
> don't require reflecting in xen.efi's page table setup code?

I will check it once again, but I do not think so.

[...]
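For reference, the fixup loop quoted above boils down to the following C
model (my sketch, not Xen code; I call the addend "delta" since whether
%esi holds the raw load base or a precomputed difference depends on how
the static page-table entries are initialized):

#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x1u

/* Rebase every present entry between __page_tables_start and
 * __page_tables_end by delta so the frame addresses follow the relocated
 * image; the assembly walks backwards via loop/%ecx, but the direction
 * does not affect the result. */
static void update_pagetable_frames(uint64_t *start, uint64_t *end,
                                    uint32_t delta)
{
    for ( uint64_t *entry = start; entry < end; entry++ )
        if ( *entry & _PAGE_PRESENT )
            *entry += delta;
}

int main(void)
{
    /* Two present 2 MiB mappings (flags PRESENT|RW|PSE), two empty slots. */
    uint64_t fake_l2[4] = { 0x200083, 0x400083, 0, 0 };

    update_pagetable_frames(fake_l2, fake_l2 + 4, 0x4000000 - 0x200000);

    assert(fake_l2[0] == 0x4000083);  /* frame moved with the image */
    assert(fake_l2[1] == 0x4200083);
    assert(fake_l2[2] == 0);          /* non-present entries untouched */
    return 0;
}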
> > --- a/xen/arch/x86/boot/trampoline.S
> > +++ b/xen/arch/x86/boot/trampoline.S
> > @@ -54,12 +54,20 @@ trampoline_gdt:
> >          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
> >          .long   0x0000ffff
> >          .long   0x00009200
> > +        /*
> > +         * 0x0030: ring 0 Xen data, 16 MiB size, base
> > +         * address is computed during runtime.
> > +         */
> > +        .quad   0x00c0920000001000
>
> This does not look like it covers 16Mb, it's more like 16Mb+4k-1.

I have checked it once again. It covers 16 MiB as required: 4 KiB *
0x1000 = 0x1000000 (not taking relocation into account). By the way, it
looks like we could define this segment as 0x200000 - 0x1000000. However,
then sym_fs() should be defined as:

#define sym_fs(sym) %fs:(sym_offset(sym) - XEN_IMG_OFFSET)

Does it make sense?

> >          .pushsection .trampoline_rel, "a"
> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> >          .popsection
> >
> > +GLOBAL(xen_img_load_base_addr)
> > +        .long   0
>
> I've yet to understand the purpose of this symbol, but in any case:
> If the trampoline code doesn't reference it, why does it get placed
> here?

This symbol was used by code which unconditionally relocated the Xen
image even if the boot loader had done that work for us. I removed most
of this code because we agreed that if the boot loader relocated the Xen
image, then we do not have to relocate it higher once again. If you are
still OK with this idea, I can go further, as I planned, and remove all
such remnants in the next release. However, it looks like I then have to
store the load address in xen_phys_start from 32-bit assembly code. So,
it would be nice if I could define it as "unsigned int" instead of
"unsigned long". Is it safe? I am asking because this variable is used in
quite important places and I do not want to break something by mistake.
At first sight it looks safe, but it would be nice if you could double
check it.

> > @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >          highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
> >  #endif
> >
> > +    /* Do not relocate Xen image if boot loader did work for us. */
> > +    if ( xen_img_load_base_addr )
> > +        xen_phys_start = xen_img_load_base_addr;
>
> Okay, with this change the question really is: Why do you need the
> new symbol? I.e. why can't you just use xen_phys_start, just like I
> managed to re-use it when I introduced boot from EFI?

This is what I was going to do in the next release... If... Please look
above...

> >      for ( i = boot_e820.nr_map-1; i >= 0; i-- )
> >      {
> >          uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
> >          uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
> >
> > -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
>
> I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
> please don't delete the comment as a whole.

OK.

> >          s = (boot_e820.map[i].addr + mask) & ~mask;
> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> >              continue;
>
> This means you now map memory between 2Mb and 16Mb here. Is
> this necessary?

Without this change, Xen relocated by the boot loader crashes with:

(XEN) Panic on CPU 0:
(XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902

So, it must be mapped. However, maybe we should not map this region when
Xen is not relocated by the boot loader.
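As a side note, the superpage alignment arithmetic in the quoted loop can
be checked standalone (the E820 entry values below are made up for
illustration):

#include <assert.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 21  /* 2 MiB superpages */

int main(void)
{
    uint64_t mask = (1ULL << L2_PAGETABLE_SHIFT) - 1;

    /* A hypothetical E820_RAM entry from 1 MiB to 100 MiB. */
    uint64_t addr = 0x100000, size = 0x6300000;

    /* Round the start up and the end down to 2 MiB boundaries, as the
     * quoted code does, so only whole superpages get mapped. */
    uint64_t s = (addr + mask) & ~mask;
    uint64_t e = (addr + size) & ~mask;

    assert(s == 0x200000);   /* 1 MiB rounds up to 2 MiB */
    assert(e == 0x6400000);  /* 100 MiB is already aligned */
    return 0;
}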
> > @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >          l4_pgentry_t *pl4e;
> >          l3_pgentry_t *pl3e;
> >          l2_pgentry_t *pl2e;
> > -        uint64_t load_start;
> >          int i, j, k;
> >
> >          /* Select relocation address. */
> > @@ -915,9 +915,8 @@
> >           * with a barrier(). After this we must *not* modify static/global
> >           * data until after we have switched to the relocated pagetables!
> >           */
> > -        load_start = (unsigned long)_start - XEN_VIRT_START;
> >          barrier();
> > -        move_memory(e + load_start, load_start, _end - _start, 1);
> > +        move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
>
> Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
> change necessary? Or is it rather just simplification, which again
> should be split from an already complex patch.

Just a simplification. I will split it from this patch.

> > @@ -957,15 +956,14 @@
> >           * Undo the temporary-hooking of the l1_identmap. __2M_text_start
> >           * is contained in this PTE.
> >           */
> > -        BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> > -               l2_table_offset((unsigned long)_stext));
>
> At least when using_2M_mapping() this surely ought to hold?

OK.

> > @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >                 : "memory" );
> >
> >          bootstrap_map(NULL);
> > +
> > +        printk("New Xen image base address: 0x%08lx\n", xen_phys_start);
>
> I don't see the motivation for adding such a message in this patch,
> but if, then please use %#lx.

OK, I will use %#lx.

> > @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >
> >      if ( !xen_phys_start )
> >          panic("Not enough memory to relocate Xen.");
> > +
> >      reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
>
> Another stray change.

I will fix it.

> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -55,7 +55,7 @@ SECTIONS
> >    __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
> >  #endif
> >
> > -  . = __XEN_VIRT_START + MB(1);
> > +  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
> >    _start = .;
> >    .text : {
> >          _stext = .;            /* Text and read-only data */
> > @@ -257,12 +257,14 @@ SECTIONS
> >    .reloc : {
> >      *(.reloc)
> >    } :text
> > -  /* Trick the linker into setting the image size to exactly 16Mb. */
> >    . = ALIGN(__section_alignment__);
> > +#endif
> > +
> > +  /* Trick the linker into setting the image size to exactly 16Mb. */
> >    .pad : {
> >      . = ALIGN(MB(16));
> > +    __end_of_image__ = .;
>
> I see that you use this symbol in xen/arch/x86/Makefile, but I again
> don't follow why this logic needs to change.

The current logic does not work because it gets a wrong address from
xen/xen-syms. This way the boot loader allocates an incorrect, usually
huge, buffer for the Xen image (wait, there is a chance that this is a
fix for the issues related to 2 MiB mappings found by Andrew). I do not
see a simple, reliable fix for the current solution. So, I introduced
__end_of_image__ and look for its address. This is a much better method
of establishing the end-of-image address than the previous one. However,
I agree that this could be introduced in a separate patch.

> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -288,7 +288,7 @@ extern root_pgentry_t idle_pg_table[ROOT_PAGETABLE_ENTRIES];
> >  extern l2_pgentry_t *compat_idle_pg_table_l2;
> >  extern unsigned int m2p_compat_vstart;
> >  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> > -    l2_bootmap[L2_PAGETABLE_ENTRIES];
> > +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];
>
> Why do you need 4 of them? I can see why one might not suffice,
> but two surely should?

In the worst case we need three: one for the l1_identmap hook and two for
the Xen image mapping. So, I am not sure it pays to complicate the
assembly mapping code just to save one page. Additionally, you should
remember that l2_bootmap is freed after init.

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel