
Re: [Xen-devel] [PATCH v5 15/16] x86: make Xen early boot code relocatable



On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
> >>> On 20.08.16 at 00:43, <daniel.kiper@xxxxxxxxxx> wrote:
> > Every multiboot protocol (regardless of version) compatible image must
> > specify its load address (in ELF or multiboot header). Multiboot protocol
> > compatible loader have to load image at specified address. However, there
> > is no guarantee that the requested memory region (in case of Xen it starts
> > at 1 MiB and ends at 17 MiB) where image should be loaded initially is a RAM
> > and it is free (legacy BIOS platforms are merciful for Xen but I found at
> > least one EFI platform on which Xen load address conflicts with EFI boot
> > services; it is Dell PowerEdge R820 with latest firmware). To cope with that
> > problem we must make Xen early boot code relocatable and help boot loader to
> > relocate image in proper way by suggesting, not requesting specific load
> > addresses as it is right now, allowed address ranges. This patch does 
> > former.
> > It does not add multiboot2 protocol interface which is done in "x86: add
> > multiboot2 protocol support for relocatable images" patch.
> >
> > This patch changes following things:
> >   - default load address is changed from 1 MiB to 2 MiB; I did that because
> >     initial page tables are using 2 MiB huge pages and this way required
> >     updates for them are quite easy; it means that e.g. we avoid spacial
> >     cases for start and end of required memory region if it live at address
> >     not aligned to 2 MiB,
>
> Should this be a separate change then, to separate possible
> regressions due to that from such due to any other of the changes

Potentially yes. However, it would have to be done carefully; otherwise
a revert of that separate change could break the Xen relocation
infrastructure and other things. So, I am not sure it pays off. Anyway,
I will check whether it is possible.

> here? And then I don't see why, when making the image relocatable
> anyway, the link time load address actually still matters.

It matters for multiboot (v1) and multiboot2 compatible boot
loaders not supporting relocatable images.

> >   - %esi and %r15d registers are used as a storage for Xen image load base
> >     address (%r15d shortly because %rsi is used for EFI SystemTable address
> >     in 64-bit code); both registers are (%esi is mostly) unused in early
> >     boot code and preserved during C functions calls,
>
> In a context where you (also) talk about 64-bit code, %esi can't
> be called preserved unilaterally. You should make clear that this is
> for 32-bit function calls.

OK.

> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -12,13 +12,16 @@
> >          .text
> >          .code32
> >
> > -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
> > +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
>
> In a patch already large and hence hard to review, did you really
> need to do this rename?

I think that sym_offset() is the better term here. However, if you wish
I can leave sym_phys() as is. Though, IMO, the name sym_phys() suggests
that we are dealing with physical addresses, which is not correct in
all contexts; e.g. look at fs_offset() (or sym_fs(), if you prefer).

> > @@ -126,26 +130,26 @@ vga_text_buffer:
> >          .section .init.text, "ax", @progbits
> >
> >  bad_cpu:
> > -        mov     $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> > +        lea     esi_offset(.Lbad_cpu_msg),%esi  # Error message
>
> Why not just
>
>         add     $sym_offset(.Lbad_cpu_msg),%esi  # Error message

OK.

[...]

> > @@ -461,62 +545,88 @@ trampoline_setup:
> >
> >          /* Stash TSC to calculate a good approximation of time-since-boot 
> > */
> >          rdtsc
> > -        mov     %eax,sym_phys(boot_tsc_stamp)
> > -        mov     %edx,sym_phys(boot_tsc_stamp+4)
> > +        mov     %eax,fs_offset(boot_tsc_stamp)
> > +        mov     %edx,fs_offset(boot_tsc_stamp)+4
> > +
> > +        /* Update frame addresses in page tables. */
> > +        mov     $((__page_tables_end-__page_tables_start)/8),%ecx
> > +1:      testl   $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> > +        jz      2f
> > +        add     %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> > +2:      loop    1b
>
> This loop includes the mapping of the low Mb, which I don't think it
> should modify. Or wait - you move __page_tables_start, which by
> itself is fragile (but looks to be acceptable if accompanied by a
> comment explaining why it doesn't cover l1_identmap).

OK, I will add a relevant comment in an appropriate place.
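For reference, the assembly fixup loop quoted above is equivalent to the
following C sketch (an illustration only; relocate_page_tables and its
arguments are hypothetical names, and the real code works through %fs on
the physical image):

```c
#include <stdint.h>

#define _PAGE_PRESENT 0x1UL

/*
 * Add the image load base to the frame address of every present
 * page-table entry in [start, end).  Mirrors the loop in head.S that
 * walks __page_tables_start..__page_tables_end in 8-byte steps.
 */
static void relocate_page_tables(uint64_t *start, uint64_t *end,
                                 uint64_t load_base)
{
    for ( uint64_t *e = start; e < end; e++ )
        if ( *e & _PAGE_PRESENT )
            *e += load_base;
}
```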

> Also, just to double check - all these page table setup adjustments
> don't require reflecting in xen.efi's page table setup code?

I will check once again, but I do not think so.

[...]

> > --- a/xen/arch/x86/boot/trampoline.S
> > +++ b/xen/arch/x86/boot/trampoline.S
> > @@ -54,12 +54,20 @@ trampoline_gdt:
> >          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
> >          .long   0x0000ffff
> >          .long   0x00009200
> > +        /*
> > +         * 0x0030: ring 0 Xen data, 16 MiB size, base
> > +         * address is computed during runtime.
> > +         */
> > +        .quad   0x00c0920000001000
>
> This does not look like it covers 1Mb, it's more like 1Mb+4k-1.

I have checked it once again. It covers 16 MiB as required:
  4 KiB * 0x1000 = 0x1000000 (not taking relocation into account).

By the way, it looks like we could define this segment as 0x200000 - 0x1000000.
However, sym_fs() would then need to be defined as:

#define sym_fs(sym)    %fs:(sym_offset(sym) - XEN_IMG_OFFSET)

Does it make sense?
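To show where the arithmetic above comes from, here is a minimal,
hypothetical decode of that descriptor (not code from the patch): the
limit field is 0x1000 and the flags nibble has the granularity bit set,
i.e. 4 KiB units, which is the 4 KiB * 0x1000 figure quoted above:

```c
#include <stdint.h>

/* Extracted fields of an x86 segment descriptor (hypothetical helper). */
struct seg_desc_fields {
    uint32_t limit;   /* 20-bit limit field */
    uint32_t base;    /* 32-bit base address */
    uint8_t  access;  /* type/DPL/present byte */
    uint8_t  flags;   /* G/DB/L/AVL nibble; bit 3 = 4 KiB granularity */
};

static struct seg_desc_fields decode_desc(uint64_t d)
{
    struct seg_desc_fields f;

    f.limit  = (d & 0xffff) | ((d >> 32) & 0xf0000);
    f.base   = ((d >> 16) & 0xffffff) | (((d >> 56) & 0xff) << 24);
    f.access = (d >> 40) & 0xff;
    f.flags  = (d >> 52) & 0xf;

    return f;
}
```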

> >          .pushsection .trampoline_rel, "a"
> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> >          .popsection
> >
> > +GLOBAL(xen_img_load_base_addr)
> > +        .long   0
>
> I've yet to understand the purpose of this symbol, but in any case:
> If the trampoline code doesn't reference it, why does it get placed
> here?

This symbol was used by code which unconditionally relocated the Xen
image even if the boot loader had already done that work for us. I
removed most of this code because we agreed that if the boot loader
relocated the Xen image then we do not have to relocate it higher once
again. If you are still OK with this idea I can go further, as I
planned, and remove all such remnants in the next release. However, it
looks like I would then have to store the load address in xen_phys_start
from 32-bit assembly code. So, it would be nice if I could define it as
"unsigned int" instead of "unsigned long". Is that safe? I am asking
because this variable is used in quite important places and I do not
want to break something by mistake. At first sight it looks safe, but
it would be nice if you could double-check it.

> > @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >          highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
> >  #endif
> >
> > +    /* Do not relocate Xen image if boot loader did work for us. */
> > +    if ( xen_img_load_base_addr )
> > +        xen_phys_start = xen_img_load_base_addr;
>
> Okay, with this change the question really is: Why do you need the
> new symbol? I.e. why can't you just use xen_phys_start, just like I
> managed to re-use it when I introduced boot from EFI?

This is what I was going to do in the next release, assuming we agree on the approach; please see my reply above.

> >      for ( i = boot_e820.nr_map-1; i >= 0; i-- )
> >      {
> >          uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
> >          uint64_t end, limit = ARRAY_SIZE(l2_identmap) << 
> > L2_PAGETABLE_SHIFT;
> >
> > -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
>
> I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
> please don't delete the comment as a whole.

OK.

> >          s = (boot_e820.map[i].addr + mask) & ~mask;
> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> >              continue;
>
> This means you now map memory between 2Mb and 16Mb here. Is
> this necessary?

Without this change, a Xen image relocated by the boot loader crashes with:

(XEN) Panic on CPU 0:
(XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902

So, it must be mapped. However, maybe we should not map this region
when Xen has not been relocated by the boot loader.

> > @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >              l4_pgentry_t *pl4e;
> >              l3_pgentry_t *pl3e;
> >              l2_pgentry_t *pl2e;
> > -            uint64_t load_start;
> >              int i, j, k;
> >
> >              /* Select relocation address. */
> > @@ -915,9 +915,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >               * with a barrier(). After this we must *not* modify 
> > static/global
> >               * data until after we have switched to the relocated 
> > pagetables!
> >               */
> > -            load_start = (unsigned long)_start - XEN_VIRT_START;
> >              barrier();
> > -            move_memory(e + load_start, load_start, _end - _start, 1);
> > +            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 
> > 1);
>
> Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
> change necessary? Or is it rather just simplification, which again
> should be split from an already complex patch.

Just a simplification. I will split it out of this patch.

> > @@ -957,15 +956,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >               * Undo the temporary-hooking of the l1_identmap.  
> > __2M_text_start
> >               * is contained in this PTE.
> >               */
> > -            BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> > -                   l2_table_offset((unsigned long)_stext));
>
> At least when using_2M_mapping() this surely ought to hold?

OK.

> > @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >                  : "memory" );
> >
> >              bootstrap_map(NULL);
> > +
> > +            printk("New Xen image base address: 0x%08lx\n", 
> > xen_phys_start);
>
> I don't see the motivation for adding such a message in this patch,
> but if, then please use %#lx.

OK, I will use %#lx.

> > @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >
> >      if ( !xen_phys_start )
> >          panic("Not enough memory to relocate Xen.");
> > +
> >      reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
>
> Another stray change.

I will fix it.

> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -55,7 +55,7 @@ SECTIONS
> >    __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
> >  #endif
> >
> > -  . = __XEN_VIRT_START + MB(1);
> > +  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
> >    _start = .;
> >    .text : {
> >          _stext = .;            /* Text and read-only data */
> > @@ -257,12 +257,14 @@ SECTIONS
> >    .reloc : {
> >      *(.reloc)
> >    } :text
> > -  /* Trick the linker into setting the image size to exactly 16Mb. */
> >    . = ALIGN(__section_alignment__);
> > +#endif
> > +
> > +  /* Trick the linker into setting the image size to exactly 16Mb. */
> >    .pad : {
> >      . = ALIGN(MB(16));
> > +    __end_of_image__ = .;
>
> I see that you use this symbol in xen/arch/x86/Makefile, but I again
> don't follow why this logic needs to change.

The current logic does not work because it gets the wrong address from
xen/xen-syms. As a result, the boot loader allocates an incorrect,
usually huge, buffer for the Xen image (incidentally, there is a chance
that this also fixes the issues related to 2 MiB mappings found by
Andrew). I do not see a simple, reliable fix for the current solution.
So, I introduced __end_of_image__ and look up its address instead. This
is a much better method of establishing the end-of-image address than
the previous one. However, I agree that this could be introduced in a
separate patch.

> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -288,7 +288,7 @@ extern root_pgentry_t
> > idle_pg_table[ROOT_PAGETABLE_ENTRIES];
> >  extern l2_pgentry_t  *compat_idle_pg_table_l2;
> >  extern unsigned int   m2p_compat_vstart;
> >  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> > -    l2_bootmap[L2_PAGETABLE_ENTRIES];
> > +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];
>
> Why do you need 4 of them? I can see why one might not suffice,
> but two surely should?

In the worst case we need three: one for the l1_identmap hook and two
for the Xen image mapping. So, I am not sure it pays to complicate the
assembly mapping code just to save one page. Additionally, remember
that l2_bootmap is freed after init.
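The worst-case count can be sanity-checked with a small, hypothetical
helper (not from the patch): a 2 MiB-aligned 16 MiB image can straddle
at most one 1 GiB boundary, so it touches at most two L2 tables, and one
more is needed for the l1_identmap hook:

```c
#include <stdint.h>

#define L2_PAGETABLE_SHIFT   21   /* one L2 entry maps 2 MiB */
#define L3_PAGETABLE_SHIFT   30   /* one L2 table maps 1 GiB */

/*
 * How many distinct L2 tables (i.e. 1 GiB-aligned regions) does an
 * image of 'size' bytes loaded at 'base' touch?
 */
static unsigned int l2_tables_touched(uint64_t base, uint64_t size)
{
    return ((base + size - 1) >> L3_PAGETABLE_SHIFT) -
           (base >> L3_PAGETABLE_SHIFT) + 1;
}
```

Two tables for the image in the worst case, plus one for the l1_identmap
hook, gives the three mentioned above.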

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

