
Re: [Xen-devel] [PATCH v2 22/23] x86: make Xen early boot code relocatable



On Tue, Aug 11, 2015 at 12:48:06PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 20, 2015 at 04:29:17PM +0200, Daniel Kiper wrote:
> > Every multiboot protocol (regardless of version) compatible image must
> > specify its load address (in ELF or multiboot header). Multiboot protocol
> > compatible loader have to load image at specified address. However, there
> > is no guarantee that the requested memory region (in case of Xen it starts
> > at 1 MiB and ends at 17 MiB) where image should be loaded initially is a RAM
> > and it is free (legacy BIOS platforms are merciful for Xen but I found at
> > least one EFI platform on which Xen load address conflicts with EFI boot
> > services; it is Dell PowerEdge R820 with latest firmware). To cope with
> > that problem we must make Xen early boot code relocatable. This patch does
> > that. However, it does not add multiboot2 protocol interface which is done
> > in next patch.
>
> s/next patch/"x86: add multiboot2 protocol support for relocatable image."
> >
> > This patch changes following things:
> >   - default load address is changed from 1 MiB to 2 MiB; I did that because
> >     initial page tables are using 2 MiB huge pages and this way required
> >     updates for them are quite easy; it means that e.g. we avoid spacial
> >     cases for beginning and end of required memory region if it live at
> >     address not aligned to 2 MiB,
> >   - %ebp register is used as a storage for Xen image base address; this way
> >     we can get this value very quickly if it is needed; however, %ebp 
> > register
> >     is not used directly to access a given memory region,
> >   - %fs register is filled with segment descriptor which describes memory 
> > region
> >     with Xen image (it could be relocated or not); it is used to access 
> > some of
>
> 'memory region with Xen image' ? Not sure I follow?
>
> Perhaps:
> segment descriptor which starts (0) at Xen image base (_start).
>
>
> >     Xen data in early boot code; potentially we can use above mentioned 
> > segment
> >     descriptor to access data using %ds:%esi and/or %es:%esi (e.g. movs*); 
> > however,
> >     I think that it could unnecessarily obfuscate code (e.g. we need at 
> > least
> >     to operations to reload a given segment descriptor) and current solution
>
> s/to/two/ ?
> >     looks quite optimal.
> >
> > Signed-off-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>

[...]

> > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> > index 3f1054d..d484f68 100644
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S

[...]

> >  trampoline_bios_setup:
> > +        mov     %ebp,%esi
> > +
> > +        /* Initialise GDT and basic data segments. */
> > +        add     %ebp,sym_offset(gdt_boot_descr_addr)(%esi)
> > +        lgdt    sym_offset(gdt_boot_descr)(%esi)
> > +
> > +        mov     $BOOT_DS,%ecx
> > +        mov     %ecx,%ds
> > +        mov     %ecx,%es
> > +        mov     %ecx,%fs
> > +        mov     %ecx,%gs
> > +        mov     %ecx,%ss
> > +
>
>
> The non-EFI boot path is now:
>
> start
>  \- __start
>      \- multiboot2_proto
>      |    jmp trampoline_bios_setup
>      |
>      \-and if not MB2: jmp trampoline_bios_setup.
>
>
> In here you tweak the GDT and reload the %ds - but during
> this call chain we do touch the %ds - via:
>
> __start+27>:        testb  $0x1,(%rbx)
> __start+30>:        cmovne 0x4(%rbx),%edx
>
> which is OK (as MB1 says that the %ds has to cover up to 4GB).
> But I wonder why the __start code had the segments reloaded so early?
> Was the bootloader not setting the proper segments?

This is a very good question. I have asked myself the same thing at
least once. Sadly, I cannot find a real explanation.

> Let me double-check what SYSLINUX's mboot.c32 does. Perhaps
> it had done something odd in the past.

Good idea!

> >          /* Set up trampoline segment 64k below EBDA */
> >          movzwl  0x40e,%ecx          /* EBDA segment */
> >          cmp     $0xa000,%ecx        /* sanity check (high) */
> > @@ -340,33 +357,58 @@ trampoline_bios_setup:
> >          cmovb   %edx,%ecx           /* and use the smaller */
> >
> >  trampoline_setup:
>
> Would it make sense to add:
>
> /* Gets called from EFI (from x86_32_switch) and legacy (see above) boot 
> loaders. */
>
> > +        mov     %ebp,%esi
> > +
> > +        /* Initialize 0-15 bits of BOOT_FS segment descriptor base 
> > address. */
> > +        mov     %ebp,%edx
> > +        shl     $16,%edx
> > +        or      %edx,(sym_offset(trampoline_gdt)+BOOT_FS)(%esi)
> > +
> > +        /* Initialize 16-23 bits of BOOT_FS segment descriptor base 
> > address. */
> > +        mov     %ebp,%edx
> > +        shr     $16,%edx
> > +        and     $0x000000ff,%edx
> > +        or      %edx,(sym_offset(trampoline_gdt)+BOOT_FS+4)(%esi)
> > +
> > +        /* Initialize 24-31 bits of BOOT_FS segment descriptor base 
> > address. */
> > +        mov     %ebp,%edx
> > +        and     $0xff000000,%edx
> > +        or      %edx,(sym_offset(trampoline_gdt)+BOOT_FS+4)(%esi)
> > +
> > +        /* Initialize %fs and later use it to access Xen data if possible. 
> > */
> > +        mov     $BOOT_FS,%edx
> > +        mov     %edx,%fs
> > +
>
> We just modified the GDT. Should we reload it (lgdt?)?

I do not think it is needed.

Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z says:

LGDT ... Loads the values in the source operand into the global
descriptor table register (GDTR)...

...and ...

MOV ... If the destination operand is a segment register (DS, ES,
FS, GS, or SS), the source operand must be a valid segment selector.
In protected mode, moving a segment selector into a segment register
automatically causes the segment descriptor information associated
with that segment selector to be loaded into the hidden (shadow) part
of the segment register. While loading this information, the segment
selector and segment descriptor information is validated (see the
"Operation" algorithm below). The segment descriptor data is obtained
from the GDT or LDT entry for the specified segment selector.

Additionally, Intel 64 and IA-32 Architectures Software Developer's
Manual Volume 3 (3A, 3B & 3C): System Programming Guide, section 3.4.3,
Segment Registers says:

Every segment register has a "visible" part and a "hidden" part. (The
hidden part is sometimes referred to as a "descriptor cache" or a
"shadow register.") When a segment selector is loaded into the visible
part of a segment register, the processor also loads the hidden part
of the segment register with the base address, segment limit, and access
control information from the segment descriptor pointed to by the segment
selector. The information cached in the segment register (visible and
hidden) allows the processor to translate addresses without taking
extra bus cycles to read the base address and limit from the segment
descriptor. In systems in which multiple processors have access to the
same descriptor tables, it is the responsibility of software to reload
the segment registers when the descriptor tables are modified (side note:
a GDTR reload is not required! the same probably applies to UP systems
and to a CPU updating its own active GDT). If this is not done, an old segment
descriptor cached in a segment register might be used after its
memory-resident version has been modified.

AIUI, only the GDT address and limit are loaded into the GDTR when lgdt
is executed. A segment descriptor is loaded from memory into the hidden
part of a segment register only when a relevant load instruction is
executed (e.g. mov %edx,%fs). The GDT contents could well end up in the
CPU cache, but nothing suggests that caching interferes in any way with
a segment descriptor load. So, it looks like every change to the active
GDT is immediately visible to a subsequent segment descriptor load.

I was not able to find anything which invalidates the above. So,
everything suggests that we do not need an extra lgdt.
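To make the bit gymnastics in the quoted hunk concrete, here is a small
Python sketch (hypothetical helper names, not part of the patch) of how
trampoline_setup folds the 32-bit image base in %ebp into the two 32-bit
halves of the BOOT_FS descriptor, and how the CPU reassembles the base
when %fs is loaded:

```python
def patch_fs_base(low, high, base):
    """Fold a 32-bit base address into the two dwords of a GDT segment
    descriptor, mirroring the three or-into-memory steps in the patch."""
    base &= 0xffffffff
    # Base bits 0-15 live in bits 16-31 of the low dword (shl $16; or).
    low |= (base << 16) & 0xffffffff
    # Base bits 16-23 live in bits 0-7 of the high dword (shr $16; and $0xff; or).
    high |= (base >> 16) & 0x000000ff
    # Base bits 24-31 live in bits 24-31 of the high dword (and $0xff000000; or).
    high |= base & 0xff000000
    return low, high

def descriptor_base(low, high):
    """Recover the base address from a descriptor, as the CPU does when
    loading the hidden part of a segment register."""
    return ((low >> 16) & 0xffff) | ((high & 0xff) << 16) | (high & 0xff000000)
```

Any 32-bit base round-trips through the three fields, e.g. the 2 MiB
default load address or an arbitrary relocated one.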

> >          /* Reserve 64kb for the trampoline. */
> >          sub     $0x1000,%ecx

[...]

> >          /* Hook 4kB mappings of first 2MB of memory into L2. */
> > -        mov     $sym_phys(l1_identmap)+__PAGE_HYPERVISOR,%edi
> > -        mov     %edi,sym_phys(l2_xenmap)
> > -        mov     %edi,sym_phys(l2_bootmap)
> > +        lea     (sym_offset(l1_identmap)+__PAGE_HYPERVISOR)(%ebp),%edi
> > +        mov     %edi,%fs:sym_offset(l2_bootmap)
>
> But not to l2_xenmap?

No, l?_xenmap maps Xen code and data only.

> >          /* Apply relocations to bootstrap trampoline. */
> > -        mov     sym_phys(trampoline_phys),%edx
> > -        mov     $sym_phys(__trampoline_rel_start),%edi
> > +        mov     %fs:sym_offset(trampoline_phys),%edx
> > +        lea     sym_offset(__trampoline_rel_start)(%ebp),%edi
> > +        lea     sym_offset(__trampoline_rel_stop)(%ebp),%esi

[...]

> > diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
> > index 87b3341..27481ac 100644
> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -283,7 +283,7 @@ extern root_pgentry_t 
> > idle_pg_table[ROOT_PAGETABLE_ENTRIES];
> >  extern l2_pgentry_t  *compat_idle_pg_table_l2;
> >  extern unsigned int   m2p_compat_vstart;
> >  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> > -    l2_bootmap[L2_PAGETABLE_ENTRIES];
> > +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];
>
> ? Why do we need to expand this to be 16kB?

TBH, we need 8 KiB in the worst case. The worst case is when the next
GiB starts (e.g. the 1st GiB ends and the 2nd GiB starts) in the middle
of the Xen image. In that situation we must hook up the lower l2_bootmap
table to the lower l3_bootmap entry, the higher l2_bootmap table to the
higher l3_bootmap entry, and finally fill the relevant l2_bootmap
entries in the proper way. Sadly, this method requires more
calculations. To avoid that I have added 3 extra l2_bootmap tables and
simply hook them up, one after another, to the relevant l3_bootmap
entries. However, if you wish we can reduce the number of l2_bootmap
tables to two. That way the code will be more complicated but we will
save about 8 KiB.
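For illustration, a minimal Python sketch (hypothetical helper, assuming
each L2 page table covers 512 * 2 MiB = 1 GiB of address space) of why
two l2_bootmap tables are enough in the worst case for an image smaller
than 1 GiB:

```python
GIB = 1 << 30  # address range covered by one L2 page table

def l2_tables_needed(base, size):
    """Number of L2 page tables needed to map an image of `size` bytes
    loaded at physical address `base`."""
    first = base // GIB            # 1 GiB slot of the first byte
    last = (base + size - 1) // GIB  # 1 GiB slot of the last byte
    return last - first + 1
```

An image of at most 1 GiB can cross at most one 1 GiB boundary, so the
result never exceeds two; the extra tables in the patch simply avoid
computing which boundary, if any, is crossed.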

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

