[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 6/6] x86/boot: Do not use trampoline for no-real-mode boot paths



On Mon, 2019-08-12 at 12:55 +0200, Jan Beulich wrote:
> On 09.08.2019 17:02, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> > 
> > Where booted from EFI or with no-real-mode, there is no need to stomp
> > on low memory with the 16-boot code. Instead, just go straight to
> > trampoline_protmode_entry() at its physical location within the Xen
> > image.
> > 
> > For now, the boot code (including the EFI loader path) still determines
> > what the trampoline_phys address should be. The trampoline is actually
> > relocated for that address and copied into low memory, from a
> > relocate_trampoline() call made from __start_xen().
> 
> I assume this talks about the real mode part of the trampoline, as
> opposed to the next paragraph? Would be nice if you made this
> explicit.

This is the permanent real-mode trampoline used for AP startup and
wakeup, not the real-mode boot code (which the boot code has to have
put there for itself it it wanted it).

I will try to make the commit message clearer; thanks for pointing it
out.

> > For subsequent AP startup and wakeup, the 32-bit trampoline can't
> > trivially be used in-place as that region isn't mapped. So copy it
> > down to low memory too, having relocated it (again) to work from
> > there.
> 
> trampoline_protmode_entry gets entered with CR0.PG=0, i.e. at
> that point there's not even the question yet of there being a
> mapping. Subsequently idle_pg_table gets loaded into CR3. I wonder
> if, rather than relocating the 32-bit part of the trampoline, it
> wouldn't be better to install a 1:1 mapping into idle_pg_table.
> Such a mapping would need to have the G bits clear in order to
> not conflict with PV guest mappings of the same linear addresses.

Yeah, I tried making that happen. It made me sad. This seemed to be
simpler and less fragile.

> > --- a/xen/arch/x86/acpi/power.c
> > +++ b/xen/arch/x86/acpi/power.c
> > @@ -152,9 +152,9 @@ static void acpi_sleep_prepare(u32 state)
> >           return;
> >   
> >       if ( acpi_sinfo.vector_width == 32 )
> > -        *(uint32_t *)wakeup_vector_va = bootsym_phys(wakeup_start);
> > +        *(uint32_t *)wakeup_vector_va = trampsym_phys(wakeup_start);
> >       else
> > -        *(uint64_t *)wakeup_vector_va = bootsym_phys(wakeup_start);
> > +        *(uint64_t *)wakeup_vector_va = trampsym_phys(wakeup_start);
> >   }
> >   
> >   static void acpi_sleep_post(u32 state) {}
> > @@ -388,7 +388,7 @@ static void tboot_sleep(u8 sleep_state)
> >       g_tboot_shared->acpi_sinfo.wakeup_vector = acpi_sinfo.wakeup_vector;
> >       g_tboot_shared->acpi_sinfo.vector_width = acpi_sinfo.vector_width;
> >       g_tboot_shared->acpi_sinfo.kernel_s3_resume_vector =
> > -                                              bootsym_phys(wakeup_start);
> > +                                              trampsym_phys(wakeup_start);
> 
> Shouldn't changes like these have happened earlier, when you
> introduce the (logical only at that point) distinction between
> trampoline pieces?

That was in assembler code. This is C, which never had to be that
involved with the distinction. But now that all the dust has settled,
I'm making it consistent, using 'trampsym' for stuff in the permanent
trampoline just like the asm code does.


> > @@ -97,7 +100,7 @@ GLOBAL(trampoline_realmode_entry)
> >          cld
> >          cli
> >          lidt    trampsym(idt_48)
> > -        lgdt    trampsym(gdt_48)
> > +        lgdtl   trampsym(gdt_48)
> 
> Stray / unrelated change (and if needed, then also for lidt)?

The difference between 16bit l.dt and 32-bit l.dtl is that the former
only loads 24 bits of the actual table address (trampoline_gdt in this
case).

Thus, when trampoline_gdt is being used in-place, as it is during early
boot, and *if* the Xen image is loaded higher than 16MiB, lgdt doesn't
work. That's half a day of my life I want back.

It doesn't matter for lidt because we're just loading an empty limit
and pointer there, and we don't care about bits 24-31 of a zero value.

> > @@ -236,11 +239,23 @@ gdt_48: .word   7*8-1
> >   
> >   /* The first page of trampoline is permanent, the rest boot-time only. */
> >   /* Reuse the boot trampoline on the 1st trampoline page as stack for 
> > wakeup. */
> > -        .equ    wakeup_stack, boot_trampoline_start + PAGE_SIZE
> > +        .equ    wakeup_stack, perm_trampoline_start + PAGE_SIZE
> >           .global wakeup_stack
> >   
> > +ENTRY(perm_trampoline_end)
> > +
> >   /* From here on early boot only. */
> >   
> > +ENTRY(boot_trampoline_start)
> > +
> > +        .word   0
> > +boot16_idt:
> > +        .word   0, 0, 0 # base = limit = 0
> > +        .word   0
> > +boot16_gdt:
> > +        .word   7*8-1
> > +        .long   tramp32sym_rel(trampoline_gdt,4)
> 
> Can we really not get away without a second copy of these?

Probably, but my judgement was that the complexity and the pain of
doing so would exceed the benefit. I'll take another look at doing so.

> > @@ -304,8 +319,8 @@ trampoline_boot_cpu_entry:
> >           cli
> >   
> >           /* Reset GDT and IDT. Some BIOSes clobber GDTR. */
> > -        lidt    bootsym(idt_48)
> > -        lgdt    bootsym(gdt_48)
> > +        lidt    bootsym(boot16_idt)
> > +        lgdtl   bootsym(boot16_gdt)
> 
> As above - either both should gain a suffix, or neither of them.
> 
> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -682,6 +682,42 @@ static unsigned int __init copy_bios_e820(struct 
> > e820entry *map, unsigned int li
> >       return n;
> >   }
> >   
> > +extern const s32 __trampoline_rel_start[], __trampoline_rel_stop[];
> > +extern const s32 __trampoline32_rel_start[], __trampoline32_rel_stop[];
> > +
> > +static void __init relocate_trampoline(unsigned long phys)
> > +{
> > +    const s32 *trampoline_ptr;
> > +    uint32_t tramp32_delta = 0;
> > +
> > +    /* Apply relocations to trampoline. */
> > +    for ( trampoline_ptr = __trampoline_rel_start;
> > +          trampoline_ptr < __trampoline_rel_stop;
> > +          ++trampoline_ptr )
> > +        *(u32 *)(*trampoline_ptr + (long)trampoline_ptr) += phys;
> > +
> > +    tramp32_delta = phys;
> 
> Any reason this can't be the initializer of the variable, or the
> zero initializer above can't be dropped?

I can't think of one. I think I quite like the initial setting of
tramp32_delta=phys to live *right* above the subsequent if(something)
tramp32_delta-=something, to make it very clear what that calculation
is.

So maybe I'll just drop the pointless =0 initialiser.

> > +    if (!efi_enabled(EFI_LOADER)) {
> 
> Style (missing blanks inside the parentheses, and brace to go on
> its own line).

Ack. You can take the Linux hacker out of the Linux kernel but...

> > --- a/xen/include/asm-x86/config.h
> > +++ b/xen/include/asm-x86/config.h
> > @@ -89,12 +89,12 @@
> >  
> >  #ifndef __ASSEMBLY__
> >  extern unsigned long trampoline_phys;
> > -#define bootsym_phys(sym)                                 \
> > -    (((unsigned long)&(sym)-(unsigned 
> > long)&boot_trampoline_start)+trampoline_phys)
> > -#define bootsym(sym)                                      \
> > +#define trampsym_phys(sym)                                 \
> > +    (((unsigned long)&(sym)-(unsigned 
> > long)&perm_trampoline_start)+trampoline_phys)
> > +#define trampsym(sym)                                      \
> >      (*RELOC_HIDE((typeof(&(sym)))__va(__pa(&(sym))),      \
> > -                 trampoline_phys-__pa(boot_trampoline_start)))
> > -extern char boot_trampoline_start[], boot_trampoline_end[];
> > +                 trampoline_phys-__pa(perm_trampoline_start)))
> 
> As you're touching these, could you please also insert the missing
> blanks around the binary + and - ?

Will do. Thanks.

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.