[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] x86/boot: Don't double-copy the stack during physical relocation


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 7 Dec 2021 15:07:30 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1TkZZTyddgKp3F0rKfwrnyaI0bxs60mKMKcbUUuIyW8=; b=YSuhzGsOvVGPXjRCosF2ZH/7WyoP2u7RtLwMpiUtxl70dYZG3IhXX70V39ohLd0guNFZVpxXq3NgCbA08cXBYpQvwR6Q1lv1Ew8Yk95of8UJcKYtLHU6jx2FYFAwR3YH1uH2OyCT6q1E99Dgm6MA1u7081MX4f0NzU11Li9sXT1meTLoUEXFOXP63umkUASW8GICtMZMqb3N9A9/4/ozdspw7BnPlVrB5mzysABnkcHNCw9XalooGE2Nfkx7Hz1ruzjUEHRbBDA0Pyto5jOIJjP4jMzeZVkrPI6+ssuBbR268bmdQTWR/MjVKi9opC5giBNG+24WVBZc6DmjszRYtQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nRt6cLxjDNNZsz5lOtRyI/iktbZa6r9WIzrIH35R2+qvYu50/Prmqq5zVhqL8H4O9NSoPoZVFlMqQ5ytjZqxyn7KmByH3kUaYlwGVhaWLm3hLmwU0scgEcM0k+mPTqPBSORZ9BZ6ES9ETjOT3+MOQC3+nbbbUTL3wjaaes7L54H4E+SRk2X4s7LY9xAOiBXtfxe4MGsEiNYiP9er9rFPAqvjSJspIS7FH7COlNBf8SkhLGEdHvcEok3I+euenpG5UVgDTxYzQLp1vid5IvpOA6STbRM5Sup+jV+j/HWvzD8ChDTbwfjBhlvLPiu4uQ8gUDvkAOlSIns5BGWhXUscNg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 07 Dec 2021 14:07:56 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.12.2021 11:53, Andrew Cooper wrote:
> cpu0_stack is contained within .data, which means the memcpy() already takes a
> snapshot at the start of the critical region.
> 
> Later, when we switch to the relocated Xen, we do end up losing updates to the
> local variables,

You word this as if that was the case already before your change, ...

> but that's fine because the only variables we've modified go
> out of scope after the printk().  Use this properly to avoid copying the whole
> stack (32k) a second time.

... not the least because of "Use this properly ...". But aiui this
is only a result of your change, in that you no longer "re-sync the
stack" (as the comment says that you remove).

I can appreciate that copying 32k is too much. Yet I do think that
we're putting ourselves at risk if we rely on local variables going
out of scope which have been updated. Even more so with no comment
next to their declarations making clear that it is imperative for
them to not move to outer scopes. (Seeing e.g. another "i" and "j"
in the outer scope, one might actually be inclined to do so. For
"j" this would, afaict, even work, while "i" collides with the outer
one no matter what.)

You mentioning the printk() - did you consider someone, perhaps just
for debugging purposes, adding uses there of some of these local
variables?

We could limit the copying to just the primary stack(s), for
example. Otoh I'm not convinced trying to save 20k or even 32k of
copied memory is worth it in this boot-time code. There are larger
gains to be had elsewhere - see e.g. my "x86: memcpy() / memset()
(non-)ERMS flavors plus fallout" and "IOMMU: superpage support when
not sharing pagetables" which continue to be pending.

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1183,6 +1183,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>              l3_pgentry_t *pl3e;
>              l2_pgentry_t *pl2e;
>              int i, j, k;
> +            unsigned long tmp;
>  
>              /* Select relocation address. */
>              xen_phys_start = end - reloc_size;
> @@ -1193,7 +1194,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>               * Perform relocation to new physical address.
>               * Before doing so we must sync static/global data with main 
> memory
>               * with a barrier(). After this we must *not* modify 
> static/global
> -             * data until after we have switched to the relocated pagetables!
> +             * data, or locals that need to survive, until after we have
> +             * switched to the relocated pagetables!
>               */
>              barrier();
>              memcpy(__va(__pa(_start)), _start, _end - _start);
> @@ -1239,18 +1241,21 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                  *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + 
> xen_phys_start);
>              }
>  
> -            /* Re-sync the stack and then switch to relocated pagetables. */
> +            /*
> +             * Switch to relocated pagetables.  This also discards updates to
> +             * any local variables since the memmove() call above, but that's
> +             * fine because we don't use any of them again.
> +             */
>              asm volatile (
> -                "rep movsq        ; " /* re-sync the stack */
> -                "movq %%cr4,%%rsi ; "
> -                "andb $0x7f,%%sil ; "
> -                "movq %%rsi,%%cr4 ; " /* CR4.PGE == 0 */
> -                "movq %[pg],%%cr3 ; " /* CR3 == new pagetables */
> -                "orb $0x80,%%sil  ; "
> -                "movq %%rsi,%%cr4   " /* CR4.PGE == 1 */
> -                : "=&S" (i), "=&D" (i), "=&c" (i) /* All outputs discarded. 
> */
> -                :  [pg] "r" (__pa(idle_pg_table)), "0" (cpu0_stack),
> -                   "1" (__va(__pa(cpu0_stack))), "2" (STACK_SIZE / 8)
> +                "mov    %%cr4, %[cr4]\n\t"
> +                "andb   $~%c[pge], %b[cr4]\n\t"

Why not simply %[pge] (name subject to improvement) with the value
below becoming ~X86_CR4_PGE (and the constraint becoming "K")?

> +                "mov    %[cr4], %%cr4\n\t"     /* CR4.PGE = 0 */
> +                "mov    %[cr3], %%cr3\n\t"     /* CR3 = new pagetables */
> +                "orb    $%c[pge], %b[cr4]\n\t"

Oh, I see - the constant gets reused here.

Jan

> +                "mov    %[cr4], %%cr4\n\t"     /* CR4.PGE = 1 */
> +                : [cr4] "=&a" (tmp) /* Could be "r", but "a" makes better 
> asm */
> +                : [cr3] "r" (__pa(idle_pg_table)),
> +                  [pge] "i" (X86_CR4_PGE)
>                  : "memory" );
>  
>              printk("New Xen image base address: %#lx\n", xen_phys_start);
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.