Xen project Mailing List

Re: [Xen-devel] [PATCH 4/9] kexec: extend hypercall with improved load/unload ops

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Daniel Kiper <daniel.kiper@xxxxxxxxxx>

Date: Mon, 7 Oct 2013 09:47:11 +0200

Cc: kexec@xxxxxxxxxxxxxxxxxx, Keir Fraser <keir@xxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Mon, 07 Oct 2013 07:47:53 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Sun, Oct 06, 2013 at 07:16:16PM +0100, Andrew Cooper wrote: > On 04/10/2013 22:23, Daniel Kiper wrote: > > On Fri, Sep 20, 2013 at 02:10:50PM +0100, David Vrabel wrote: > >> From: David Vrabel <david.vrabel@xxxxxxxxxx> > >> > >> In the existing kexec hypercall, the load and unload ops depend on > >> internals of the Linux kernel (the page list and code page provided by > >> the kernel). The code page is used to transition between Xen context > >> and the image so using kernel code doesn't make sense and will not > >> work for PVH guests. > >> > >> Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops > >> that no longer require a code page to be provided by the guest -- Xen > >> now provides the code for calling the image directly. > >> > >> The new load op looks similar to the Linux kexec_load system call and > >> allows the guest to provide the image data to be loaded. The guest > >> specifies the architecture of the image which may be a 32-bit subarch > >> of the hypervisor's architecture (i.e., an EM_386 image on an > >> EM_X86_64 hypervisor). > >> > >> The toolstack can now load images without kernel involvement. This is > >> required for supporting kexec when using a dom0 with an upstream > >> kernel. > >> > >> Crash images are copied directly into the crash region on load. > >> Default images are copied into domheap pages and a list of source and > >> destination machine addresses is created. This is list is used in > >> kexec_reloc() to relocate the image to its destination. > >> > >> The old load and unload sub-ops are still available (as > >> KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top > >> of the new infrastructure. > >> > >> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx> > > [...] > > > >> diff --git a/xen/arch/x86/x86_64/kexec_reloc.S > >> b/xen/arch/x86/x86_64/kexec_reloc.S > >> new file mode 100644 > >> index 0000000..41dd27b > >> --- /dev/null > >> +++ b/xen/arch/x86/x86_64/kexec_reloc.S > >> @@ -0,0 +1,208 @@ > >> +/* > >> + * Relocate a kexec_image to its destination and call it. > >> + * > >> + * Copyright (C) 2013 Citrix Systems R&D Ltd. > >> + * > >> + * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S. > >> + * > >> + * Copyright (C) 2002-2005 Eric Biederman <ebiederm@xxxxxxxxxxxx> > >> + * > >> + * This source code is licensed under the GNU General Public License, > >> + * Version 2. See the file COPYING for more details. > >> + */ > >> +#include <xen/config.h> > >> +#include <xen/kimage.h> > >> + > >> +#include <asm/asm_defns.h> > >> +#include <asm/msr.h> > >> +#include <asm/page.h> > >> +#include <asm/machine_kexec.h> > >> + > >> + .text > >> + .align PAGE_SIZE > >> + .code64 > >> + > >> +ENTRY(kexec_reloc) > >> + /* %rdi - code page maddr */ > >> + /* %rsi - page table maddr */ > >> + /* %rdx - indirection page maddr */ > >> + /* %rcx - entry maddr */ > >> + /* %r8 - flags */ > >> + > >> + movq %rdx, %rbx > > Delete movq %rdx, %rbx > > > >> + /* Setup stack. */ > >> + leaq (reloc_stack - kexec_reloc)(%rdi), %rsp > >> + > >> + /* Load reloc page table. */ > >> + movq %rsi, %cr3 > >> + > >> + /* Jump to identity mapped code. */ > >> + leaq (identity_mapped - kexec_reloc)(%rdi), %rax > >> + jmpq *%rax > >> + > >> +identity_mapped: > >> + pushq %rcx > >> + pushq %rbx > >> + pushq %rsi > >> + pushq %rdi > > Delete pushq %rbx, pushq %rsi, pushq %rdi > > > >> + /* > >> + * Set cr0 to a known state: > >> + * - Paging enabled > >> + * - Alignment check disabled > >> + * - Write protect disabled > >> + * - No task switch > >> + * - Don't do FP software emulation. > >> + * - Proctected mode enabled > >> + */ > >> + movq %cr0, %rax > >> + andl $~(X86_CR0_AM | X86_CR0_WP | X86_CR0_TS | X86_CR0_EM), > >> %eax > >> + orl $(X86_CR0_PG | X86_CR0_PE), %eax > >> + movq %rax, %cr0 > >> + > >> + /* > >> + * Set cr4 to a known state: > >> + * - physical address extension enabled > >> + */ > >> + movl $X86_CR4_PAE, %eax > >> + movq %rax, %cr4 > >> + > >> + movq %rbx, %rdi > > movq %rdx, %rdi > > > >> + call relocate_pages > >> + > >> + popq %rdi > >> + popq %rsi > >> + popq %rbx > >> + popq %rcx > > Delete popq %rdi, popq %rsi, popq %rbx > > > >> + /* Need to switch to 32-bit mode? */ > >> + testq $KEXEC_RELOC_FLAG_COMPAT, %r8 > >> + jnz call_32_bit > >> + > >> +call_64_bit: > >> + /* Call the image entry point. This should never return. */ > > I think that all general purpose registers (including %rsi, %rdi, %rbp > > and %rsp) should be zeroed here. We should leave as little as possible > > info about previous system. Especially in kexec case. Just in case. > > Please look into linux/arch/x86/kernel/relocate_kernel_64.S > > for more details. > > In the kexec case, we are specifically trying to leak the state of the > entire system outside of the crash area so something can investigate. You think about kdump which is different thing than kexec. > Zeroing the registers for the sake of it is pointless. Any author of > kexec image expecting all GPRs to be 0 on entry deserves the debugging > problems they are asking for. I agree that anybody should not assume that any given register contains specific value in this case. However, we should not leak data (especially in kexec not kdump case) if we could wipe it at almost no cost. I know that memory contains more data but why we would like to make bad guys life easier? > >> + callq *%rcx > > Maybe we should use retq to jump into image entry point. If not > > I think that we should store image entry point address in %rax > > (just to the order). > > retq with a zeroed %rsp ? I hope that you are joking here. I assume that everybody on this list knows which register should be set in each case. > This code was derived from the linux code which allows a kexec image to > pass parameters (which is also why the registers are pushed/poped above) > and return back to the originator. (Although I have to admit to being > curious to how someone would go about trying to justify using this > feature in a real usecase.) Original Linux code uses ret not call. > It might be reasonable to replace "callq *%rcx; ud2" with "jmp *%rcx", > but all of this does feel like nitpicking on a codepath which > (hopefully) will never be executed. > > If we decide here and now that Xen shall never support passing arguments > to the kexec image, and never support the kexec image returning back to > us then perhaps we can go ahead and change things, although I would > argue that consistency with the other implementation is more important. Right, so should we use ret? I do not insist on that. More important thing for me is zeroing of registers. Later stuff is also done in Linux Kernel. Daniel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.