WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH]: kexec: framework and i386 Take V

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH]: kexec: framework and i386 Take V
From: Horms <horms@xxxxxxxxxxxx>
Date: Wed, 26 Apr 2006 15:08:36 +0900
Cc: Isaku Yamahata <yamahata@xxxxxxxxxxxxx>, Magnus Damm <magnus@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Mark Williamson <mark.williamson@xxxxxxxxxxxx>, Akio Takebe <takebe_akio@xxxxxxxxxxxxxx>
Delivery-date: Wed, 26 Apr 2006 02:51:48 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <2c2d3141b9494ea48eda52a13a7c09e4@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20060407074234.GA19846@xxxxxxxxxxxx> <20060417060617.GA20303@xxxxxxxxxxxx> <58C6650A4C9627takebe_akio@xxxxxxxxxxxxxx> <200604231545.23238.mark.williamson@xxxxxxxxxxxx> <5DC6673BE47612takebe_akio@xxxxxxxxxxxxxx> <20060424015307.GA26453%yamahata@xxxxxxxxxxxxx> <78430b99202d9be921745fea24a8041c@xxxxxxxxxxxx> <20060425001110.GX1083@xxxxxxxxxxxx> <2c2d3141b9494ea48eda52a13a7c09e4@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.11+cvs20060403
On Tue, Apr 25, 2006 at 10:57:11AM +0100, Keir Fraser wrote:
> 
> On 25 Apr 2006, at 01:11, Horms wrote:
> 
> >There is a small problem, in that for x86_32 at least the hypercall
> >table is currently full with 32 entries (well, the last time I checked
> >anyway), and my attempts to extend it were futile. Could you give me
> >some advice on how to increase its size?
> 
> Double NR_hypercalls in include/asm-x86/config.h.


Thanks, the new version of the kexec patch below does just that.
I did try that before, but for some reason it didn't work,
most likely because it was Friday afternoon at the time.

Also, this patch takes hypercall 32, which conflicts with Yamahata-san's
ia64 hypercall. Should I move to 33 and submit a patch that just does
hypercall reservation as he did?

-- 
Horms                                           http://www.vergenet.net/~horms/

kexec: framework and i386

This is an implementation of kexec for dom0/xen, that allows
kexecing of the physical machine from xen. The approach taken is
to move the architecture-dependant kexec code into a new hypercall.

Some notes:
  * machine_kexec_cleanup() and machine_kexec_prepare() don't do
    anything in i386. So while this patch adds a framework for them,
    I am not sure what parameters are needs at this stage.
  * Only works for UP, as machine_shutdown is not implemented yet
  * kexecing into xen does not seem to work, I think that 
    kexec-tools needs updating, but I have not investigated yet
  * Kdump works by first copying the kernel into dom0 segments
    and relocating them later in xen, the same way that kexec does
    The only difference is that the relocation is made into
    an area reserved by xen
  * Kdump reservation is made using the xen command line parameters,
    kdump_megabytes and kdump_megabytes_base, rather than
    the linux option crashkernel, which is now ignored.
    Two parameters are used instead of one to simplify parsing.
    This can be cleaned up later if desired. But the reservation
    seems to need to be made by xen to make sure that it happens
    early enough.
  * This patch uses a new kexec hypercall

Highlights since the previous posted version:
 
  * Use new kexec hypercall instead of dom0_op
    - hypercall table is expanded to 64 entries

Prepared by Horms and Magnus Damm

Signed-Off-By: Magnus Damm <magnus@xxxxxxxxxxxxx>
Signed-Off-By: Horms <horms@xxxxxxxxxxxx>

 linux-2.6-xen-sparse/arch/i386/Kconfig                         |    2 
 linux-2.6-xen-sparse/arch/i386/kernel/Makefile                 |    2 
 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c              |   24 +
 linux-2.6-xen-sparse/drivers/xen/core/Makefile                 |    1 
 linux-2.6-xen-sparse/drivers/xen/core/crash.c                  |   98 +++++
 linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c          |   73 ++++
 linux-2.6-xen-sparse/drivers/xen/core/reboot.c                 |    7 
 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h |   10 
 ref-linux-2.6.16/drivers/base/cpu.c                            |    4 
 ref-linux-2.6.16/kernel/kexec.c                                |   52 ++
 xen/arch/x86/Makefile                                          |    1 
 xen/arch/x86/dom0_ops.c                                        |    3 
 xen/arch/x86/machine_kexec.c                                   |  174 
++++++++++
 xen/arch/x86/setup.c                                           |   75 +++-
 xen/arch/x86/x86_32/entry.S                                    |    2 
 xen/common/Makefile                                            |    1 
 xen/common/kexec.c                                             |   71 ++++
 xen/common/page_alloc.c                                        |   33 +
 xen/include/asm-x86/config.h                                   |    2 
 xen/include/asm-x86/hypercall.h                                |    5 
 xen/include/public/kexec.h                                     |   43 ++
 xen/include/public/xen.h                                       |    9 
 xen/include/xen/mm.h                                           |    1 
 23 files changed, 659 insertions(+), 34 deletions(-)

--- x/linux-2.6-xen-sparse/arch/i386/Kconfig
+++ x/linux-2.6-xen-sparse/arch/i386/Kconfig
@@ -726,7 +726,7 @@ source kernel/Kconfig.hz
 
 config KEXEC
        bool "kexec system call (EXPERIMENTAL)"
-       depends on EXPERIMENTAL && !X86_XEN
+       depends on EXPERIMENTAL
        help
          kexec is a system call that implements the ability to shutdown your
          current kernel, and to start another kernel.  It is like a reboot
--- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile
+++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile
@@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen
 
 obj-y += fixup.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o
-n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o
+n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o 
crash.o
 
 obj-y := $(call filterxen, $(obj-y), $(n-obj-xen))
 obj-y := $(call cherrypickxen, $(obj-y))
--- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c
+++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c
@@ -68,6 +68,10 @@
 #include "setup_arch_pre.h"
 #include <bios_ebda.h>
 
+#ifdef CONFIG_XEN
+#include <xen/interface/kexec.h>
+#endif
+
 /* Forward Declaration. */
 void __init find_max_pfn(void);
 
@@ -932,6 +936,7 @@ static void __init parse_cmdline_early (
                 * after a kernel panic.
                 */
                else if (!memcmp(from, "crashkernel=", 12)) {
+#ifndef CONFIG_XEN
                        unsigned long size, base;
                        size = memparse(from+12, &from);
                        if (*from == '@') {
@@ -942,6 +947,10 @@ static void __init parse_cmdline_early (
                                crashk_res.start = base;
                                crashk_res.end   = base + size - 1;
                        }
+#else
+                       printk("Ignoring crashkernel command line, "
+                              "parameter will be supplied by xen\n");
+#endif
                }
 #endif
 #ifdef CONFIG_PROC_VMCORE
@@ -1318,9 +1327,21 @@ void __init setup_bootmem_allocator(void
        }
 #endif
 #ifdef CONFIG_KEXEC
+#ifndef CONFIG_XEN
        if (crashk_res.start != crashk_res.end)
                reserve_bootmem(crashk_res.start,
                        crashk_res.end - crashk_res.start + 1);
+#else
+       {
+               struct kexec_arg xen_kexec_arg;
+               BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_reserve, &xen_kexec_arg));
+               if (xen_kexec_arg.u.reserve.size) {
+                       crashk_res.start = xen_kexec_arg.u.reserve.start;
+                       crashk_res.end = xen_kexec_arg.u.reserve.start + 
+                               xen_kexec_arg.u.reserve.size - 1;
+               }
+       }
+#endif
 #endif
 
        if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1395,6 +1416,9 @@ legacy_init_iomem_resources(struct resou
                res->end = map[i].end - 1;
                res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
                request_resource(&iomem_resource, res);
+#ifdef CONFIG_KEXEC
+        request_resource(res, &crashk_res);
+#endif
        }
 
        free_bootmem(__pa(map), PAGE_SIZE);
--- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile
+++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_NET)     += skbuff.o
 obj-$(CONFIG_SMP)     += smpboot.o
 obj-$(CONFIG_SYSFS)   += hypervisor_sysfs.o
 obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o
+obj-$(CONFIG_KEXEC)   += machine_kexec.o crash.o
--- /dev/null
+++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c
@@ -0,0 +1,98 @@
+/*
+ * Architecture specific (i386-xen) functions for kexec based crash dumps.
+ *
+ * Created by: Horms <horms@xxxxxxxxxxxx>
+ *
+ */
+
+#include <linux/kernel.h> /* For printk */
+
+/* XXX: final_note(), crash_save_this_cpu() and crash_save_self()
+ * are copied from arch/i386/kernel/crash.c, might be good to either
+ * the original functions non-static and use them, or just
+ * merge this this into that file. 
+ */
+#include <linux/elf.h>     /* For struct elf_note */
+#include <linux/elfcore.h> /* For struct elf_prstatus */
+#include <linux/kexec.h>   /* crash_notes */
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
+                                                              size_t data_len)
+{
+       struct elf_note note;
+
+       note.n_namesz = strlen(name) + 1;
+       note.n_descsz = data_len;
+       note.n_type   = type;
+       memcpy(buf, &note, sizeof(note));
+       buf += (sizeof(note) +3)/4;
+       memcpy(buf, name, note.n_namesz);
+       buf += (note.n_namesz + 3)/4;
+       memcpy(buf, data, note.n_descsz);
+       buf += (note.n_descsz + 3)/4;
+
+       return buf;
+}
+
+static void final_note(u32 *buf)
+{
+       struct elf_note note;
+
+       note.n_namesz = 0;
+       note.n_descsz = 0;
+       note.n_type   = 0;
+       memcpy(buf, &note, sizeof(note));
+}
+
+static void crash_save_this_cpu(struct pt_regs *regs, int cpu)
+{
+       struct elf_prstatus prstatus;
+       u32 *buf;
+
+       if ((cpu < 0) || (cpu >= NR_CPUS))
+               return;
+
+       /* Using ELF notes here is opportunistic.
+        * I need a well defined structure format
+        * for the data I pass, and I need tags
+        * on the data to indicate what information I have
+        * squirrelled away.  ELF notes happen to provide
+        * all of that that no need to invent something new.
+        */
+       buf = (u32*)per_cpu_ptr(crash_notes, cpu);
+       if (!buf)
+               return;
+       memset(&prstatus, 0, sizeof(prstatus));
+       prstatus.pr_pid = current->pid;
+       elf_core_copy_regs(&prstatus.pr_reg, regs);
+       buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus,
+                               sizeof(prstatus));
+       final_note(buf);
+}
+
+static void crash_save_self(struct pt_regs *regs)
+{
+       int cpu;
+
+       cpu = smp_processor_id();
+       crash_save_this_cpu(regs, cpu);
+}
+
+
+void machine_crash_shutdown(struct pt_regs *regs)
+{
+       /* XXX: This should do something */
+       printk("xen-kexec: Need to turn of other CPUS in "
+              "machine_crash_shutdown()\n");
+       crash_save_self(regs);
+}
+
+/*
+ * Local variables:
+ *  c-file-style: "linux"
+ *  indent-tabs-mode: t
+ *  c-indent-level: 8
+ *  c-basic-offset: 8
+ *  tab-width: 8
+ * End:
+ */
--- /dev/null
+++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c
@@ -0,0 +1,73 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ *
+ * Created By: Horms <horms@xxxxxxxxxxxx>
+ *
+ * Losely based on arch/i386/kernel/machine_kexec.c
+ */
+
+#include <linux/kexec.h>
+#include <xen/interface/kexec.h>
+#include <linux/mm.h>
+#include <asm/hypercall.h>
+
+const extern unsigned char relocate_new_kernel[];
+extern unsigned int relocate_new_kernel_size;
+
+/*
+ * A architecture hook called to validate the
+ * proposed image and prepare the control pages
+ * as needed.  The pages for KEXEC_CONTROL_CODE_SIZE
+ * have been allocated, but the segments have yet
+ * been copied into the kernel.
+ *
+ * Do what every setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ *
+ * Currently nothing.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+       kexec_arg_t hypercall_arg;
+               hypercall_arg.u.helper.data = NULL;
+       return HYPERVISOR_kexec(KEXEC_CMD_kexec_prepare, &hypercall_arg);
+}
+
+/*
+ * Undo anything leftover by machine_kexec_prepare
+ * when an image is freed.
+ */
+void machine_kexec_cleanup(struct kimage *image)
+{
+       kexec_arg_t hypercall_arg;
+       hypercall_arg.u.helper.data = NULL;
+       HYPERVISOR_kexec(KEXEC_CMD_kexec_cleanup, &hypercall_arg);
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now.
+ */
+NORET_TYPE void machine_kexec(struct kimage *image)
+{
+       kexec_arg_t hypercall_arg;
+       hypercall_arg.u.kexec.indirection_page = image->head;
+       hypercall_arg.u.kexec.reboot_code_buffer = 
+               pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT;
+       hypercall_arg.u.kexec.start_address = image->start;
+       hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel;
+       hypercall_arg.u.kexec.relocate_new_kernel_size = 
+               relocate_new_kernel_size;
+       HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg);
+}
+
+/*
+ * Local variables:
+ *  c-file-style: "linux"
+ *  indent-tabs-mode: t
+ *  c-indent-level: 8
+ *  c-basic-offset: 8
+ *  tab-width: 8
+ * End:
+ */
--- x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c
+++ x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c
@@ -370,6 +370,13 @@ static int __init setup_shutdown_event(v
 
 subsys_initcall(setup_shutdown_event);
 
+#ifdef CONFIG_KEXEC
+void machine_shutdown(void) 
+{
+       printk("machine_shutdown: does nothing\n");
+}
+#endif
+
 /*
  * Local variables:
  *  c-file-style: "linux"
--- x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
+++ x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
@@ -37,6 +37,8 @@
 # error "please don't include this file directly"
 #endif
 
+#include <xen/interface/kexec.h>
+
 #define __STR(x) #x
 #define STR(x) __STR(x)
 
@@ -343,6 +345,14 @@ HYPERVISOR_xenoprof_op(
        return _hypercall2(int, xenoprof_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_kexec(
+       unsigned long op, kexec_arg_t * arg)
+{
+       return _hypercall2(int, kexec_op, op, arg); 
+}
+
+
 
 #endif /* __HYPERCALL_H__ */
 
--- x/ref-linux-2.6.16/drivers/base/cpu.c
+++ x/ref-linux-2.6.16/drivers/base/cpu.c
@@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s
         * boot up and this data does not change there after. Hence this
         * operation should be safe. No locking required.
         */
+#ifndef CONFIG_XEN
        addr = __pa(per_cpu_ptr(crash_notes, cpunum));
+#else
+       addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum));
+#endif
        rc = sprintf(buf, "%Lx\n", addr);
        return rc;
 }
--- x/ref-linux-2.6.16/kernel/kexec.c
+++ x/ref-linux-2.6.16/kernel/kexec.c
@@ -38,6 +38,20 @@ struct resource crashk_res = {
        .flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
 
+/* Kexec needs to know about the actually physical addresss.
+ * But in xen, a physical address is a pseudo-physical addresss. */
+#ifndef CONFIG_XEN
+#define kexec_page_to_pfn(page)  page_to_pfn(page)
+#define kexec_pfn_to_page(pfn)   pfn_to_page(pfn)
+#define kexec_virt_to_phys(addr) virt_to_phys(addr)
+#define kexec_phys_to_virt(addr) phys_to_virt(addr)
+#else
+#define kexec_page_to_pfn(page)  pfn_to_mfn(page_to_pfn(page))
+#define kexec_pfn_to_page(pfn)   pfn_to_page(mfn_to_pfn(pfn))
+#define kexec_virt_to_phys(addr) virt_to_machine(addr)
+#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr))
+#endif
+
 int kexec_should_crash(struct task_struct *p)
 {
        if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops)
@@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_
                pages = kimage_alloc_pages(GFP_KERNEL, order);
                if (!pages)
                        break;
-               pfn   = page_to_pfn(pages);
+               pfn   = kexec_page_to_pfn(pages);
                epfn  = pfn + count;
                addr  = pfn << PAGE_SHIFT;
                eaddr = epfn << PAGE_SHIFT;
@@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_
        return pages;
 }
 
+#ifndef CONFIG_XEN
 static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
                                                      unsigned int order)
 {
@@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c
                }
                /* If I don't overlap any segments I have found my hole! */
                if (i == image->nr_segments) {
-                       pages = pfn_to_page(hole_start >> PAGE_SHIFT);
+                       pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT);
                        break;
                }
        }
@@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages(
 
        return pages;
 }
+#else /* !CONFIG_XEN */
+struct page *kimage_alloc_control_pages(struct kimage *image,
+                                        unsigned int order)
+{
+       return kimage_alloc_normal_control_pages(image, order);
+}
+#endif
 
 static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
 {
@@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag
                        return -ENOMEM;
 
                ind_page = page_address(page);
-               *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION;
+               *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION;
                image->entry = ind_page;
                image->last_entry = ind_page +
                                      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
@@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag
 #define for_each_kimage_entry(image, ptr, entry) \
        for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
                ptr = (entry & IND_INDIRECTION)? \
-                       phys_to_virt((entry & PAGE_MASK)): ptr +1)
+                       kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1)
 
 static void kimage_free_entry(kimage_entry_t entry)
 {
        struct page *page;
 
-       page = pfn_to_page(entry >> PAGE_SHIFT);
+       page = kexec_pfn_to_page(entry >> PAGE_SHIFT);
        kimage_free_pages(page);
 }
 
@@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st
         * have a match.
         */
        list_for_each_entry(page, &image->dest_pages, lru) {
-               addr = page_to_pfn(page) << PAGE_SHIFT;
+               addr = kexec_page_to_pfn(page) << PAGE_SHIFT;
                if (addr == destination) {
                        list_del(&page->lru);
                        return page;
@@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st
                if (!page)
                        return NULL;
                /* If the page cannot be used file it away */
-               if (page_to_pfn(page) >
+               if (kexec_page_to_pfn(page) >
                                (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
                        list_add(&page->lru, &image->unuseable_pages);
                        continue;
                }
-               addr = page_to_pfn(page) << PAGE_SHIFT;
+               addr = kexec_page_to_pfn(page) << PAGE_SHIFT;
 
                /* If it is the destination page we want use it */
                if (addr == destination)
@@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st
                        struct page *old_page;
 
                        old_addr = *old & PAGE_MASK;
-                       old_page = pfn_to_page(old_addr >> PAGE_SHIFT);
+                       old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT);
                        copy_highpage(page, old_page);
                        *old = addr | (*old & ~PAGE_MASK);
 
@@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st
                        result  = -ENOMEM;
                        goto out;
                }
-               result = kimage_add_page(image, page_to_pfn(page)
+               result = kimage_add_page(image, kexec_page_to_pfn(page)
                                                                << PAGE_SHIFT);
                if (result < 0)
                        goto out;
@@ -811,6 +833,7 @@ out:
        return result;
 }
 
+#ifndef CONFIG_XEN
 static int kimage_load_crash_segment(struct kimage *image,
                                        struct kexec_segment *segment)
 {
@@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str
                char *ptr;
                size_t uchunk, mchunk;
 
-               page = pfn_to_page(maddr >> PAGE_SHIFT);
+               page = kexec_pfn_to_page(maddr >> PAGE_SHIFT);
                if (page == 0) {
                        result  = -ENOMEM;
                        goto out;
@@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki
 
        return result;
 }
+#else /* CONFIG_XEN */
+static int kimage_load_segment(struct kimage *image,
+                               struct kexec_segment *segment)
+{
+       return kimage_load_normal_segment(image, segment);
+}
+#endif
 
 /*
  * Exec Kernel system call: for obvious reasons only root may call it.
--- x/xen/arch/x86/Makefile
+++ x/xen/arch/x86/Makefile
@@ -38,6 +38,7 @@ obj-y += trampoline.o
 obj-y += traps.o
 obj-y += usercopy.o
 obj-y += x86_emulate.o
+obj-y += machine_kexec.o
 
 ifneq ($(pae),n)
 obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o
--- x/xen/arch/x86/dom0_ops.c
+++ x/xen/arch/x86/dom0_ops.c
@@ -29,6 +29,9 @@
 #include <asm/mtrr.h>
 #include "cpu/mtrr/mtrr.h"
 
+extern unsigned int opt_kdump_megabytes;
+extern unsigned int opt_kdump_megabytes_base;
+
 #define TRC_DOM0OP_ENTER_BASE  0x00020000
 #define TRC_DOM0OP_LEAVE_BASE  0x00030000
 
--- /dev/null
+++ x/xen/arch/x86/machine_kexec.c
@@ -0,0 +1,174 @@
+/******************************************************************************
+ * arch/x86/machine_kexec.c
+ * 
+ * Created By: Horms
+ *
+ * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/domain_page.h> 
+#include <xen/timer.h>
+#include <xen/sched.h>
+#include <asm/page.h> 
+#include <asm/flushtlb.h>
+#include <public/xen.h>
+#include <public/kexec.h>
+
+#ifdef CONFIG_X86_32
+
+typedef asmlinkage void (*relocate_new_kernel_t)(
+                    unsigned long indirection_page,
+                    unsigned long reboot_code_buffer,
+                    unsigned long start_address,
+                    unsigned int has_pae);
+
+#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
+
+#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define L2_ATTR (_PAGE_PRESENT)
+
+#ifndef CONFIG_X86_PAE
+
+static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    unsigned long mfn;
+    u32 *pgtable_level2;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level2 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    write_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level2);
+}
+
+#else
+static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    int mfn;
+    intpte_t *pgtable_level3;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level3 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+    set_64bit(&pgtable_level3[l3_table_offset(address)],
+             __pa(pgtable_level2) | L2_ATTR);
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    load_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level3);
+}
+#endif
+
+static void kexec_load_segments(void)
+{
+#define __SSTR(X) #X
+#define SSTR(X) __SSTR(X)
+    __asm__ __volatile__ (
+        "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n"
+        "\t1:\n"
+        "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n"
+        "\tmovl %%eax,%%ds\n"
+        "\tmovl %%eax,%%es\n"
+        "\tmovl %%eax,%%fs\n"
+        "\tmovl %%eax,%%gs\n"
+        "\tmovl %%eax,%%ss\n"
+        ::: "eax", "memory");
+#undef SSTR
+#undef __SSTR
+}
+
+#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
+static void kexec_set_idt(void *newidt, __u16 limit)
+{
+    struct Xgt_desc_struct curidt;
+
+    /* ia32 supports unaliged loads & stores */
+    curidt.size    = limit;
+    curidt.address = (unsigned long)newidt;
+    
+    kexec_load_idt(&curidt);
+
+};
+
+#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
+static void kexec_set_gdt(void *newgdt, __u16 limit)
+{
+    struct Xgt_desc_struct curgdt;
+
+    /* ia32 supports unaligned loads & stores */
+    curgdt.size    = limit;
+    curgdt.address = (unsigned long)newgdt;
+
+    kexec_load_gdt(&curgdt);
+};
+
+#endif
+
+int machine_kexec_prepare(struct kexec_arg *arg)
+{
+       return 0;
+}
+
+void machine_kexec_cleanup(struct kexec_arg *arg)
+{
+}
+
+void machine_kexec(struct kexec_arg *arg)
+{
+#ifdef CONFIG_X86_32
+    relocate_new_kernel_t rnk;
+
+    local_irq_disable();
+
+    identity_map_page(arg->u.kexec.reboot_code_buffer);
+
+    copy_from_user((void *)arg->u.kexec.reboot_code_buffer, 
+           arg->u.kexec.relocate_new_kernel,
+           arg->u.kexec.relocate_new_kernel_size);
+
+    kexec_load_segments();
+
+    kexec_set_gdt(__va(0),0);
+
+    kexec_set_idt(__va(0),0);
+
+    rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer;
+
+    (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer, 
+           arg->u.kexec.start_address, cpu_has_pae);
+#endif
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- x/xen/arch/x86/setup.c
+++ x/xen/arch/x86/setup.c
@@ -38,6 +38,11 @@ static unsigned int opt_xenheap_megabyte
 integer_param("xenheap_megabytes", opt_xenheap_megabytes);
 #endif
 
+unsigned int opt_kdump_megabytes = 0;
+integer_param("kdump_megabytes", opt_kdump_megabytes);
+unsigned int opt_kdump_megabytes_base = 0;
+integer_param("kdump_megabytes_base", opt_kdump_megabytes_base);
+
 /* opt_nosmp: If true, secondary processors are ignored. */
 static int opt_nosmp = 0;
 boolean_param("nosmp", opt_nosmp);
@@ -192,6 +197,20 @@ static void percpu_free_unused_areas(voi
                        __pa(__per_cpu_end));
 }
 
+void __init move_memory(unsigned long dst, 
+                          unsigned long src_start, unsigned long src_end)
+{
+#if defined(CONFIG_X86_32)
+    memmove((void *)dst,  /* use low mapping */
+            (void *)src_start,      /* use low mapping */
+            src_end - src_start);
+#elif defined(CONFIG_X86_64)
+    memmove(__va(dst),
+            __va(src_start),
+            src_end - src_start);
+#endif
+}
+
 void __init __start_xen(multiboot_info_t *mbi)
 {
     char __cmdline[] = "", *cmdline = __cmdline;
@@ -327,15 +346,8 @@ void __init __start_xen(multiboot_info_t
         initial_images_start = xenheap_phys_end;
     initial_images_end = initial_images_start + modules_length;
 
-#if defined(CONFIG_X86_32)
-    memmove((void *)initial_images_start,  /* use low mapping */
-            (void *)mod[0].mod_start,      /* use low mapping */
-            mod[mbi->mods_count-1].mod_end - mod[0].mod_start);
-#elif defined(CONFIG_X86_64)
-    memmove(__va(initial_images_start),
-            __va(mod[0].mod_start),
-            mod[mbi->mods_count-1].mod_end - mod[0].mod_start);
-#endif
+    move_memory(initial_images_start, 
+                mod[0].mod_start, mod[mbi->mods_count-1].mod_end);
 
     /* Initialise boot-time allocator with all RAM situated after modules. */
     xenheap_phys_start = init_boot_allocator(__pa(&_end));
@@ -383,6 +395,51 @@ void __init __start_xen(multiboot_info_t
 #endif
     }
 
+    if (opt_kdump_megabytes) {
+        unsigned long kdump_start, kdump_size, k;
+
+        /* mark images pages as free for now */
+
+        init_boot_pages(initial_images_start, initial_images_end);
+
+        kdump_start = opt_kdump_megabytes_base << 20;
+        kdump_size = opt_kdump_megabytes << 20;
+
+        printk("Kdump: %luMB (%lukB) at 0x%lx\n", 
+               kdump_size >> 20,
+               kdump_size >> 10,
+               kdump_start);
+
+        if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK))
+            panic("Kdump parameters not page aligned\n");
+
+        kdump_start >>= PAGE_SHIFT;
+        kdump_size >>= PAGE_SHIFT;
+
+        /* allocate pages for Kdump memory area */
+
+        k = alloc_boot_pages_at(kdump_size, kdump_start);
+
+        if (k != kdump_start)
+            panic("Unable to reserve Kdump memory\n");
+
+        /* allocate pages for relocated initial images */
+
+        k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0;
+        k += (initial_images_end - initial_images_start) >> PAGE_SHIFT;
+
+        k = alloc_boot_pages(k, 1);
+
+        if (!k)
+            panic("Unable to allocate initial images memory\n");
+
+        move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end);
+
+        initial_images_end -= initial_images_start;
+        initial_images_start = k << PAGE_SHIFT;
+        initial_images_end += initial_images_start;
+    }        
+
     memguard_init();
 
     printk("System RAM: %luMB (%lukB)\n", 
--- x/xen/arch/x86/x86_32/entry.S
+++ x/xen/arch/x86/x86_32/entry.S
@@ -646,6 +646,7 @@ ENTRY(hypercall_table)
         .long do_arch_sched_op
         .long do_callback_op        /* 30 */
         .long do_xenoprof_op
+        .long do_kexec
         .rept NR_hypercalls-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -683,6 +684,7 @@ ENTRY(hypercall_args_table)
         .byte 2 /* do_arch_sched_op     */
         .byte 2 /* do_callback_op       */  /* 30 */
         .byte 2 /* do_xenoprof_op       */
+        .byte 2 /* do_kexec       */
         .rept NR_hypercalls-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
--- x/xen/common/Makefile
+++ x/xen/common/Makefile
@@ -7,6 +7,7 @@ obj-y += event_channel.o
 obj-y += grant_table.o
 obj-y += kernel.o
 obj-y += keyhandler.o
+obj-y += kexec.o
 obj-y += lib.o
 obj-y += memory.o
 obj-y += multicall.o
--- /dev/null
+++ x/xen/common/kexec.c
@@ -0,0 +1,71 @@
+/*
+ * Achitecture independent kexec code for Xen
+ *
+ * At this statge, just a switch for the kexec hypercall into
+ * architecture dependent code.
+ *
+ * Created By: Horms <horms@xxxxxxxxxxxx>
+ */
+
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/guest_access.h>
+#include <xen/sched.h>
+#include <public/xen.h>
+#include <public/kexec.h>
+
+extern int machine_kexec_prepare(struct kexec_arg *arg);
+extern void machine_kexec_cleanup(struct kexec_arg *arg);
+extern void machine_kexec(struct kexec_arg *arg);
+
+extern unsigned int opt_kdump_megabytes;
+extern unsigned int opt_kdump_megabytes_base;
+
+int do_kexec(unsigned long op, 
+             GUEST_HANDLE(kexec_arg_t) uarg)
+{
+    struct kexec_arg arg;
+
+    if ( !IS_PRIV(current->domain) )  
+        return -EPERM;
+
+    if ( op != KEXEC_CMD_reserve &&
+        unlikely(copy_from_guest(&arg, uarg, 1) != 0) )
+    {
+        printk("do_kexec: __copy_from_guest failed");
+        return -EFAULT;
+    }
+
+    switch(op) {
+    case KEXEC_CMD_kexec:
+        machine_kexec(&arg);
+        return -EINVAL; /* Not Reached */
+    case KEXEC_CMD_kexec_prepare:
+        return machine_kexec_prepare(&arg);
+    case KEXEC_CMD_kexec_cleanup:
+        machine_kexec_cleanup(&arg);
+        return 0;
+    case KEXEC_CMD_reserve:
+       arg.u.reserve.size = opt_kdump_megabytes << 20;
+       arg.u.reserve.start = opt_kdump_megabytes_base << 20;
+       if ( unlikely(copy_to_guest(uarg, &arg, 1) != 0) )
+       {
+               printk("do_kexec: copy_to_guest failed");
+               return -EFAULT;
+       }
+       return 0;
+    }
+
+    return -EINVAL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
--- x/xen/common/page_alloc.c
+++ x/xen/common/page_alloc.c
@@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t
     }
 }
 
+unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at)
+{
+    unsigned long i;
+
+    for ( i = 0; i < nr_pfns; i++ )
+        if ( allocated_in_map(pfn_at + i) )
+             break;
+
+    if ( i == nr_pfns )
+    {
+        map_alloc(pfn_at, nr_pfns);
+        return pfn_at;
+    }
+
+    return 0;
+}
+
 unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
 {
-    unsigned long pg, i;
+    unsigned long pg, i = 0;
 
     for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align )
     {
-        for ( i = 0; i < nr_pfns; i++ )
-            if ( allocated_in_map(pg + i) )
-                 break;
-
-        if ( i == nr_pfns )
-        {
-            map_alloc(pg, nr_pfns);
-            return pg;
-        }
+        i = alloc_boot_pages_at(nr_pfns, pg);
+        if (i != 0)
+            break;
     }
 
-    return 0;
+    return i;
 }
 
 
--- x/xen/include/asm-x86/config.h
+++ x/xen/include/asm-x86/config.h
@@ -66,7 +66,7 @@
 #define barrier() __asm__ __volatile__("": : :"memory")
 
 /* A power-of-two value greater than or equal to number of hypercalls. */
-#define NR_hypercalls 32
+#define NR_hypercalls 64
 
 #if NR_hypercalls & (NR_hypercalls - 1)
 #error "NR_hypercalls must be a power-of-two value"
--- x/xen/include/asm-x86/hypercall.h
+++ x/xen/include/asm-x86/hypercall.h
@@ -6,6 +6,7 @@
 #define __ASM_X86_HYPERCALL_H__
 
 #include <public/physdev.h>
+#include <public/kexec.h>
 
 extern long
 do_set_trap_table(
@@ -79,6 +80,10 @@ extern long
 arch_do_vcpu_op(
     int cmd, struct vcpu *v, GUEST_HANDLE(void) arg);
 
+extern int
+do_kexec(
+    unsigned long op, GUEST_HANDLE(kexec_arg_t) uarg);
+
 #ifdef __x86_64__
 
 extern long
--- /dev/null
+++ x/xen/include/public/kexec.h
@@ -0,0 +1,43 @@
+/*
+ * kexec.h: Xen kexec
+ *
+ * Created By: Horms <horms@xxxxxxxxxxxx>
+ */
+
+#ifndef _XEN_PUBLIC_KEXEC_H
+#define _XEN_PUBLIC_KEXEC_H
+
+/*
+ * Scratch space for passing arguments to the kexec hypercall
+ */
+typedef struct kexec_arg {
+    union {
+        struct {
+            unsigned long data; /* Not sure what this should be yet */
+        } helper;
+        struct {
+            unsigned long indirection_page;
+            unsigned long reboot_code_buffer;
+            unsigned long start_address;
+            const char *relocate_new_kernel;
+            unsigned int relocate_new_kernel_size;
+        } kexec;
+        struct {
+            unsigned long size;
+            unsigned long start;
+        } reserve;
+    } u;
+} kexec_arg_t;
+DEFINE_GUEST_HANDLE(kexec_arg_t);
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- x/xen/include/public/xen.h
+++ x/xen/include/public/xen.h
@@ -62,6 +62,7 @@
 #define __HYPERVISOR_sched_op             29
 #define __HYPERVISOR_callback_op          30
 #define __HYPERVISOR_xenoprof_op          31
+#define __HYPERVISOR_kexec_op             32
 
 /* 
  * VIRTUAL INTERRUPTS
@@ -215,6 +216,14 @@ DEFINE_GUEST_HANDLE(mmuext_op_t);
 #define VMASST_TYPE_writable_pagetables  2
 #define MAX_VMASST_TYPE 2
 
+/*
+ * Operations for kexec.
+ */
+#define KEXEC_CMD_kexec                 0
+#define KEXEC_CMD_kexec_prepare         1
+#define KEXEC_CMD_kexec_cleanup         2
+#define KEXEC_CMD_reserve               3
+
 #ifndef __ASSEMBLY__
 
 typedef uint16_t domid_t;
--- x/xen/include/xen/mm.h
+++ x/xen/include/xen/mm.h
@@ -40,6 +40,7 @@ struct page_info;
 paddr_t init_boot_allocator(paddr_t bitmap_start);
 void init_boot_pages(paddr_t ps, paddr_t pe);
 unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align);
+unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at);
 void end_boot_allocator(void);
 
 /* Generic allocator. These functions are *not* interrupt-safe. */


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel