
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps



On Fri, Oct 07, 2011 at 01:09:32PM -0400, Steven Rostedt wrote:
> Note, this is just hacked together and needs to be cleaned up. Please do
> not comment on formatting or other sloppiness of this patch. I know it's
> sloppy and I left debug statements in. I want the comments to be on the
> idea of the patch.
> 
> I created a new file called scripts/update_jump_label.[ch] based on some
> of the work of recordmcount.c. This is executed at build time on all
> object files just like recordmcount is. But it does not add any new
> sections, it just modifies the code at build time to convert all jump
> labels into nops.
> 
> The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but
> instead to insert a jmp to the label. Depending on how gcc optimizes the
> code, the jmp will be either end up being a 2 byte or 5 byte jump.
> 
> After an object is compiled, update_jump_label is executed on this file
> and it reads the ELF relocation table to find the jump label locations
> and examines what jump was used. It then converts the jump into either a
> 2 byte or 5 byte nop that is appropriate.
> 
> At boot time, the jump labels no longer need to be converted (although
> we may do so in the future to use better nops depending on the machine
> that is used). When jump labels are enabled, the code is examined to see
> if a two byte or 5 byte version was used, and the appropriate update is
> made.
> 
> I just booted this patch and it worked. I was able to enable and disable
> trace points using jump labels. Benchmarks are welcomed :)
> 
> Comments and thoughts?
> 

Generally, I really like it, I guess because I suggested it :) I'll try to
run some workloads on it. A really simple one I used recently was putting
a single jump label in getppid() and then calling it in a loop. I wonder
whether the short nop vs. long nop difference would show up there, as a
baseline test. (FWIW, jump label vs. no jump label for this test was
anywhere between a 1% and 5% improvement.)

Anyways, some comments below.  

> -- Steve
> 
> Sloppy-signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> 
> diff --git a/Makefile b/Makefile
> index 31f967c..8368f42 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \
>  
>  HOSTCC       = gcc
>  HOSTCXX      = g++
> -HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
> +HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer
>  HOSTCXXFLAGS = -O2
>  
>  # Decide whether to build built-in, modular, or both.
> @@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
>  endif
>  endif
>  
> +ifdef CONFIG_JUMP_LABEL
> +     ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
> +             BUILD_UPDATE_JUMP_LABEL := y
> +             export BUILD_UPDATE_JUMP_LABEL
> +     endif
> +endif
> +
>  # We trigger additional mismatches with less inlining
>  ifdef CONFIG_DEBUG_SECTION_MISMATCH
>  KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 4b0669c..8fa6934 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
>         subsystem.  Also has support for calculating CPU cycle events
>         to determine how many clock cycles in a given period.
>  
> +config HAVE_BUILD_TIME_JUMP_LABEL
> +       bool
> +       help
> +     If an arch uses scripts/update_jump_label to patch in jump nops
> +     at build time, then it must enable this option.
> +
>  config HAVE_ARCH_JUMP_LABEL
>       bool
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6a47bb2..6de726a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -61,6 +61,7 @@ config X86
>       select HAVE_ARCH_KMEMCHECK
>       select HAVE_USER_RETURN_NOTIFIER
>       select HAVE_ARCH_JUMP_LABEL
> +     select HAVE_BUILD_TIME_JUMP_LABEL
>       select HAVE_TEXT_POKE_SMP
>       select HAVE_GENERIC_HARDIRQS
>       select HAVE_SPARSE_IRQ
> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> index a32b18c..872b3e1 100644
> --- a/arch/x86/include/asm/jump_label.h
> +++ b/arch/x86/include/asm/jump_label.h
> @@ -14,7 +14,7 @@
>  static __always_inline bool arch_static_branch(struct jump_label_key *key)
>  {
>       asm goto("1:"
> -             JUMP_LABEL_INITIAL_NOP
> +             "jmp %l[l_yes]\n"
>               ".pushsection __jump_table,  \"aw\" \n\t"
>               _ASM_ALIGN "\n\t"
>               _ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 3fee346..1f7f88f 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -16,34 +16,75 @@
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> +static unsigned char nop_short[] = { P6_NOP2 };
> +
>  union jump_code_union {
>       char code[JUMP_LABEL_NOP_SIZE];
>       struct {
>               char jump;
>               int offset;
>       } __attribute__((packed));
> +     struct {
> +             char jump_short;
> +             char offset_short;
> +     } __attribute__((packed));
>  };
>  
>  void arch_jump_label_transform(struct jump_entry *entry,
>                              enum jump_label_type type)
>  {
>       union jump_code_union code;
> +     unsigned char op;
> +     unsigned size;
> +     unsigned char nop;
> +
> +     /* Use probe_kernel_read()? */
> +     op = *(unsigned char *)entry->code;
> +     nop = ideal_nops[NOP_ATOMIC5][0];
>  
>       if (type == JUMP_LABEL_ENABLE) {
> -             code.jump = 0xe9;
> -             code.offset = entry->target -
> -                             (entry->code + JUMP_LABEL_NOP_SIZE);
> -     } else
> -             memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
> +             if (op == 0xe9 || op == 0xeb)
> +                     /* Already enabled. Warn? */
> +                     return;
> +

With the jump_label_inc()/jump_label_dec() interface this shouldn't
happen; I would make it a BUG().
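
To make the suggestion concrete, here is a rough user-space sketch of
the check I have in mind: classify the site by its first opcode byte and
only accept the transition that makes sense for the requested type. The
enum and function names are invented for this example, and the first
bytes of the nops are passed in because they depend on which nops are
in use. In the kernel, a failed check would be the BUG().

```c
/* Hypothetical names, for illustration only. */
enum site_state { SITE_JMP, SITE_NOP, SITE_UNKNOWN };
enum label_type { LABEL_DISABLE, LABEL_ENABLE };

/*
 * Classify a jump label site by its first opcode byte.
 * 0xe9 is jmp rel32 (5 bytes), 0xeb is jmp rel8 (2 bytes).
 */
static enum site_state classify_site(unsigned char op,
				     unsigned char nop2_first,
				     unsigned char nop5_first)
{
	if (op == 0xe9 || op == 0xeb)
		return SITE_JMP;
	if (op == nop2_first || op == nop5_first)
		return SITE_NOP;
	return SITE_UNKNOWN;
}

/* Enabling requires a nop at the site; disabling requires a jmp. */
static int transition_ok(enum site_state cur, enum label_type type)
{
	if (type == LABEL_ENABLE)
		return cur == SITE_NOP;
	return cur == SITE_JMP;
}
```

Anything for which transition_ok() returns 0 (already in the requested
state, or an unknown opcode) would then be treated as a bug rather than
silently ignored.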


> +             /* FIXME for all archs */
> +             if (op == nop_short[0]) {
> +                     size = 2;
> +                     code.jump_short = 0xeb;
> +                     code.offset = entry->target -
> +                             (entry->code + 2);
> +                     /* Check for overflow ? */
> +             } else if (op == nop) {
> +                     size = JUMP_LABEL_NOP_SIZE;
> +                     code.jump = 0xe9;
> +                     code.offset = entry->target - (entry->code + size);
> +             } else
> +                     return; /* WARN ? */

Same here: at least a WARN(), more likely a BUG().
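
On the "Check for overflow ?" note in the short-jump branch: a 2 byte
jmp carries a single signed byte of displacement, measured from the end
of the instruction, so the check is just a range test. A minimal
user-space sketch, with encode_jmp() being a name I made up:

```c
#include <stdint.h>

/*
 * Encode a relative jmp of 'size' bytes (2 or 5) located at address
 * 'code' that lands on 'target'. The displacement is relative to the
 * end of the instruction. Returns 0 on success, or -1 if the size is
 * unsupported or the displacement does not fit.
 */
static int encode_jmp(uint64_t code, uint64_t target, int size,
		      unsigned char *out)
{
	int64_t disp = (int64_t)(target - (code + (uint64_t)size));

	if (size == 2) {
		if (disp < INT8_MIN || disp > INT8_MAX)
			return -1;	/* rel8 overflow */
		out[0] = 0xeb;
		out[1] = (unsigned char)(int8_t)disp;
		return 0;
	}
	if (size == 5) {
		uint32_t u;

		if (disp < INT32_MIN || disp > INT32_MAX)
			return -1;	/* rel32 overflow */
		u = (uint32_t)(int32_t)disp;
		out[0] = 0xe9;
		/* x86 stores the displacement little endian. */
		out[1] = (unsigned char)(u & 0xff);
		out[2] = (unsigned char)((u >> 8) & 0xff);
		out[3] = (unsigned char)((u >> 16) & 0xff);
		out[4] = (unsigned char)((u >> 24) & 0xff);
		return 0;
	}
	return -1;
}
```

Since the compiler already picked a 2 byte jmp for the site, the rel8
case should always fit; a failure there would point at a corrupted entry.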

> +
> +     } else {
> +             if (op == nop_short[0] || nop)
> +                     /* Already disabled, warn? */
> +                     return;
> +

same here.

> +             if (op == 0xe9) {
> +                     size = JUMP_LABEL_NOP_SIZE;
> +                     memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
> +             } else if (op == 0xeb) {
> +                     size = 2;
> +                     memcpy(&code, nop_short, size);
> +             } else
> +                     return; /* WARN ? */

same here

> +     }
>       get_online_cpus();
>       mutex_lock(&text_mutex);
> -     text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> +     text_poke_smp((void *)entry->code, &code, size);
>       mutex_unlock(&text_mutex);
>       put_online_cpus();
>  }
>  
>  void arch_jump_label_text_poke_early(jump_label_t addr)
>  {
> +     return;
>       text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
>                       JUMP_LABEL_NOP_SIZE);
>  }

Hmm... we spent a bunch of time selecting the 'ideal' run-time nops; I
wouldn't want to drop that work.
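
One way to keep both might be to let the build-time tool write a default
5 byte nop and have the early boot hook swap it for the ideal one when
they differ, leaving 2 byte sites alone. A rough user-space sketch of
that comparison follows; upgrade_nop5() is a made-up name, and in the
kernel the final copy would go through text_poke_early() instead of
memcpy().

```c
#include <string.h>

#define NOP5_SIZE 5

/*
 * If 'site' holds the build-time 5 byte nop and the ideal nop for this
 * machine is different, rewrite the site. Returns 1 if the site was
 * rewritten, 0 if it was left untouched.
 */
static int upgrade_nop5(unsigned char *site,
			const unsigned char *build_nop,
			const unsigned char *ideal_nop)
{
	if (memcmp(site, build_nop, NOP5_SIZE) != 0)
		return 0;	/* 2 byte nop or something else: leave it */
	if (memcmp(build_nop, ideal_nop, NOP5_SIZE) == 0)
		return 0;	/* build-time nop is already the ideal one */

	memcpy(site, ideal_nop, NOP5_SIZE);
	return 1;
}
```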

> diff --git a/scripts/Makefile b/scripts/Makefile
> index df7678f..738b65c 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
>  hostprogs-$(CONFIG_VT)           += conmakehash
>  hostprogs-$(CONFIG_IKCONFIG)     += bin2c
>  hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
> +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
>  
>  always               := $(hostprogs-y) $(hostprogs-m)
>  
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index a0fd502..bc0d89b 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -258,6 +258,15 @@ cmd_modversions =                                        \
>       fi;
>  endif
>  
> +ifdef BUILD_UPDATE_JUMP_LABEL
> +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
> +                     $(srctree)/scripts/update_jump_label.h
> +cmd_update_jump_label =                                              \
> +     if [ $(@) != "scripts/mod/empty.o" ]; then              \
> +             $(objtree)/scripts/update_jump_label "$(@)";    \
> +     fi;
> +endif
> +
>  ifdef CONFIG_FTRACE_MCOUNT_RECORD
>  ifdef BUILD_C_RECORDMCOUNT
>  ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
> @@ -294,6 +303,7 @@ define rule_cc_o_c
>       $(cmd_modversions)                                                \
>       $(call echo-cmd,record_mcount)                                    \
>       $(cmd_record_mcount)                                              \
> +     $(cmd_update_jump_label)                                          \
>       scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
>                                                     $(dot-target).tmp;  \
>       rm -f $(depfile);                                                 \
> @@ -301,13 +311,14 @@ define rule_cc_o_c
>  endef
>  
>  # Built-in and composite module parts
> -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
>       $(call cmd,force_checksrc)
>       $(call if_changed_rule,cc_o_c)
>  
>  # Single-part modules are special since we need to mark them in $(MODVERDIR)
>  
> -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
> +               $(update_jump_label_source) FORCE
>       $(call cmd,force_checksrc)
>       $(call if_changed_rule,cc_o_c)
>       @{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
> diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
> new file mode 100644
> index 0000000..86e17bc
> --- /dev/null
> +++ b/scripts/update_jump_label.c
> @@ -0,0 +1,349 @@
> +/*
> + * update_jump_label.c: replace jmps with nops at compile time.
> + * Copyright 2010 Steven Rostedt <srostedt@xxxxxxxxxx>, Red Hat Inc.
> + *  Parsing of the elf file was influenced by recordmcount.c
> + *  originally written by and copyright to John F. Reiser <jreiser@xxxxxxxxxxxx>.
> + */
> +
> +/*
> + * Note, this code is originally designed for x86, but may be used by
> + * other archs to do the nop updates at compile time instead of at boot time.
> + * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
> + * Inserting a 2 byte where possible helps with both CPU performance and
> + * icache strain.
> + */
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <getopt.h>
> +#include <elf.h>
> +#include <fcntl.h>
> +#include <setjmp.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +static int fd_map;   /* File descriptor for file being modified. */
> +static struct stat sb;       /* Remember .st_size, etc. */
> +static int mmap_failed; /* Boolean flag. */
> +
> +static void die(const char *err, const char *fmt, ...)
> +{
> +     va_list ap;
> +
> +     if (err)
> +             perror(err);
> +
> +     if (fmt) {
> +             va_start(ap, fmt);
> +             fprintf(stderr, "Fatal error:  ");
> +             vfprintf(stderr, fmt, ap);
> +             fprintf(stderr, "\n");
> +             va_end(ap);
> +     }
> +
> +     exit(1);
> +}
> +
> +static void usage(char **argv)
> +{
> +     char *arg = argv[0];
> +     char *p = arg+strlen(arg);
> +
> +     while (p >= arg && *p != '/')
> +             p--;
> +     p++;
> +
> +     printf("usage: %s file\n"
> +            "\n",p);
> +     exit(-1);
> +}
> +
> +/* w8rev, w8nat, ...: Handle endianness. */
> +
> +static uint64_t w8rev(uint64_t const x)
> +{
> +     return   ((0xff & (x >> (0 * 8))) << (7 * 8))
> +            | ((0xff & (x >> (1 * 8))) << (6 * 8))
> +            | ((0xff & (x >> (2 * 8))) << (5 * 8))
> +            | ((0xff & (x >> (3 * 8))) << (4 * 8))
> +            | ((0xff & (x >> (4 * 8))) << (3 * 8))
> +            | ((0xff & (x >> (5 * 8))) << (2 * 8))
> +            | ((0xff & (x >> (6 * 8))) << (1 * 8))
> +            | ((0xff & (x >> (7 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w4rev(uint32_t const x)
> +{
> +     return   ((0xff & (x >> (0 * 8))) << (3 * 8))
> +            | ((0xff & (x >> (1 * 8))) << (2 * 8))
> +            | ((0xff & (x >> (2 * 8))) << (1 * 8))
> +            | ((0xff & (x >> (3 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w2rev(uint16_t const x)
> +{
> +     return   ((0xff & (x >> (0 * 8))) << (1 * 8))
> +            | ((0xff & (x >> (1 * 8))) << (0 * 8));
> +}
> +
> +static uint64_t w8nat(uint64_t const x)
> +{
> +     return x;
> +}
> +
> +static uint32_t w4nat(uint32_t const x)
> +{
> +     return x;
> +}
> +
> +static uint32_t w2nat(uint16_t const x)
> +{
> +     return x;
> +}
> +
> +static uint64_t (*w8)(uint64_t);
> +static uint32_t (*w)(uint32_t);
> +static uint32_t (*w2)(uint16_t);
> +
> +/* ulseek, uread, ...:  Check return value for errors. */
> +
> +static off_t
> +ulseek(int const fd, off_t const offset, int const whence)
> +{
> +     off_t const w = lseek(fd, offset, whence);
> +     if (w == (off_t)-1)
> +             die("lseek", NULL);
> +
> +     return w;
> +}
> +
> +static size_t
> +uread(int const fd, void *const buf, size_t const count)
> +{
> +     size_t const n = read(fd, buf, count);
> +     if (n != count)
> +             die("read", NULL);
> +
> +     return n;
> +}
> +
> +static size_t
> +uwrite(int const fd, void const *const buf, size_t const count)
> +{
> +     size_t const n = write(fd, buf, count);
> +     if (n != count)
> +             die("write", NULL);
> +
> +     return n;
> +}
> +
> +static void *
> +umalloc(size_t size)
> +{
> +     void *const addr = malloc(size);
> +     if (addr == 0)
> +             die("malloc", "malloc failed: %zu bytes\n", size);
> +
> +     return addr;
> +}
> +
> +/*
> + * Get the whole file as a programming convenience in order to avoid
> + * malloc+lseek+read+free of many pieces.  If successful, then mmap
> + * avoids copying unused pieces; else just read the whole file.
> + * Open for both read and write; new info will be appended to the file.
> + * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
> + * do not propagate to the file until an explicit overwrite at the last.
> + * This preserves most aspects of consistency (all except .st_size)
> + * for simultaneous readers of the file while we are appending to it.
> + * However, multiple writers still are bad.  We choose not to use
> + * locking because it is expensive and the use case of kernel build
> + * makes multiple writers unlikely.
> + */
> +static void *mmap_file(char const *fname)
> +{
> +     void *addr;
> +
> +     fd_map = open(fname, O_RDWR);
> +     if (fd_map < 0 || fstat(fd_map, &sb) < 0)
> +             die(fname, "failed to open file");
> +
> +     if (!S_ISREG(sb.st_mode))
> +             die(NULL, "not a regular file: %s\n", fname);
> +
> +     addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
> +                 fd_map, 0);
> +
> +     mmap_failed = 0;
> +     if (addr == MAP_FAILED) {
> +             mmap_failed = 1;
> +             addr = umalloc(sb.st_size);
> +             uread(fd_map, addr, sb.st_size);
> +     }
> +     return addr;
> +}
> +
> +static void munmap_file(void *addr)
> +{
> +     if (!mmap_failed)
> +             munmap(addr, sb.st_size);
> +     else
> +             free(addr);
> +     close(fd_map);
> +}
> +
> +static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
> +static unsigned char ideal_nop2_x86[2] = { 0x66, 0x99 };
> +static unsigned char *ideal_nop;
> +
> +static int (*make_nop)(void *map, size_t const offset);
> +
> +static int make_nop_x86(void *map, size_t const offset)
> +{
> +     unsigned char *op;
> +     unsigned char *nop;
> +     int size;
> +
> +     /* Determine which type of jmp this is 2 byte or 5. */
> +     op = map + offset;
> +     switch (*op) {
> +     case 0xeb: /* 2 byte */
> +             size = 2;
> +             nop = ideal_nop2_x86;
> +             break;
> +     case 0xe9: /* 5 byte */
> +             size = 5;
> +             nop = ideal_nop;
> +             break;
> +     default:
> +             die(NULL, "Bad jump label section\n");
> +     }
> +
> +     /* convert to nop */
> +     ulseek(fd_map, offset, SEEK_SET);
> +     uwrite(fd_map, nop, size);
> +     return 0;
> +}
> +
> +/* 32 bit and 64 bit are very similar */
> +#include "update_jump_label.h"
> +#define UPDATE_JUMP_LABEL_64
> +#include "update_jump_label.h"
> +
> +static int do_file(const char *fname)
> +{
> +     Elf32_Ehdr *const ehdr = mmap_file(fname);
> +     unsigned int reltype = 0;
> +
> +     w = w4nat;
> +     w2 = w2nat;
> +     w8 = w8nat;
> +     switch (ehdr->e_ident[EI_DATA]) {
> +             static unsigned int const endian = 1;
> +     default:
> +             die(NULL, "unrecognized ELF data encoding %d: %s\n",
> +                     ehdr->e_ident[EI_DATA], fname);
> +             break;
> +     case ELFDATA2LSB:
> +             if (*(unsigned char const *)&endian != 1) {
> +                     /* main() is big endian, file.o is little endian. */
> +                     w = w4rev;
> +                     w2 = w2rev;
> +                     w8 = w8rev;
> +             }
> +             break;
> +     case ELFDATA2MSB:
> +             if (*(unsigned char const *)&endian != 0) {
> +                     /* main() is little endian, file.o is big endian. */
> +                     w = w4rev;
> +                     w2 = w2rev;
> +                     w8 = w8rev;
> +             }
> +             break;
> +     }  /* end switch */
> +
> +     if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
> +         w2(ehdr->e_type) != ET_REL ||
> +         ehdr->e_ident[EI_VERSION] != EV_CURRENT)
> +             die(NULL, "unrecognized ET_REL file %s\n", fname);
> +
> +     switch (w2(ehdr->e_machine)) {
> +     default:
> +             die(NULL, "unrecognized e_machine %d %s\n",
> +                 w2(ehdr->e_machine), fname);
> +             break;
> +     case EM_386:
> +             reltype = R_386_32;
> +             make_nop = make_nop_x86;
> +             ideal_nop = ideal_nop5_x86_32;
> +             break;
> +     case EM_ARM:     reltype = R_ARM_ABS32;
> +                      break;
> +     case EM_IA_64:   reltype = R_IA64_IMM64; break;
> +     case EM_MIPS:    /* reltype: e_class    */ break;
> +     case EM_PPC:     reltype = R_PPC_ADDR32;   break;
> +     case EM_PPC64:   reltype = R_PPC64_ADDR64; break;
> +     case EM_S390:    /* reltype: e_class    */ break;
> +     case EM_SH:      reltype = R_SH_DIR32;                 break;
> +     case EM_SPARCV9: reltype = R_SPARC_64;     break;
> +     case EM_X86_64:
> +             make_nop = make_nop_x86;
> +             ideal_nop = ideal_nop5_x86_64;
> +             reltype = R_X86_64_64;
> +             break;
> +     }  /* end switch */
> +
> +     switch (ehdr->e_ident[EI_CLASS]) {
> +     default:
> +             die(NULL, "unrecognized ELF class %d %s\n",
> +                 ehdr->e_ident[EI_CLASS], fname);
> +             break;
> +     case ELFCLASS32:
> +             if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
> +             ||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
> +                     die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +             if (w2(ehdr->e_machine) == EM_S390) {
> +                     reltype = R_390_32;
> +             }
> +             if (w2(ehdr->e_machine) == EM_MIPS) {
> +                     reltype = R_MIPS_32;
> +             }
> +             do_func32(ehdr, fname, reltype);
> +             break;
> +     case ELFCLASS64: {
> +             Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
> +             if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
> +             ||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
> +                     die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +             if (w2(ghdr->e_machine) == EM_S390)
> +                     reltype = R_390_64;
> +
> +#if 0
> +             if (w2(ghdr->e_machine) == EM_MIPS) {
> +                     reltype = R_MIPS_64;
> +                     Elf64_r_sym = MIPS64_r_sym;
> +             }
> +#endif
> +             do_func64(ghdr, fname, reltype);
> +             break;
> +     }
> +     }  /* end switch */
> +
> +     munmap_file(ehdr);
> +     return 0;
> +}
> +
> +int main (int argc, char **argv)
> +{
> +     if (argc != 2)
> +             usage(argv);
> +     
> +     return do_file(argv[1]);
> +}
> +
> diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
> new file mode 100644
> index 0000000..6ff9846
> --- /dev/null
> +++ b/scripts/update_jump_label.h
> @@ -0,0 +1,322 @@
> +/*
> + * recordmcount.h
> + *
> + * This code was taken out of recordmcount.c written by
> + * Copyright 2009 John F. Reiser <jreiser@xxxxxxxxxxxx>.  All rights reserved.
> + *
> + * The original code had the same algorithms for both 32bit
> + * and 64bit ELF files, but the code was duplicated to support
> + * the difference in structures that were used. This
> + * file creates a macro of everything that is different between
> + * the 64 and 32 bit code, such that by including this header
> + * twice we can create both sets of functions by including this
> + * header once with RECORD_MCOUNT_64 undefined, and again with
> + * it defined.
> + *
> + * This conversion to macros was done by:
> + * Copyright 2010 Steven Rostedt <srostedt@xxxxxxxxxx>, Red Hat Inc.
> + *
> + * Licensed under the GNU General Public License, version 2 (GPLv2).
> + */
> +
> +#undef EBITS
> +#undef _w
> +#undef _align
> +#undef _size
> +
> +#ifdef UPDATE_JUMP_LABEL_64
> +# define EBITS                       64
> +# define _w                  w8
> +# define _align                      7u
> +# define _size                       8
> +#else
> +# define EBITS                       32
> +# define _w                  w
> +# define _align                      3u
> +# define _size                       4
> +#endif
> +
> +#define _FBITS(x, e) x##e
> +#define FBITS(x, e)  _FBITS(x,e)
> +#define FUNC(x)              FBITS(x,EBITS)
> +
> +#undef Elf_Addr
> +#undef Elf_Ehdr
> +#undef Elf_Shdr
> +#undef Elf_Rel
> +#undef Elf_Rela
> +#undef Elf_Sym
> +#undef ELF_R_SYM
> +#undef ELF_R_TYPE
> +
> +#define __ATTACH(x,y,z)      x##y##z
> +#define ATTACH(x,y,z)        __ATTACH(x,y,z)
> +
> +#define Elf_Addr     ATTACH(Elf,EBITS,_Addr)
> +#define Elf_Ehdr     ATTACH(Elf,EBITS,_Ehdr)
> +#define Elf_Shdr     ATTACH(Elf,EBITS,_Shdr)
> +#define Elf_Rel              ATTACH(Elf,EBITS,_Rel)
> +#define Elf_Rela     ATTACH(Elf,EBITS,_Rela)
> +#define Elf_Sym              ATTACH(Elf,EBITS,_Sym)
> +#define uint_t               ATTACH(uint,EBITS,_t)
> +#define ELF_R_SYM    ATTACH(ELF,EBITS,_R_SYM)
> +#define ELF_R_TYPE   ATTACH(ELF,EBITS,_R_TYPE)
> +
> +#undef get_shdr
> +#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
> +
> +#undef get_section_loc
> +#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
> +
> +/* Functions and pointers that do_file() may override for specific e_machine. */
> +
> +#if 0
> +static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp)
> +{
> +     return ELF_R_SYM(_w(rp->r_info));
> +}
> +static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM);
> +#endif
> +
> +static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr,
> +                              Elf_Ehdr const *const ehdr,
> +                              Elf_Sym const **sym0,
> +                              char const **str0,
> +                              Elf_Rel const **relp)
> +{
> +     Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +     unsigned const symsec_sh_link = w(relhdr->sh_link);
> +     Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
> +     Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
> +     Elf_Rel const *const rel0 =
> +             (Elf_Rel const *)get_section_loc(ehdr, relhdr);
> +
> +     *sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec);
> +
> +     *str0 = (char const *)get_section_loc(ehdr, strsec);
> +
> +     *relp = rel0;
> +}
> +
> +/*
> + * Read the relocation table again, but this time its called on sections
> + * that are not going to be traced. The mcount calls here will be converted
> + * into nops.
> + */
> +static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr,
> +                    Elf_Ehdr const *const ehdr,
> +                    const char *const txtname)
> +{
> +     Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +     Elf_Sym const *sym0;
> +     char const *str0;
> +     Elf_Rel const *relp;
> +     Elf_Rela const *relap;
> +     Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
> +     unsigned rel_entsize = w(relhdr->sh_entsize);
> +     unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
> +     int t;
> +
> +     FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp);
> +
> +     for (t = nrel; t > 0; t -= 3) {
> +             int ret = -1;
> +
> +             relap = (Elf_Rela const *)relp;
> +             printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n",
> +                    (long)relap->r_offset, (long)relap->r_info,
> +                    (long)ELF_R_SYM(relap->r_info),
> +                    (long)ELF_R_TYPE(relap->r_info),
> +                    (long)relap->r_addend);
> +
> +             if (0 && make_nop)
> +                     ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
> +
> +             /* jump label sections are paired in threes */
> +             relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp);
> +     }
> +}
> +
> +/* Evade ISO C restriction: no declaration after statement in has_rel_mcount. */
> +static char const *
> +FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr,  /* is SHT_REL or SHT_RELA */
> +              Elf_Shdr const *const shdr0,
> +              char const *const shstrtab,
> +              char const *const fname)
> +{
> +     /* .sh_info depends on .sh_type == SHT_REL[,A] */
> +     Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
> +     char const *const txtname = &shstrtab[w(txthdr->sh_name)];
> +
> +     if (strcmp("__jump_table", txtname) == 0) {
> +             fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
> +                     fname);
> +//           succeed_file();
> +     }
> +     if (w(txthdr->sh_type) != SHT_PROGBITS ||
> +         !(w(txthdr->sh_flags) & SHF_EXECINSTR))
> +             return NULL;
> +     return txtname;
> +}
> +
> +static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr,
> +                                   Elf_Shdr const *const shdr0,
> +                                   char const *const shstrtab,
> +                                   char const *const fname)
> +{
> +     if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
> +             return NULL;
> +     return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname);
> +}
> +
> +/* Find relocation section hdr for a given section */
> +static const Elf_Shdr *
> +FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
> +{
> +     const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +     int nhdr = w2(ehdr->e_shnum);
> +     const Elf_Shdr *hdr;
> +     int i;
> +
> +     for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +             if (w(hdr->sh_type) != SHT_REL &&
> +                 w(hdr->sh_type) != SHT_RELA)
> +                     continue;
> +
> +             /*
> +              * The relocation section's info field holds
> +              * the section index that it represents.
> +              */
> +             if (shdr == &shdr0[w(hdr->sh_info)])
> +                     return hdr;
> +     }
> +     return NULL;
> +}
> +
> +/* Find a section headr based on name and type */
> +static const Elf_Shdr *
> +FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
> +{
> +     const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +     const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
> +     const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
> +     int nhdr = w2(ehdr->e_shnum);
> +     const Elf_Shdr *hdr;
> +     const char *hdrname;
> +     int i;
> +
> +     for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +             if (w(hdr->sh_type) != type)
> +                     continue;
> +
> +             /* If we are just looking for a section by type (ie. SYMTAB) */
> +             if (!name)
> +                     return hdr;
> +
> +             hdrname = &shstrtab[w(hdr->sh_name)];
> +             if (strcmp(hdrname, name) == 0)
> +                     return hdr;
> +     }
> +     return NULL;
> +}
> +
> +static void
> +FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
> +                  unsigned shtype, const Elf_Rel *rel, void *data)
> +{
> +     const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +     const Elf_Shdr *targethdr;
> +     const Elf_Rela *rela;
> +     const Elf_Sym *syment;
> +     uint_t offset = _w(rel->r_offset);
> +     uint_t info = _w(rel->r_info);
> +     uint_t sym = ELF_R_SYM(info);
> +     uint_t type = ELF_R_TYPE(info);
> +     uint_t addend;
> +     uint_t targetloc;
> +
> +     if (shtype == SHT_RELA) {
> +             rela = (const Elf_Rela *)rel;
> +             addend = _w(rela->r_addend);
> +     } else
> +             addend = _w(*(unsigned short *)(data + offset));
> +
> +     syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
> +     targethdr = &shdr0[w2(syment[sym].st_shndx)];
> +     targetloc = _w(targethdr->sh_offset);
> +
> +     /* TODO, need a separate function for all archs */
> +     if (type != R_386_32)
> +             die(NULL, "Arch relocation type %d not supported", type);
> +
> +     targetloc += addend;
> +
> +#if 1
> +     printf("offset=%x target=%x shoffset=%x add=%x\n",
> +            offset, targetloc, _w(targethdr->sh_offset), addend);
> +#endif
> +     *(uint_t *)(data + offset) = targetloc;
> +}
> +
> +/* Overall supervision for Elf32 ET_REL file. */
> +static void
> +FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
> +{
> +     const Elf_Shdr *jlshdr;
> +     const Elf_Shdr *jlrhdr;
> +     const Elf_Shdr *symhdr;
> +     const Elf_Rel *rel;
> +     unsigned size;
> +     unsigned cnt;
> +     unsigned i;
> +     uint_t type;
> +     void *jdata;
> +     void *data;
> +
> +     jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
> +     if (!jlshdr)
> +             return;
> +
> +     jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
> +     if (!jlrhdr)
> +             return;
> +
> +     /*
> +      * Create and fill in the __jump_table section and use it to
> +      * find the offsets into the text that we want to update.
> +      * We create it so that we do not depend on the order of the
> +      * relocations, and use the table directly, as it is broken
> +      * up into sections.
> +      */
> +     size = _w(jlshdr->sh_size);
> +     data = umalloc(size);
> +
> +     jdata = (void *)get_section_loc(ehdr, jlshdr);
> +     memcpy(data, jdata, size);
> +
> +     cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
> +
> +     rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
> +
> +     /* Is this as Rel or Rela? */
> +     type = w(jlrhdr->sh_type);
> +
> +     symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
> +
> +     for (i = 0; i < cnt; i++) {
> +             FUNC(section_update)(ehdr, symhdr, type, rel, data);
> +             rel = (void *)rel + w(jlrhdr->sh_entsize);
> +     }
> +
> +     /*
> +      * This is specific to x86. The jump_table is stored in three
> +      * long words. The first is the location of the jmp target we
> +      * must update.
> +      */
> +     cnt = size / sizeof(uint_t);
> +
> +     for (i = 0; i < cnt; i += 3)
> +             if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
> +

Hmm, isn't this the line that actually writes in the nops? Why is it
disabled?
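
For what it's worth, this is how I read the intended final step: the
resolved table is a flat array of words in groups of three, the first
word of each group is the file offset of the jmp, and each jmp gets
replaced by a nop of the same size. A self-contained sketch over an
in-memory buffer follows. The function names, the 64 bit table words,
and the nop bytes are assumptions for the example (0x66 0x90 is the
usual 2 byte nop encoding, 0f 1f 44 00 00 the 5 byte one used for
x86_64 in the patch); the real tool writes to the file through
ulseek()/uwrite() instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed nop encodings for this sketch. */
static const unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const unsigned char nop2[2] = { 0x66, 0x90 };

/*
 * Replace the jmp at text[off] with a nop of the same size.
 * Returns the number of bytes written, or 0 if there is no jmp there
 * or the instruction would run past the end of the buffer.
 */
static size_t nop_out_jmp(unsigned char *text, size_t text_len, size_t off)
{
	if (off >= text_len)
		return 0;

	switch (text[off]) {
	case 0xeb:	/* jmp rel8, 2 bytes */
		if (text_len - off < sizeof(nop2))
			return 0;
		memcpy(text + off, nop2, sizeof(nop2));
		return sizeof(nop2);
	case 0xe9:	/* jmp rel32, 5 bytes */
		if (text_len - off < sizeof(nop5))
			return 0;
		memcpy(text + off, nop5, sizeof(nop5));
		return sizeof(nop5);
	default:
		return 0;
	}
}

/*
 * Walk a table of 'nwords' words grouped as { code, target, key } and
 * nop out the jmp at each 'code' offset. Returns the number of sites
 * that were converted.
 */
static size_t nop_jump_table(unsigned char *text, size_t text_len,
			     const uint64_t *table, size_t nwords)
{
	size_t i, done = 0;

	for (i = 0; i + 2 < nwords; i += 3)
		if (nop_out_jmp(text, text_len, (size_t)table[i]))
			done++;

	return done;
}
```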

> +     free(data);
> +}
> 
> 

Thanks again for doing this... I was still trying to understand recordmcount.c ;)

-Jason
