Re: [Xen-devel] pvgrub boot problems

On 03/27/2010 03:48 PM, M A Young wrote:

On Sat, 27 Mar 2010, Jeremy Fitzhardinge wrote:
I see the same thing, though mostly with 64/64/64 xen/dom0/guest boots. AFAIK Samuel has not been able to repro the problem, even when using my kernel and pvgrub images. I sent him a complete image of my /boot; with luck that will work...
Looking at the logs I think the cases might be slightly different, depending on the vfb settings in the guest configuration file. I don't have any vfb setting as I was using it text-only, and I am wondering if pvgrub crashes in these circumstances because it doesn't have a vkbd and fails when it tries to clean up a non-existent device.

I was wondering if it depends on vfb or not as well. I haven't really got it to work either way, but they do have different failures:

Without vfb the crash is in the "kbdfront" thread (looks like a fairly straightforward NULL pointer dereference):


sh-4.0# xm create -c f13pv64
Using config file "/etc/xen/f13pv64".
Started domain f13pv64 (id=1)
                             Xen Minimal OS!
  start_info: 0xa99000(VA)
    nr_pages: 0x20000
  shared_inf: 0xbf450000(MA)
     pt_base: 0xa9c000(VA)
nr_pt_frames: 0x9
    mfn_list: 0x999000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: (hd0,0)/grub/grub.conf
  stack:      0x958980-0x978980
MM: Init
      _text: 0x0(VA)
     _etext: 0x691c4(VA)
   _erodata: 0x82000(VA)
     _edata: 0x8aae0(VA)
  Booting 'Xen 2.6.32'

root (hd0,0)
Error ENOENT when reading the backend path device/vkbd/0/backend
Page fault at linear address 0x0, rip 0x3a28, regs 0xcfff18, sp 0xcfffc8, our_sp 0xcffed0, code 0
Thread: kbdfront
RIP: e030:[<0000000000003a28>]
RSP: e02b:0000000000cfffc8  EFLAGS: 00010006
RAX: 0000000000000000 RBX: 00000000000039da RCX: 000000000008a240
RDX: 0000002020004590 RSI: 000000000008a180 RDI: 000000000008a9a0
RBP: 0000000000cfffe8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000a9b000 R11: 0000000000071300 R12: 0000000000cdfe28
R13: 0000000000000001 R14: 0000000000000000 R15: 00000002540be400
base is 0xcfffe8 caller is 0x33da

cfffb0: c8 ff cf 00 00 00 00 00 2b e0 00 00 00 00 00 00
cfffc0: f7 39 00 00 00 00 00 00 da 39 00 00 00 00 00 00
cfffd0: 90 45 00 20 20 00 00 00 e3 8b c4 67 0f 48 00 00
cfffe0: e3 a7 b8 13 0d 48 00 00 00 00 00 00 00 00 00 00

cfffd0: 90 45 00 20 20 00 00 00 e3 8b c4 67 0f 48 00 00
cfffe0: e3 a7 b8 13 0d 48 00 00 00 00 00 00 00 00 00 00
cffff0: da 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d00000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

3a10: 04 24 01 48 8b 05 46 55 99 00 44 0f b6 68 01 c6
3a20: 40 01 01 49 8b 44 24 08 48 8b 18 49 83 c4 08 49
3a30: 39 c4 74 19 48 8b 78 f8 e8 85 b1 04 00 48 8b 13
3a40: 49 39 dc 74 08 48 89 d8 48 89 d3 eb e7 48 8b 05
Pagetable walk from virt 0, base a9c000:
 L4 = 000000013446d067 (0xa9d000)  [offset = 0]
  L3 = 000000013446c067 (0xa9e000)  [offset = 0]
   L2 = 000000013446b067 (0xa9f000)  [offset = 0]
    L1 = 0000000000000000 [offset = 0]


The rip in this case corresponds to:

(gdb) list *0x0000000000003a28
0x3a28 is in kbd_thread (/home/jeremy/hg/xen/unstable/stubdom/../extras/mini-os/include/mini-os/wait.h:43).
38      static inline void wake_up(struct wait_queue_head *head)
39      {
40          unsigned long flags;
41          struct minios_list_head *tmp, *next;
42          local_irq_save(flags);
43          minios_list_for_each_safe(tmp, next,&head->thread_list)
44          {
45               struct wait_queue *curr;
46               curr = minios_list_entry(tmp, struct wait_queue, thread_list);
47               wake(curr->thread);
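
For what it's worth, the code bytes at the faulting rip (48 8b 18, i.e. a load through RAX, with RAX = 0) would be consistent with the list's next pointer being NULL when the loop reads tmp->next, as if the wait queue head had never been initialised. Below is a minimal sketch of that failure mode and of a guard against it. The struct and function names are hypothetical stand-ins, not the Mini-OS definitions, and this is an illustration rather than a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialised, empty circular list points back at itself. */
static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Append item just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *item, struct list_head *h)
{
    item->prev = h->prev;
    item->next = h;
    h->prev->next = item;
    h->prev = item;
}

/*
 * Walk the list the way a "for each, safe" loop does: tmp->next is read
 * before tmp is visited. Returns the number of entries, or -1 if the
 * head was never initialised (next == NULL), which is the case where an
 * unguarded loop would read from address 0.
 */
static int count_entries_guarded(const struct list_head *h)
{
    const struct list_head *tmp, *next;
    int n = 0;

    assert(h != NULL);
    if (h->next == NULL)
        return -1;

    for (tmp = h->next, next = tmp->next; tmp != h;
         tmp = next, next = tmp->next)
        n++;
    return n;
}
```

With a head set up by list_init() the loop body simply never runs; with an all-zero head the guard returns -1 instead of faulting.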



But with vfb enabled, I get a crash in "main":

sh-4.0# xm create -c f13pv64
Using config file "/etc/xen/f13pv64".
Started domain f13pv64 (id=2)
                             Xen Minimal OS!
  start_info: 0xa99000(VA)
    nr_pages: 0x20000
  shared_inf: 0xbf450000(MA)
     pt_base: 0xa9c000(VA)
nr_pt_frames: 0x9
    mfn_list: 0x999000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: (hd0,0)/grub/grub.conf
  stack:      0x958980-0x978980
MM: Init
      _text: 0x0(VA)
     _etext: 0x691c4(VA)
   _erodata: 0x82000(VA)
     _edata: 0x8aae0(VA)
stack start: 0x958980(VA)
       _end: 0x998f88(VA)
  start_pfn: aa8
    max_pfn: 20000
Mapping memory range 0xc00000 - 0x20000000
setting 0x0-0x82000 readonly
skipped 0x1000
MM: Initialise page allocator for ba2000(ba2000)-20000000(20000000)
MM: done
Demand map pfns at 20001000-2020001000.
Heap resides at 2020002000-4020002000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x20001000.
Initialising scheduler
Thread "Idle": pointer: 0x2020002050, stack: 0xcb0000
Initialising xenbus
Thread "xenstore": pointer: 0x2020002800, stack: 0xcc0000
Dummy main: start_info=0x978a80
Thread "main": pointer: 0x2020002fb0, stack: 0xcd0000
Thread "pcifront": pointer: 0x2020003760, stack: 0xce0000
"main" "(hd0,0)/grub/grub.conf"
pcifront_watches: waiting for backend path to happear device/pci/0/backend
vbd 51712 is hd0
******************* BLKFRONT for device/vbd/51712 **********


backend at /local/domain/0/backend/vbd/2/51712
Failed to read /local/domain/0/backend/vbd/2/51712/feature-flush-cache.
20971520 sectors of 512 bytes
**************************
Thread "kbdfront": pointer: 0x2020004590, stack: 0xcf0000
******************* FBFRONT for device/vfb/0 **********


******************* KBDFRONT for device/vkbd/0 **********


backend at /local/domain/0/backend/vkbd/2/0
/local/domain/0/backend/vkbd/2/0 connected
************************** KBDFRONT
Thread "kbdfront" exited.
backend at /local/domain/0/backend/vfb/2/0
/local/domain/0/backend/vfb/2/0 connected
************************** FBFRONT
Thread "kbdfront close": pointer: 0x2020004590, stack: 0xcf0000
close fb: backend at /local/domain/0/backend/vfb/2/0
close kbd: backend at /local/domain/0/backend/vkbd/2/0
shutdown_kbdfront: error changing state to 5: ENOENT
Thread "kbdfront close" exited.
  Booting 'Xen 2.6.32'

root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /xen.gz console=com1,vga
Page fault at linear address 0x100953340, rip 0x4b798, regs 0xcdfa68, sp 0xcdfb18, our_sp 0xcdfa20, code 2
Thread: main
RIP: e030:[<000000000004b798>]
RSP: e02b:0000000000cdfb18  EFLAGS: 00010293
RAX: 0000000000e2e000 RBX: 0000000000cdfb98 RCX: 0000000100953338
RDX: 0000000000000001 RSI: 0000000000953188 RDI: 0000000000000000
RBP: 0000000000cdfb28 R08: 0000000100953338 R09: 0000000000e2c000
R10: 0000000000007ff0 R11: 0000000000200030 R12: 0000000000000003
R13: 0000002020304e18 R14: 0000000000000000 R15: 0000000000001200
base is 0xcdfb28 caller is 0x57ced
base is 0xcdfb88 caller is 0x3067
base is 0xcdfc88 caller is 0x5c1e3
base is 0xcdfcd8 caller is 0x5be3a
base is 0xcdfcf8 caller is 0x3db6
base is 0xcdfd38 caller is 0x402e
base is 0xcdfd58 caller is 0x8341
base is 0xcdfd98 caller is 0xaa2f
base is 0xcdfdd8 caller is 0x108ca
base is 0xcdfe88 caller is 0x10f62
base is 0xcdff48 caller is 0x4343
base is 0xcdff58 caller is 0x4b4ba
base is 0xcdffe8 caller is 0x33da

cdfb00: 18 fb cd 00 00 00 00 00 2b e0 00 00 00 00 00 00
cdfb10: 18 e8 0d 34 01 00 00 00 25 a0 d8 34 01 00 00 00
cdfb20: 98 fb cd 00 00 00 00 00 88 fb cd 00 00 00 00 00
cdfb30: ed 7c 05 00 00 00 00 00 25 a0 d8 34 01 00 00 00

cdfb10: 18 e8 0d 34 01 00 00 00 25 a0 d8 34 01 00 00 00
cdfb20: 98 fb cd 00 00 00 00 00 88 fb cd 00 00 00 00 00
cdfb30: ed 7c 05 00 00 00 00 00 25 a0 d8 34 01 00 00 00
cdfb40: 30 e8 0d 34 01 00 00 00 03 00 00 00 01 00 00 00

4b780: f2 b9 80 31 95 00 48 8b 04 f1 4c 8b 00 4c 89 04
4b790: f1 48 8b 08 48 8b 70 08 48 89 71 08 39 d7 74 49
4b7a0: be 01 00 00 00 41 b8 80 31 95 00 83 ea 01 8d 4a
4b7b0: 0c 48 89 f3 48 d3 e3 4c 8d 0c 18 41 89 51 10 4c
Pagetable walk from virt 100953340, base a9c000:
 L4 = 000000013446d067 (0xa9d000)  [offset = 0]
  L3 = 0000000000000000 (0xfffffffffffff000)  [offset = 4]
Page fault in pagetable walk (access to invalid memory?).
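
As a cross-check on the walk above, the "offset" values can be recomputed from the faulting address, assuming standard x86-64 4-level paging (9 index bits per level above a 12-bit page offset). The helper name pt_index is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the page table at a given level (4 = top level, 1 = leaf)
 * for x86-64 4-level paging: each level consumes 9 bits of the virtual
 * address, starting above the 12-bit page offset.
 */
static unsigned pt_index(uint64_t virt, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned)((virt >> (12 + 9 * (level - 1))) & 0x1ff);
}
```

For 0x100953340 this gives index 0 at L4 and index 4 at L3, matching the offsets printed in the log. The address is just over 4 GiB, and the L3 entry for that region is empty, so the walk stops there.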


In this case the rip is:

(gdb) list *0x000000000004b798
0x4b798 is in alloc_pages (mm.c:276).
271         if ( i == FREELIST_SIZE ) goto no_memory;
272     
273         /* Unlink a chunk. */
274         alloc_ch = free_head[i];
275         free_head[i] = alloc_ch->next;
276         alloc_ch->next->pprev = alloc_ch->pprev;
277     
278         /* We may have to break the chunk a number of times. */
279         while ( i != order )
280         {
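
The register dump lines up with line 276. Assuming the chunk header starts with a next pointer followed by pprev (so pprev sits 8 bytes in on a 64-bit guest), a store to alloc_ch->next->pprev with next = RCX = 0x100953338 lands on 0x100953340, the reported fault address. Error code 2 is a write to a not-present page, which also fits a store. A small sketch of that arithmetic follows; the struct is a layout model with hypothetical names, not the Mini-OS definition.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Layout model of a free-list chunk header in a 64-bit guest.
 * Fixed-width fields are used so the offsets do not depend on the host.
 */
struct chunk_head_model {
    uint64_t next;   /* pointer to the next free chunk */
    uint64_t pprev;  /* pointer to the previous chunk's next field */
    int32_t  level;
};

/* Address written by "chunk->next->pprev = ..." given chunk->next. */
static uint64_t pprev_store_address(uint64_t next_ptr)
{
    return next_ptr + offsetof(struct chunk_head_model, pprev);
}

/* Low bits of the x86 page-fault error code. */
enum {
    PF_PRESENT = 1u << 0,  /* 0 = page not present, 1 = protection fault */
    PF_WRITE   = 1u << 1,  /* 0 = read, 1 = write */
    PF_USER    = 1u << 2   /* 0 = supervisor, 1 = user mode */
};

static int fault_is_write(unsigned code)
{
    return (code & PF_WRITE) != 0;
}

static int fault_is_not_present(unsigned code)
{
    return (code & PF_PRESENT) == 0;
}
```

By the same decoding, the "code 0" fault in the no-vfb case is a read of a not-present page, as expected for a NULL dereference.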

The stack backtrace maps to:

0x57ced:
handle_cow
/home/jeremy/hg/xen/unstable/extras/mini-os/arch/x86/traps.c:147
do_page_fault
/home/jeremy/hg/xen/unstable/extras/mini-os/arch/x86/traps.c:202
0x3067:
error_call_handler
??:0
0x5c1e3:
_realloc_r
/home/jeremy/hg/xen/unstable/stubdom/newlib-x86_64/x86_64-xen-elf/newlib/libc/stdlib/../../../../../newlib-1.16.0/newlib/libc/stdlib/mallocr.c:2947
0x5be3a:
realloc
/home/jeremy/hg/xen/unstable/stubdom/newlib-x86_64/x86_64-xen-elf/newlib/libc/stdlib/../../../../../newlib-1.16.0/newlib/libc/stdlib/realloc.c:19
0x3db6:
load_file
/home/jeremy/hg/xen/unstable/stubdom/grub/mini-os.c:165
0x402e:
load_image
/home/jeremy/hg/xen/unstable/stubdom/grub/mini-os.c:187
0x8341:
kernel_func
/home/jeremy/hg/xen/unstable/stubdom/grub/../grub-upstream/stage2/builtins.c:2713
0xaa2f:
run_script
/home/jeremy/hg/xen/unstable/stubdom/grub/../grub-upstream/stage2/cmdline.c:256
0x108ca:
run_menu
/home/jeremy/hg/xen/unstable/stubdom/grub/../grub-upstream/stage2/stage2.c:769
0x10f62:
cmain
/home/jeremy/hg/xen/unstable/stubdom/grub/../grub-upstream/stage2/stage2.c:1121
0x4343:
main
/home/jeremy/hg/xen/unstable/stubdom/grub/mini-os.c:763
0x4b4ba:
call_main
/home/jeremy/hg/xen/unstable/extras/mini-os/main.c:162
0x33da:
thread_starter
gdtoa-hexnan.c:0

Which looks to me like something about the kernel bzImage is upsetting it.

        J


