
Re: [Xen-devel] Kernel panic with 2.6.32-30 under network activity


  • To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • From: Olivier Hanesse <olivier.hanesse@xxxxxxxxx>
  • Date: Wed, 16 Mar 2011 10:35:08 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Xen Users <xen-users@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 16 Mar 2011 02:37:52 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hello,

Yes, this bug happens quite often.

About my CPU, I am using:

model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz

There is no log at all before this message on the domU; I got this message from the Xen console.

This guest isn't pinned to a specific CPU:

Name             ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0         0     0     0   r--   18098.1 0
domU             15     0     1   -b-    3060.8 any cpu
domU             15     1     4   -b-    1693.4 any cpu
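For reference, a hedged sketch of how one would pin this guest's vCPUs with the Xen 4.0 `xm` tools (the domain name "domU" and the CPU numbers are taken from the vcpu-list above; the guard only makes the snippet safe to paste on a box without Xen tools):

```shell
# Hedged sketch, not from the thread: pin domU's vCPUs with xm.
if command -v xm >/dev/null 2>&1; then
    have_xm=1
    xm vcpu-pin domU 0 1    # pin vCPU 0 to physical CPU 1
    xm vcpu-pin domU 1 4    # pin vCPU 1 to physical CPU 4
    xm vcpu-list domU       # "CPU Affinity" should now show 1 and 4
else
    have_xm=0
    echo "xm not found; commands shown for reference only"
fi
```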

My dom0 is pinned (note dom0_max_vcpus=1 dom0_vcpus_pin in the xen_commandline below); here is the "xm info" output:

release                : 2.6.32-bpo.5-xen-amd64
version                : #1 SMP Mon Jan 17 22:05:11 UTC 2011
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2493
hw_caps                : bfebfbff:20000800:00000000:00000940:000ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 10239
free_memory            : 405
node_to_cpu            : node0:0-7
node_to_memory         : node0:405
node_to_dma32_mem      : node0:405
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1 clocksource=pit cpuidle=0
cc_compiler            : gcc version 4.4.5 (Debian 4.4.5-10) 
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Wed Jan 12 14:04:06 UTC 2011
xend_config_format     : 4

I was running top/vmstat before this crash and saw nothing strange (kernel not swapping, no load, not a lot of I/O ... just a network rsync).

As for logs in dom0, "xm dmesg" shows:

(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) grant_table.c:204:d0 Increased maptrack size to 2 frames.
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935

I don't know if this is relevant or not. I will check at the next kernel panic whether another line has been appended.

Hope this helps.

Olivier


2011/3/16 Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
On Thu, Mar 10, 2011 at 12:25:55PM +0100, Olivier Hanesse wrote:
> Hello,
>
> I've got several kernel panics on a domU under network activity (multiple
> rsyncs using rsh). I didn't manage to reproduce it manually, but it happened
> 5 times during the last month.

Does it happen all the time?
> Each time, it is the same kernel trace.
>
> I am using Debian 5.0.8 with kernel/hypervisor :
>
> ii  linux-image-2.6.32-bpo.5-amd64  2.6.32-30~bpo50+1  Linux 2.6.32 for 64-bit PCs
> ii  xen-hypervisor-4.0-amd64        4.0.1-2            The Xen Hypervisor on AMD64
>
> Here is the trace:
>
> [469390.126691] alignment check: 0000 [#1] SMP

Alignment check? Was there anything else in the log before this? Was there
anything in the Dom0 log?
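As a decoding aid (an editorial aside, assuming only the values printed in the trace below): the oops reports EFLAGS: 00050286, and bit 18 of EFLAGS is the AC (Alignment Check) flag. A Xen PV guest kernel runs at CPL 3 (note CS: e033 in the trace), where AC is honored, so a stray AC bit would turn an otherwise harmless unaligned access into exactly this kind of fatal #AC. A quick check:

```shell
# EFLAGS value reported in the oops below.
eflags=$(( 0x00050286 ))
ac=$(( (eflags >> 18) & 1 ))   # bit 18 = AC (Alignment Check) flag
echo "AC bit set: $ac"         # 1 means alignment checking was enabled
```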

> [469390.126711] last sysfs file: /sys/devices/virtual/net/lo/operstate
> [469390.126718] CPU 0
> [469390.126725] Modules linked in: snd_pcsp xen_netfront snd_pcm evdev
> snd_timer snd soundcore snd_page_alloc ext3 jbd mbcache dm_mirror
> dm_region_hash dm_log dm_snapshot dm_mod xen_blkfront thermal_sys
> [469390.126772] Pid: 22077, comm: rsh Not tainted 2.6.32-bpo.5-amd64 #1
> [469390.126779] RIP: e030:[<ffffffff8126093d>]  [<ffffffff8126093d>]
> eth_header+0x61/0x9c
> [469390.126795] RSP: e02b:ffff88001ec3f9b8  EFLAGS: 00050286
> [469390.126802] RAX: 00000000090f0900 RBX: 0000000000000008 RCX:
> ffff88001ecd0cee
> [469390.126811] RDX: 0000000000000800 RSI: 000000000000000e RDI:
> ffff88001ecd0cee
> [469390.126820] RBP: ffff8800029016d0 R08: 0000000000000000 R09:
> 0000000000000034
> [469390.126829] R10: 000000000000000e R11: ffffffff81255821 R12:
> ffff880002935144
> [469390.126838] R13: 0000000000000034 R14: ffff88001fe80000 R15:
> ffff88001fe80000
> [469390.126851] FS:  00007f340c2276e0(0000) GS:ffff880002f4d000(0000)
> knlGS:0000000000000000
> [469390.126860] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [469390.126867] CR2: 00007fffb8f33a8c CR3: 000000001d875000 CR4:
> 0000000000002660
> [469390.126877] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [469390.126886] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [469390.126895] Process rsh (pid: 22077, threadinfo ffff88001ec3e000, task
> ffff88001ea61530)
> [469390.126904] Stack:
> [469390.126908]  0000000000000000 0000000000000000 ffff88001ecd0cfc
> ffff88001f1a4ae8
> [469390.126921] <0> ffff880002935100 ffff880002935140 0000000000000000
> ffffffff81255a20
> [469390.126937] <0> 0000000000000000 ffffffff8127743d 0000000000000000
> ffff88001ecd0cfc
> [469390.126954] Call Trace:
> [469390.126963]  [<ffffffff81255a20>] ? neigh_resolve_output+0x1ff/0x284
> [469390.126974]  [<ffffffff8127743d>] ? ip_finish_output2+0x1d6/0x22b
> [469390.126983]  [<ffffffff8127708f>] ? ip_queue_xmit+0x311/0x386
> [469390.126994]  [<ffffffff8100dc35>] ? xen_force_evtchn_callback+0x9/0xa
> [469390.127003]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.127013]  [<ffffffff81287a47>] ? tcp_transmit_skb+0x648/0x687
> [469390.127022]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.127031]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127040]  [<ffffffff81289ec9>] ? tcp_write_xmit+0x874/0x96c
> [469390.127049]  [<ffffffff8128a00e>] ? __tcp_push_pending_frames+0x22/0x53
> [469390.127059]  [<ffffffff8127d409>] ? tcp_close+0x176/0x3d0
> [469390.127069]  [<ffffffff81299f0c>] ? inet_release+0x4e/0x54
> [469390.127079]  [<ffffffff812410d1>] ? sock_release+0x19/0x66
> [469390.127087]  [<ffffffff81241140>] ? sock_close+0x22/0x26
> [469390.127097]  [<ffffffff810ef879>] ? __fput+0x100/0x1af
> [469390.127106]  [<ffffffff810eccb6>] ? filp_close+0x5b/0x62
> [469390.127116]  [<ffffffff8104f878>] ? put_files_struct+0x64/0xc1
> [469390.127127]  [<ffffffff812fbb02>] ? _spin_lock_irq+0x7/0x22
> [469390.127135]  [<ffffffff81051141>] ? do_exit+0x236/0x6c6
> [469390.127144]  [<ffffffff8100c241>] ?
> __raw_callee_save_xen_pud_val+0x11/0x1e
> [469390.127154]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127163]  [<ffffffff8100c205>] ?
> __raw_callee_save_xen_pmd_val+0x11/0x1e
> [469390.127173]  [<ffffffff81051647>] ? do_group_exit+0x76/0x9d
> [469390.127183]  [<ffffffff8105dec1>] ? get_signal_to_deliver+0x318/0x343
> [469390.127193]  [<ffffffff8101004f>] ? do_notify_resume+0x87/0x73f
> [469390.127202]  [<ffffffff812fbf45>] ? page_fault+0x25/0x30
> [469390.127211]  [<ffffffff812fc17a>] ? error_exit+0x2a/0x60
> [469390.127219]  [<ffffffff8101151d>] ? retint_restore_args+0x5/0x6
> [469390.127228]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127240]  [<ffffffff8119564d>] ? __put_user_4+0x1d/0x30
> [469390.128009]  [<ffffffff81010e0e>] ? int_signal+0x12/0x17
> [469390.128009] Code: 89 e8 86 e0 66 89 47 0c 48 85 ed 75 07 49 8b ae 20 02
> 00 00 8b 45 00 4d 85 e4 89 47 06 66 8b 45 04 66 89 47 0a 74 12 41 8b 04 24
> <89> 07 66 41 8b 44 24 04 66 89 47 04 eb 18 41 f6 86 60 01 00 00
> [469390.128009] RIP  [<ffffffff8126093d>] eth_header+0x61/0x9c
> [469390.128009]  RSP <ffff88001ec3f9b8>
> [469390.128009] ---[ end trace dd6b1396ef9d9a96 ]---
> [469390.128009] Kernel panic - not syncing: Fatal exception in interrupt
> [469390.128009] Pid: 22077, comm: rsh Tainted: G      D
>  2.6.32-bpo.5-amd64 #1
> [469390.128009] Call Trace:
> [469390.128009]  [<ffffffff812f9d03>] ? panic+0x86/0x143
> [469390.128009]  [<ffffffff812fbbca>] ? _spin_unlock_irqrestore+0xd/0xe
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff812fbbca>] ? _spin_unlock_irqrestore+0xd/0xe
> [469390.128009]  [<ffffffff8104e387>] ? release_console_sem+0x17e/0x1af
> [469390.128009]  [<ffffffff812fca65>] ? oops_end+0xa7/0xb4
> [469390.128009]  [<ffffffff81012416>] ? do_alignment_check+0x88/0x92
> [469390.128009]  [<ffffffff81011a75>] ? alignment_check+0x25/0x30
> [469390.128009]  [<ffffffff81255821>] ? neigh_resolve_output+0x0/0x284
> [469390.128009]  [<ffffffff8126093d>] ? eth_header+0x61/0x9c
> [469390.128009]  [<ffffffff81260900>] ? eth_header+0x24/0x9c
> [469390.128009]  [<ffffffff81255a20>] ? neigh_resolve_output+0x1ff/0x284
> [469390.128009]  [<ffffffff8127743d>] ? ip_finish_output2+0x1d6/0x22b
> [469390.128009]  [<ffffffff8127708f>] ? ip_queue_xmit+0x311/0x386
> [469390.128009]  [<ffffffff8100dc35>] ? xen_force_evtchn_callback+0x9/0xa
> [469390.128009]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.128009]  [<ffffffff81287a47>] ? tcp_transmit_skb+0x648/0x687
> [469390.128009]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff81289ec9>] ? tcp_write_xmit+0x874/0x96c
> [469390.128009]  [<ffffffff8128a00e>] ? __tcp_push_pending_frames+0x22/0x53
> [469390.128009]  [<ffffffff8127d409>] ? tcp_close+0x176/0x3d0
> [469390.128009]  [<ffffffff81299f0c>] ? inet_release+0x4e/0x54
> [469390.128009]  [<ffffffff812410d1>] ? sock_release+0x19/0x66
> [469390.128009]  [<ffffffff81241140>] ? sock_close+0x22/0x26
> [469390.128009]  [<ffffffff810ef879>] ? __fput+0x100/0x1af
> [469390.128009]  [<ffffffff810eccb6>] ? filp_close+0x5b/0x62
> [469390.128009]  [<ffffffff8104f878>] ? put_files_struct+0x64/0xc1
> [469390.128009]  [<ffffffff812fbb02>] ? _spin_lock_irq+0x7/0x22
> [469390.128009]  [<ffffffff81051141>] ? do_exit+0x236/0x6c6
> [469390.128009]  [<ffffffff8100c241>] ?
> __raw_callee_save_xen_pud_val+0x11/0x1e
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff8100c205>] ?
> __raw_callee_save_xen_pmd_val+0x11/0x1e
> [469390.128009]  [<ffffffff81051647>] ? do_group_exit+0x76/0x9d
> [469390.128009]  [<ffffffff8105dec1>] ? get_signal_to_deliver+0x318/0x343
> [469390.128009]  [<ffffffff8101004f>] ? do_notify_resume+0x87/0x73f
> [469390.128009]  [<ffffffff812fbf45>] ? page_fault+0x25/0x30
> [469390.128009]  [<ffffffff812fc17a>] ? error_exit+0x2a/0x60
> [469390.128009]  [<ffffffff8101151d>] ? retint_restore_args+0x5/0x6
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff8119564d>] ? __put_user_4+0x1d/0x30
> [469390.128009]  [<ffffffff81010e0e>] ? int_signal+0x12/0x17
>
> I found another post, which may be about the same bug (same kernel, network
> activity ...):
>
> http://jira.mongodb.org/browse/SERVER-2383
>
> Any ideas ?

None. What type of CPU do you have? Are you pinning your
guest to a specific CPU?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

