
Re: [Xen-devel] Kernel panic with 2.6.32-30 under network activity

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] Kernel panic with 2.6.32-30 under network activity
From: Olivier Hanesse <olivier.hanesse@xxxxxxxxx>
Date: Wed, 16 Mar 2011 10:35:08 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Xen Users <xen-users@xxxxxxxxxxxxxxxxxxx>
Hello,

Yes, this bug happens quite often.

As for my CPU, I am using:

model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz

There are no logs at all before this message on the domU; I got this message from the Xen console.
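
In case it is useful for the next occurrence, I plan to keep a persistent capture of the guest console from dom0 (run inside screen so it survives the session; the log path is just an example):

  # attach to the domU console and append everything it prints to a log
  xm console domU 2>&1 | tee -a /var/log/xen/domU-console.log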

This guest isn't pinned to a specific CPU:

Name              ID  VCPU   CPU  State    Time(s)  CPU Affinity
Domain-0           0     0     0  r--     18098.1   0
domU              15     0     1  -b-      3060.8   any cpu
domU              15     1     4  -b-      1693.4   any cpu
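
If pinning could matter, I can try pinning the guest's vCPUs before the next run, something along these lines (the physical CPU numbers here are only an example):

  # pin vcpu 0 of domU to physical CPU 1 and vcpu 1 to CPU 2
  xm vcpu-pin domU 0 1
  xm vcpu-pin domU 1 2
  # verify the new affinity
  xm vcpu-list domU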

My dom0 is pinned:

release                : 2.6.32-bpo.5-xen-amd64
version                : #1 SMP Mon Jan 17 22:05:11 UTC 2011
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2493
hw_caps                : bfebfbff:20000800:00000000:00000940:000ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 10239
free_memory            : 405
node_to_cpu            : node0:0-7
node_to_memory         : node0:405
node_to_dma32_mem      : node0:405
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1 clocksource=pit cpuidle=0
cc_compiler            : gcc version 4.4.5 (Debian 4.4.5-10) 
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Wed Jan 12 14:04:06 UTC 2011
xend_config_format     : 4
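
For completeness, the dom0 pinning comes from the hypervisor command line above (dom0_max_vcpus=1 dom0_vcpus_pin); in my GRUB menu.lst the entry looks roughly like this (kernel and initrd paths quoted from memory):

  title  Xen 4.0.1 / Debian, kernel 2.6.32-bpo.5-xen-amd64
  kernel /boot/xen-4.0.1.gz dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1 clocksource=pit cpuidle=0
  module /boot/vmlinuz-2.6.32-bpo.5-xen-amd64 ro console=tty0
  module /boot/initrd.img-2.6.32-bpo.5-xen-amd64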

I was running top/vmstat before this crash and saw nothing strange (the kernel was not swapping, there was no load and not much I/O; just a network rsync).
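
So that I have data from the exact moment of the next crash, I will leave a timestamped vmstat running in the background, something like:

  # log one vmstat sample per second, each line prefixed with a timestamp
  vmstat 1 | while read line; do echo "$(date '+%F %T') $line"; done >> /var/log/vmstat.log &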

As for logs in dom0, "xm dmesg" shows:

(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) grant_table.c:204:d0 Increased maptrack size to 2 frames.
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935
(XEN) traps.c:2869: GPF (0060): ffff82c48014efea -> ffff82c4801f9935

I don't know whether this is relevant. I will check at the next kernel panic whether another line gets appended.
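
To see whether another line really gets appended around a panic, I can also snapshot "xm dmesg" from dom0 at regular intervals and diff the copies afterwards, e.g.:

  # keep a timestamped copy of the hypervisor log every minute
  while true; do
      xm dmesg > /var/log/xen/xm-dmesg-$(date +%s).log
      sleep 60
  done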

Hope this helps.

Olivier


2011/3/16 Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
On Thu, Mar 10, 2011 at 12:25:55PM +0100, Olivier Hanesse wrote:
> Hello,
>
> I've got several kernel panics on a domU under network activity (multiple
> rsync jobs over rsh). I didn't manage to reproduce it manually, but it
> happened 5 times during the last month.

Does it happen all the time?
> Each time, it is the same kernel trace.
>
> I am using Debian 5.0.8 with this kernel/hypervisor:
>
> ii  linux-image-2.6.32-bpo.5-amd64   2.6.32-30~bpo50+1   Linux 2.6.32 for 64-bit PCs
> ii  xen-hypervisor-4.0-amd64         4.0.1-2             The Xen Hypervisor on AMD64
>
> Here is the trace:
>
> [469390.126691] alignment check: 0000 [#1] SMP

alignment check? Was there anything else in the log before this? Was there
anything in the Dom0 log?
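
One way to sanity-check the trap type: the EFLAGS value in the trace below (00050286) has bit 18 set, which is AC, the alignment-check flag, so an #AC trap is at least self-consistent with the register dump:

  # EFLAGS bit 18 is AC (alignment check); non-zero output means it was set
  printf '0x%x\n' $(( 0x00050286 & 0x40000 ))   # prints 0x40000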

> [469390.126711] last sysfs file: /sys/devices/virtual/net/lo/operstate
> [469390.126718] CPU 0
> [469390.126725] Modules linked in: snd_pcsp xen_netfront snd_pcm evdev snd_timer snd soundcore snd_page_alloc ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod xen_blkfront thermal_sys
> [469390.126772] Pid: 22077, comm: rsh Not tainted 2.6.32-bpo.5-amd64 #1
> [469390.126779] RIP: e030:[<ffffffff8126093d>]  [<ffffffff8126093d>] eth_header+0x61/0x9c
> [469390.126795] RSP: e02b:ffff88001ec3f9b8  EFLAGS: 00050286
> [469390.126802] RAX: 00000000090f0900 RBX: 0000000000000008 RCX: ffff88001ecd0cee
> [469390.126811] RDX: 0000000000000800 RSI: 000000000000000e RDI: ffff88001ecd0cee
> [469390.126820] RBP: ffff8800029016d0 R08: 0000000000000000 R09: 0000000000000034
> [469390.126829] R10: 000000000000000e R11: ffffffff81255821 R12: ffff880002935144
> [469390.126838] R13: 0000000000000034 R14: ffff88001fe80000 R15: ffff88001fe80000
> [469390.126851] FS:  00007f340c2276e0(0000) GS:ffff880002f4d000(0000) knlGS:0000000000000000
> [469390.126860] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [469390.126867] CR2: 00007fffb8f33a8c CR3: 000000001d875000 CR4: 0000000000002660
> [469390.126877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [469390.126886] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [469390.126895] Process rsh (pid: 22077, threadinfo ffff88001ec3e000, task ffff88001ea61530)
> [469390.126904] Stack:
> [469390.126908]  0000000000000000 0000000000000000 ffff88001ecd0cfc
> ffff88001f1a4ae8
> [469390.126921] <0> ffff880002935100 ffff880002935140 0000000000000000
> ffffffff81255a20
> [469390.126937] <0> 0000000000000000 ffffffff8127743d 0000000000000000
> ffff88001ecd0cfc
> [469390.126954] Call Trace:
> [469390.126963]  [<ffffffff81255a20>] ? neigh_resolve_output+0x1ff/0x284
> [469390.126974]  [<ffffffff8127743d>] ? ip_finish_output2+0x1d6/0x22b
> [469390.126983]  [<ffffffff8127708f>] ? ip_queue_xmit+0x311/0x386
> [469390.126994]  [<ffffffff8100dc35>] ? xen_force_evtchn_callback+0x9/0xa
> [469390.127003]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.127013]  [<ffffffff81287a47>] ? tcp_transmit_skb+0x648/0x687
> [469390.127022]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.127031]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127040]  [<ffffffff81289ec9>] ? tcp_write_xmit+0x874/0x96c
> [469390.127049]  [<ffffffff8128a00e>] ? __tcp_push_pending_frames+0x22/0x53
> [469390.127059]  [<ffffffff8127d409>] ? tcp_close+0x176/0x3d0
> [469390.127069]  [<ffffffff81299f0c>] ? inet_release+0x4e/0x54
> [469390.127079]  [<ffffffff812410d1>] ? sock_release+0x19/0x66
> [469390.127087]  [<ffffffff81241140>] ? sock_close+0x22/0x26
> [469390.127097]  [<ffffffff810ef879>] ? __fput+0x100/0x1af
> [469390.127106]  [<ffffffff810eccb6>] ? filp_close+0x5b/0x62
> [469390.127116]  [<ffffffff8104f878>] ? put_files_struct+0x64/0xc1
> [469390.127127]  [<ffffffff812fbb02>] ? _spin_lock_irq+0x7/0x22
> [469390.127135]  [<ffffffff81051141>] ? do_exit+0x236/0x6c6
> [469390.127144]  [<ffffffff8100c241>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
> [469390.127154]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127163]  [<ffffffff8100c205>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [469390.127173]  [<ffffffff81051647>] ? do_group_exit+0x76/0x9d
> [469390.127183]  [<ffffffff8105dec1>] ? get_signal_to_deliver+0x318/0x343
> [469390.127193]  [<ffffffff8101004f>] ? do_notify_resume+0x87/0x73f
> [469390.127202]  [<ffffffff812fbf45>] ? page_fault+0x25/0x30
> [469390.127211]  [<ffffffff812fc17a>] ? error_exit+0x2a/0x60
> [469390.127219]  [<ffffffff8101151d>] ? retint_restore_args+0x5/0x6
> [469390.127228]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127240]  [<ffffffff8119564d>] ? __put_user_4+0x1d/0x30
> [469390.128009]  [<ffffffff81010e0e>] ? int_signal+0x12/0x17
> [469390.128009] Code: 89 e8 86 e0 66 89 47 0c 48 85 ed 75 07 49 8b ae 20 02 00 00 8b 45 00 4d 85 e4 89 47 06 66 8b 45 04 66 89 47 0a 74 12 41 8b 04 24 <89> 07 66 41 8b 44 24 04 66 89 47 04 eb 18 41 f6 86 60 01 00 00
> [469390.128009] RIP  [<ffffffff8126093d>] eth_header+0x61/0x9c
> [469390.128009]  RSP <ffff88001ec3f9b8>
> [469390.128009] ---[ end trace dd6b1396ef9d9a96 ]---
> [469390.128009] Kernel panic - not syncing: Fatal exception in interrupt
> [469390.128009] Pid: 22077, comm: rsh Tainted: G      D    2.6.32-bpo.5-amd64 #1
> [469390.128009] Call Trace:
> [469390.128009]  [<ffffffff812f9d03>] ? panic+0x86/0x143
> [469390.128009]  [<ffffffff812fbbca>] ? _spin_unlock_irqrestore+0xd/0xe
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff812fbbca>] ? _spin_unlock_irqrestore+0xd/0xe
> [469390.128009]  [<ffffffff8104e387>] ? release_console_sem+0x17e/0x1af
> [469390.128009]  [<ffffffff812fca65>] ? oops_end+0xa7/0xb4
> [469390.128009]  [<ffffffff81012416>] ? do_alignment_check+0x88/0x92
> [469390.128009]  [<ffffffff81011a75>] ? alignment_check+0x25/0x30
> [469390.128009]  [<ffffffff81255821>] ? neigh_resolve_output+0x0/0x284
> [469390.128009]  [<ffffffff8126093d>] ? eth_header+0x61/0x9c
> [469390.128009]  [<ffffffff81260900>] ? eth_header+0x24/0x9c
> [469390.128009]  [<ffffffff81255a20>] ? neigh_resolve_output+0x1ff/0x284
> [469390.128009]  [<ffffffff8127743d>] ? ip_finish_output2+0x1d6/0x22b
> [469390.128009]  [<ffffffff8127708f>] ? ip_queue_xmit+0x311/0x386
> [469390.128009]  [<ffffffff8100dc35>] ? xen_force_evtchn_callback+0x9/0xa
> [469390.128009]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.128009]  [<ffffffff81287a47>] ? tcp_transmit_skb+0x648/0x687
> [469390.128009]  [<ffffffff8100e242>] ? check_events+0x12/0x20
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff81289ec9>] ? tcp_write_xmit+0x874/0x96c
> [469390.128009]  [<ffffffff8128a00e>] ? __tcp_push_pending_frames+0x22/0x53
> [469390.128009]  [<ffffffff8127d409>] ? tcp_close+0x176/0x3d0
> [469390.128009]  [<ffffffff81299f0c>] ? inet_release+0x4e/0x54
> [469390.128009]  [<ffffffff812410d1>] ? sock_release+0x19/0x66
> [469390.128009]  [<ffffffff81241140>] ? sock_close+0x22/0x26
> [469390.128009]  [<ffffffff810ef879>] ? __fput+0x100/0x1af
> [469390.128009]  [<ffffffff810eccb6>] ? filp_close+0x5b/0x62
> [469390.128009]  [<ffffffff8104f878>] ? put_files_struct+0x64/0xc1
> [469390.128009]  [<ffffffff812fbb02>] ? _spin_lock_irq+0x7/0x22
> [469390.128009]  [<ffffffff81051141>] ? do_exit+0x236/0x6c6
> [469390.128009]  [<ffffffff8100c241>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff8100c205>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [469390.128009]  [<ffffffff81051647>] ? do_group_exit+0x76/0x9d
> [469390.128009]  [<ffffffff8105dec1>] ? get_signal_to_deliver+0x318/0x343
> [469390.128009]  [<ffffffff8101004f>] ? do_notify_resume+0x87/0x73f
> [469390.128009]  [<ffffffff812fbf45>] ? page_fault+0x25/0x30
> [469390.128009]  [<ffffffff812fc17a>] ? error_exit+0x2a/0x60
> [469390.128009]  [<ffffffff8101151d>] ? retint_restore_args+0x5/0x6
> [469390.128009]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009]  [<ffffffff8119564d>] ? __put_user_4+0x1d/0x30
> [469390.128009]  [<ffffffff81010e0e>] ? int_signal+0x12/0x17
>
> I found another post which may be the same bug (same kernel, network
> activity ...):
>
> http://jira.mongodb.org/browse/SERVER-2383
>
> Any ideas ?

None. What type of CPU do you have? Are you pinning your
guest to a specific CPU?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel