WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

RE: soft lockup was (Re: [Xen-users] Kernel error)

To: <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: RE: soft lockup was (Re: [Xen-users] Kernel error)
From: "Roger Lucas" <roger@xxxxxxxxxxxxx>
Date: Mon, 4 Sep 2006 09:48:16 +0100
This soft-lockup problem seems to occur when I perform a large MySQL query
that takes several seconds to complete on a DomU.  At this point, the soft
lockup message appears and the Xen box seems to stall for about 5-10
seconds.  After that, everything continues normally again.

The box is an Abit-LG81 motherboard (Skt775, ICH7) with an Intel Celeron
2.7GHz processor and 2 GB of RAM.  I am running software RAID-5 across the 4
SATA drives in Dom0 and providing the disks to the DomUs using LVM.  The
basic installation was Kubuntu Dapper Drake 6.06 and I installed the Xen
kernel from the 3.0.2-2 binaries on the Xen site.

A capture of the relevant information from syslog is below.  This is what I
get for most of the errors:

Sep  3 15:48:20 hydra kernel: Pid: 0, comm:              swapper
Sep  3 15:48:20 hydra kernel: EIP: 0061:[hypercall_page+935/4096] CPU: 0
Sep  3 15:48:20 hydra kernel: EIP is at 0xc01013a7
Sep  3 15:48:20 hydra kernel:  EFLAGS: 00000296    Tainted: GF (2.6.16-xen #1)
Sep  3 15:48:20 hydra kernel: EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00001f8e
Sep  3 15:48:20 hydra kernel: ESI: 00000000 EDI: 00000001 EBP: c03da000 DS: 007b ES: 007b
Sep  3 15:48:20 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 00df6000 CR4: 00000640
Sep  3 15:48:20 hydra kernel:  [xen_idle+83/176] xen_idle+0x53/0xb0
Sep  3 15:48:20 hydra kernel:  [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep  3 15:48:20 hydra kernel:  [start_kernel+439/512] start_kernel+0x1b7/0x200
Sep  3 15:48:20 hydra kernel:  [unknown_bootoption+0/464] unknown_bootoption+0x0/0x1d0

Sometimes I get a longer trace like the one below.  The exact trace varies a
bit but the starting function is always "notify_remote_via_irq":

Sep  3 16:09:30 hydra kernel: BUG: soft lockup detected on CPU#0!
Sep  3 16:09:30 hydra kernel:
Sep  3 16:09:30 hydra kernel: Pid: 0, comm:              swapper
Sep  3 16:09:30 hydra kernel: EIP: 0061:[hypercall_page+519/4096] CPU: 0
Sep  3 16:09:30 hydra kernel: EIP is at 0xc0101207
Sep  3 16:09:30 hydra kernel:  EFLAGS: 00000202    Tainted: GF (2.6.16-xen #1)
Sep  3 16:09:30 hydra kernel: EAX: 00000000 EBX: c03dbc98 ECX: c114ed40 EDX: c03dbef4
Sep  3 16:09:30 hydra kernel: ESI: 00000000 EDI: 00000112 EBP: c0432fc0 DS: 007b ES: 007b
Sep  3 16:09:30 hydra kernel: CR0: 8005003b CR2: b7e2c4b0 CR3: 00df6000 CR4: 00000640
Sep  3 16:09:30 hydra kernel:  [notify_remote_via_irq+41/64] notify_remote_via_irq+0x29/0x40
Sep  3 16:09:30 hydra kernel:  [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep  3 16:09:30 hydra kernel:  [net_rx_action+1123/1280] net_rx_action+0x463/0x500
Sep  3 16:09:30 hydra kernel:  [fib_lookup+209/320] fib_lookup+0xd1/0x140
Sep  3 16:09:30 hydra kernel:  [ip_route_input_slow+440/2528] ip_route_input_slow+0x1b8/0x9e0
Sep  3 16:09:30 hydra kernel:  [try_to_wake_up+768/880] try_to_wake_up+0x300/0x370
Sep  3 16:09:30 hydra kernel:  [<e13c5000>] br_forward_finish+0x0/0x70 [bridge]
Sep  3 16:09:30 hydra kernel:  [neigh_lookup+136/208] neigh_lookup+0x88/0xd0
Sep  3 16:09:30 hydra kernel:  [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep  3 16:09:30 hydra kernel:  [arp_process+142/1456] arp_process+0x8e/0x5b0
Sep  3 16:09:30 hydra kernel:  [ip_local_deliver+280/688] ip_local_deliver+0x118/0x2b0
Sep  3 16:09:30 hydra kernel:  [arp_rcv+221/400] arp_rcv+0xdd/0x190
Sep  3 16:09:30 hydra kernel:  [packet_rcv_spkt+359/672] packet_rcv_spkt+0x167/0x2a0
Sep  3 16:09:30 hydra kernel:  [netif_receive_skb+650/816] netif_receive_skb+0x28a/0x330
Sep  3 16:09:30 hydra kernel:  [process_backlog+215/400] process_backlog+0xd7/0x190
Sep  3 16:09:30 hydra kernel:  [tasklet_action+157/320] tasklet_action+0x9d/0x140
Sep  3 16:09:30 hydra kernel:  [__do_softirq+245/288] __do_softirq+0xf5/0x120
Sep  3 16:09:30 hydra kernel:  [do_softirq+149/160] do_softirq+0x95/0xa0
Sep  3 16:09:30 hydra kernel:  [do_IRQ+31/48] do_IRQ+0x1f/0x30
Sep  3 16:09:30 hydra kernel:  [evtchn_do_upcall+168/240] evtchn_do_upcall+0xa8/0xf0
Sep  3 16:09:30 hydra kernel:  [hypervisor_callback+44/52] hypervisor_callback+0x2c/0x34
Sep  3 16:09:30 hydra kernel:  [xen_idle+83/176] xen_idle+0x53/0xb0
Sep  3 16:09:30 hydra kernel:  [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep  3 16:09:30 hydra kernel:  [start_kernel+439/512] start_kernel+0x1b7/0x200
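
For anyone comparing many of these dumps, the check that the first frame is always the same can be automated. The following is only a minimal sketch, assuming the syslog format shown above (each dump starts with a "Pid:" line and frames look like "[symbol+offset/size]" or "[<address>] symbol+0x..."). The function name first_frames and the shortened SAMPLE text are illustrative and not part of any Xen or kernel tool.

```python
import re
from collections import Counter

# Matches a klogd-resolved frame such as "[xen_idle+83/176]" or a raw
# module frame such as "[<e13c5000>] br_forward_finish+0x0/0x70".
FRAME_RE = re.compile(r"kernel:\s+\[(?:<[0-9a-f]+>\]\s+)?([A-Za-z_]\w*)\+")

def first_frames(log_text):
    """Return the first call-trace symbol of each register dump in log_text."""
    frames = []
    waiting_for_frame = False
    for line in log_text.splitlines():
        if "kernel: Pid:" in line:
            waiting_for_frame = True  # a new dump starts here
            continue
        match = FRAME_RE.search(line)
        if waiting_for_frame and match:
            frames.append(match.group(1))
            waiting_for_frame = False
    return frames

# Shortened excerpts of the three dumps quoted in this message.
SAMPLE = """\
Sep  3 15:48:20 hydra kernel: Pid: 0, comm:              swapper
Sep  3 15:48:20 hydra kernel: EIP: 0061:[hypercall_page+935/4096] CPU: 0
Sep  3 15:48:20 hydra kernel:  [xen_idle+83/176] xen_idle+0x53/0xb0
Sep  3 15:48:20 hydra kernel:  [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep  3 16:09:30 hydra kernel: BUG: soft lockup detected on CPU#0!
Sep  3 16:09:30 hydra kernel: Pid: 0, comm:              swapper
Sep  3 16:09:30 hydra kernel: EIP: 0061:[hypercall_page+519/4096] CPU: 0
Sep  3 16:09:30 hydra kernel:  [notify_remote_via_irq+41/64]
notify_remote_via_irq+0x29/0x40
Sep  3 16:09:30 hydra kernel:  [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep  3 18:40:35 hydra kernel: Pid: 2268, comm:            md0_raid5
Sep  3 18:40:35 hydra kernel:  [force_evtchn_callback+10/16]
force_evtchn_callback+0xa/0x10
Sep  3 18:40:35 hydra kernel:  [get_request+727/800] get_request+0x2d7/0x320
"""

# Tally how often each symbol appears as the first frame of a dump.
counts = Counter(first_frames(SAMPLE))
for symbol, count in counts.most_common():
    print(symbol, count)
```

The EIP line is skipped deliberately: it also contains a bracketed symbol, but it names the hypercall page rather than a call-trace frame.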

Just once, I got the following error:

Sep  3 18:40:35 hydra kernel: Pid: 2268, comm:            md0_raid5
Sep  3 18:40:35 hydra kernel: EIP: 0061:[hypercall_page+551/4096] CPU: 0
Sep  3 18:40:35 hydra kernel: EIP is at 0xc0101227
Sep  3 18:40:35 hydra kernel:  EFLAGS: 00200246    Tainted: GF (2.6.16-xen #1)
Sep  3 18:40:35 hydra kernel: EAX: 00030000 EBX: 00000000 ECX: 00000000 EDX: c0619c2c
Sep  3 18:40:35 hydra kernel: ESI: c0619b30 EDI: c0619b40 EBP: 00000001 DS: 007b ES: 007b
Sep  3 18:40:35 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 003f2000 CR4: 00000640
Sep  3 18:40:35 hydra kernel:  [force_evtchn_callback+10/16] force_evtchn_callback+0xa/0x10
Sep  3 18:40:35 hydra kernel:  [get_request+727/800] get_request+0x2d7/0x320
Sep  3 18:40:35 hydra kernel:  [lock_timer_base+36/80] lock_timer_base+0x24/0x50
Sep  3 18:40:35 hydra kernel:  [get_request_wait+44/368] get_request_wait+0x2c/0x170
Sep  3 18:40:35 hydra kernel:  [blk_plug_device+99/160] blk_plug_device+0x63/0xa0
Sep  3 18:40:35 hydra kernel:  [kobject_put+31/48] kobject_put+0x1f/0x30
Sep  3 18:40:35 hydra kernel:  [kobject_release+0/16] kobject_release+0x0/0x10
Sep  3 18:40:35 hydra kernel:  [<e105e3f1>] scsi_request_fn+0x261/0x400 [scsi_mod]
Sep  3 18:40:35 hydra kernel:  [__make_request+170/1184] __make_request+0xaa/0x4a0
Sep  3 18:40:35 hydra kernel:  [schedule+1013/1840] schedule+0x3f5/0x730
Sep  3 18:40:35 hydra kernel:  [generic_make_request+240/352] generic_make_request+0xf0/0x160
Sep  3 18:40:35 hydra kernel:  [__bio_clone+166/176] __bio_clone+0xa6/0xb0
Sep  3 18:40:35 hydra kernel:  [submit_bio+98/256] submit_bio+0x62/0x100
Sep  3 18:40:35 hydra kernel:  [<e108f728>] md_super_write+0xa8/0xe0 [md_mod]
Sep  3 18:40:35 hydra kernel:  [<e10919a6>] md_update_sb+0x1b6/0x230 [md_mod]
Sep  3 18:40:35 hydra kernel:  [<e1097793>] md_check_recovery+0x463/0x4d0 [md_mod]
Sep  3 18:40:35 hydra kernel:  [schedule_timeout+169/176] schedule_timeout+0xa9/0xb0
Sep  3 18:40:35 hydra kernel:  [<e1085bb6>] raid5d+0x16/0x190 [raid5]
Sep  3 18:40:35 hydra kernel:  [prepare_to_wait+32/112] prepare_to_wait+0x20/0x70
Sep  3 18:40:35 hydra kernel:  [<e109577f>] md_thread+0x5f/0x130 [md_mod]
Sep  3 18:40:35 hydra kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep  3 18:40:35 hydra kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep  3 18:40:35 hydra kernel:  [<e1095720>] md_thread+0x0/0x130 [md_mod]
Sep  3 18:40:35 hydra kernel:  [kthread+186/192] kthread+0xba/0xc0
Sep  3 18:40:35 hydra kernel:  [kthread+0/192] kthread+0x0/0xc0
Sep  3 18:40:35 hydra kernel:  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10

Does anyone on this list know what is going on and why this would occur?

I have a work-around that breaks the SQL query into a set of smaller queries
which don't then cause this problem, but I would like to get to the root
cause and fix the problem properly.
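
For readers who want to try the same kind of work-around, here is a minimal sketch of splitting one large read into smaller queries. The original query is not shown in this message, so everything here is an assumption: the table name events, its columns, and the helper fetch_in_chunks are hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example runs on its own.

```python
import sqlite3

def fetch_in_chunks(conn, chunk_size=1000):
    """Yield rows of the hypothetical `events` table in id order, one chunk at a time."""
    last_id = 0
    while True:
        # Each query only touches the next chunk_size rows after last_id,
        # so no single statement runs for long.
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume after the last id seen

# Build a small in-memory table to exercise the helper (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"row {i}",) for i in range(2500)],
)

chunks = list(fetch_in_chunks(conn, chunk_size=1000))
print([len(chunk) for chunk in chunks])
```

Paging on the primary key rather than on an OFFSET keeps every chunk about equally cheap, but whether that is enough to avoid the lockup on a given system would have to be tested.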

Thanks in advance for any help anyone can give on this.



> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Lucas
> Sent: 03 September 2006 14:47
> To: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: RE: soft lockup was (Re: [Xen-users] Kernel error)
> 
> I have suddenly got these same errors occurring on my Xen-3.0.2-2 system. I
> have four DomUs with 256MB RAM each and 512MB on the Dom0 running on an
> Intel Celeron system.
> 
> Is the only solution to upgrade to Unstable, or is there a patch/upgrade
> available for the 3.0.2-2 release?
> 
> Thanks, Roger.
> 
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Steve Traugott
> > Sent: 02 August 2006 23:22
> > To: Jones, Chris
> > Cc: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: soft lockup was (Re: [Xen-users] Kernel error)
> >
> > Hi Chris,
> >
> > Did you ever reach any sort of conclusion about the current state of
> > the soft lockup bug?  Do you have a stable build now?  What changeset
> > is it?
> >
> > Thanks,
> >
> > Steve
> >
> > On Fri, Jul 07, 2006 at 07:37:33AM -0500, Jones, Chris wrote:
> > > I am getting the same errors in the stable 3.0.2 but I am not getting
> > > the errors on unstable so it looks like you are right. I am downloading
> > > the testing tree in an attempt to test it there. I will holler when I
> > > find something out.
> > >
> > > -----Original Message-----
> > > From: Rodrigo Borges Pereira [mailto:rbp@xxxxxxxxxxxxx]
> > > Sent: Friday, July 07, 2006 7:23 AM
> > > To: Jones, Chris; xen-users@xxxxxxxxxxxxxxxxxxx
> > > Subject: RE: [Xen-users] Kernel error
> > >
> > > I believe that thread states that the fix is already in 3.0.2. And I am
> > > running 3.0.2.
> > > Did I get it wrong?
> > >
> > > tks
> > >
> > > > -----Original Message-----
> > > > From: Jones, Chris [mailto:chris.jones@xxxxxxxxxxxxxxx]
> > > > Sent: Friday, 7 July 2006 13:18
> > > > To: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: RE: [Xen-users] Kernel error
> > > >
> > > > There is a fix for this issue.
> > > > http://lists.xensource.com/archives/html/xen-devel/2006-04/msg00193.html
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > > > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > > Rodrigo Borges Pereira
> > > > Sent: Friday, July 07, 2006 7:06 AM
> > > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: [Xen-users] Kernel error
> > > >
> > > > Hi,
> > > >
> > > > I got this on the console of one DomU:
> > > >
> > > > --> BUG: soft lockup detected on CPU#0!
> > > >
> > > > Pid: 0, comm:              swapper
> > > > EIP: 0061:[<c01013a7>] CPU: 0
> > > > EIP is at 0xc01013a7
> > > >  EFLAGS: 00000246    Tainted: GF      (2.6.16-xen3_86.1_rhel4.1 #1)
> > > > EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00004eaf
> > > > ESI: 00000000 EDI: 00000001 EBP: c03e4000 DS: 007b ES: 007b
> > > > CR0: 8005003b CR2: 8005230c CR3: 004ec000 CR4: 00000640
> > > > [<c0102b53>] xen_idle+0x53/0xb0
> > > > [<c0102c1f>] cpu_idle+0x6f/0xe0
> > > > [<c03e69da>] start_kernel+0x1da/0x230
> > > > [<c03e6320>] unknown_bootoption+0x0/0x1f0
> > > >
> > > >
> > > > It didn't seem to affect the operation of either DomU or Dom0.
> > > >
> > > > Should I worry?
> > > >
> > > > Best regards,
> > > > r
> > > >
> > > >
> > > > _______________________________________________
> > > > Xen-users mailing list
> > > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.xensource.com/xen-users
> > > >
> > >
> > >
> > > _______________________________________________
> > > Xen-users mailing list
> > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-users
> >
> > --
> > Stephen G. Traugott  (KG6HDQ)
> > UNIX/Linux Infrastructure Architect, TerraLuna LLC
> > stevegt@xxxxxxxxxxxxx
> > http://www.stevegt.com -- http://Infrastructures.Org


