WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

Re: [Xen-users] Nasty kernel panic

A couple of people have pointed to the e1000 driver as a possible culprit
and given good reasons why that should be the case. My only question
is: why did I also get the same kernel panic on the new PowerEdge 2950,
which has Broadcom NICs and drivers rather than the Intel e1000?

By the way, all the systems in question have now been up for 18 hours
and are functioning fine. Once the rsyncing was done and the squid
servers had all re-initialized correctly, we have been OK.

I am away from the office, but I will follow up on the thread and post
the kernel config later.

Steve Timm


------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

On Fri, 29 Aug 2008, Tim Post wrote:

Hi Steve,

On Thu, 2008-08-28 at 16:52 -0500, Steven Timm wrote:
I have seen the following kernel panic 5 times today on
three different machines, two of which had been stable
for months and one of which is a brand new install.

[snip]

<Aug/28 12:21 pm> [<ffffffff88107a79>] :e1000:e1000_clean_rx_irq+0x430/0x4d5
<Aug/28 12:21 pm> [<ffffffff881074ec>] :e1000:e1000_clean+0x82/0x160
<Aug/28 12:21 pm> [<ffffffff80395f51>] net_rx_action+0xe7/0x254
<Aug/28 12:21 pm> [<ffffffff80233d97>] __do_softirq+0x7b/0x10d
<Aug/28 12:21 pm> [<ffffffff8020b094>] call_softirq+0x1c/0x28
<Aug/28 12:21 pm> [<ffffffff8020cdfd>] do_softirq+0x62/0xd9
<Aug/28 12:21 pm> [<ffffffff8020cc9c>] do_IRQ+0x68/0x71
<Aug/28 12:21 pm> [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
<Aug/28 12:21 pm> [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
<Aug/28 12:21 pm> <EOI>

<Aug/28 12:21 pm>Code: 41 8b 85 f4 00 00 00 4d 85 ed 4d 89 ec 89 44 24 0c 0f 84 36
<Aug/28 12:21 pm>RIP  [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
<Aug/28 12:21 pm> RSP <ffffffff80526b00>
<Aug/28 12:21 pm>CR2: 00000000000000f4
<Aug/28 12:21 pm> <0>Kernel panic - not syncing: Aiee, killing interrupt handler

It looks like the e1000 might be getting spit out. From what I gather in
your message, the only thing that changed is that you are now putting a
much higher I/O demand on the drives (rsyncing everything); by extension,
this increases the demand on the NIC.
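
For anyone following along, here is a rough Python sketch (not part of the
original report) that tallies which module each frame of the quoted trace
belongs to. The sample lines are copied from the trace above; the names
FRAME, parse_frames and summarize are made up for illustration. It only
shows that the call trace runs through e1000 while the RIP itself points
into ipv6, nothing more.

```python
import re
from collections import Counter

# A few lines copied from the trace quoted in this thread.
TRACE = """\
<Aug/28 12:21 pm> [<ffffffff88107a79>] :e1000:e1000_clean_rx_irq+0x430/0x4d5
<Aug/28 12:21 pm> [<ffffffff881074ec>] :e1000:e1000_clean+0x82/0x160
<Aug/28 12:21 pm> [<ffffffff80395f51>] net_rx_action+0xe7/0x254
<Aug/28 12:21 pm> [<ffffffff80233d97>] __do_softirq+0x7b/0x10d
<Aug/28 12:21 pm>RIP  [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
"""

# Matches "[<addr>] :module:symbol+0xoff/0xsize"; ":module:" is optional
# because symbols built into the kernel proper carry no module prefix.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(text):
    """Return one dict per trace line that contains a symbolized frame."""
    frames = []
    for line in text.splitlines():
        m = FRAME.search(line)
        if not m:
            continue
        frames.append({
            # The faulting instruction pointer is on the line tagged "RIP".
            "is_rip": "RIP" in line[:m.start()],
            "module": m.group("module") or "(core kernel)",
            "symbol": m.group("symbol"),
        })
    return frames


def summarize(text):
    """Count call-trace frames per module and pick out the RIP frame."""
    frames = parse_frames(text)
    counts = Counter(f["module"] for f in frames if not f["is_rip"])
    rip = next((f for f in frames if f["is_rip"]), None)
    return counts, rip


if __name__ == "__main__":
    counts, rip = summarize(TRACE)
    for module, n in counts.most_common():
        print(f"{module}: {n} frame(s)")
    if rip:
        print(f"RIP is in {rip['module']}:{rip['symbol']}")
```

Feeding it the full console log instead of the excerpt should work the same
way, since lines without a symbolized frame are simply skipped.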

If the e1000 NIC is the one enslaved to the bridge, it could be cleanup
that's making it freak out when a guest stops. If it's ejected uncleanly,
the PID next in line with pending I/O for the device will likely be
identified as the culprit.

I had a very similar problem with a buggy Areca driver on dom-0 a couple
of years ago.

Can you post a link to your kernel's .config, or perhaps try the latest
stable version of that module from:

http://sourceforge.net/project/showfiles.php?group_id=42302

As for ipv6, if it's being set up you'll see it in /etc/sysconfig
or /etc/network (depending on the distro) pretty clearly. However, that
shouldn't make a difference; it should work either way.
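
A quick way to look for those settings is sketched below in Python. The two
file names are assumptions about common layouts (/etc/sysconfig/network on
Red Hat style systems, /etc/network/interfaces on Debian style systems), and
the helper names are invented for this example; adjust the list for your
distro.

```python
import re
from pathlib import Path

# Assumed locations; the directories come from the message above, the
# specific file names are guesses at the usual ones.
CANDIDATES = [
    Path("/etc/sysconfig/network"),
    Path("/etc/network/interfaces"),
]

# Anything mentioning ipv6 or inet6 is worth a look.
PATTERN = re.compile(r"ipv6|inet6", re.IGNORECASE)


def ipv6_lines(text):
    """Return non-comment lines of a config file that mention IPv6."""
    hits = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if PATTERN.search(stripped):
            hits.append(stripped)
    return hits


def scan(paths=CANDIDATES):
    """Map each readable candidate file to its IPv6-related lines."""
    found = {}
    for path in paths:
        try:
            if path.is_file():
                found[str(path)] = ipv6_lines(path.read_text(errors="replace"))
        except OSError:
            # Unreadable file (permissions, etc.): skip it in this sketch.
            continue
    return found


if __name__ == "__main__":
    for name, hits in scan().items():
        print(name)
        for hit in hits:
            print("   ", hit)
```

An empty result only means these particular files say nothing about IPv6;
the module can still be loaded by other means.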

Hope this helps :)


Cheers!
--Tim

--
Monkey + Typewriter = Echoreply ( http://echoreply.us )


