xen-devel

Re: [Xen-devel] Network dies and kernel errors

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] Network dies and kernel errors
From: John McMonagle <johnm@xxxxxxxxxxx>
Date: Fri, 29 Jul 2011 16:12:57 -0500
Cc: tinnycloud@xxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 29 Jul 2011 14:14:24 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110729173129.GA7637@xxxxxxxxxxxx>
Organization: Advocap Inc
References: <201107251418.21569.johnm@xxxxxxxxxxx> <201107291038.21289.johnm@xxxxxxxxxxx> <20110729173129.GA7637@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.13.5 (Linux/2.6.32-5-amd64; KDE/4.4.5; x86_64; ; )

On Friday, July 29, 2011 12:31:29 pm you wrote:
> > > Did you try that? Did that make any difference?
> > 
> > Not tested I did install one.
> > 
> > I think I found a way to keep it running.
> > On the new igb driver I built from new intel source added module
> > parameter IntMode=1.
> > 
> > This puts it in msi mode. It was in msi-x mode.
> > It's never died with that setting.
> > It's up now over a day.
> > No real experience with msi-x. I think it's the first time I have seen a
> > driver use msi-x interrupts.
> > Maybe that gives you more ideas?
> 
> That was my thought - the MSI-X interrupts somehow aren't being ACKed
> properly. But I don't know if the issue is with Dom0 or Xen.
> 
> > > > Any ideas?
> > > 
> > > There is a Xen parameter called 'noirqbalance'. Try that. Also see if
> > > you can limit the CPUs in the dom0 using these two arguments on the Xen
> > > hypervisor:
> > 
> > Should I turn off the irqbalance daemon also?
> 
> Sure.
> 
> > Just in case you wonder it does with out it.
> > 
> > > dom0_vcpus=2 dom0_vcpus_pin=1
> > > 
> > > 
> > > It would be interesting to narrow down _when_ you trigger this failure.
> > > Because we can poll Xen to see what the MSIs are ('xl debug-keys M')
> > > _before_ and _after_ your failure to see if something is amiss.
> > > 
> > > Mainly to figure out if the vectors are moving around the CPUs (or not)
> > > 
> > > (XEN)  MSI    29 vec=21 lowest  edge   assert  log lowest dest=00000001
> > > mask=0/0/-1
> > > 
> > > and also 'xl debug-keys i' to see if the domain has ACK-ed the
> > > interrupt: (XEN)    IRQ:  29
> > > affinity:00000000,00000000,00000000,00000001 vec:21 type=PCI-MSI      
> > >   status=00000010 in-flight=0 domain-list=0:275(----),
> > > 
> > > (the last '----' might have something else in them - if so, that is a
> > > sign that dom0 hasn't picked up the event/vector).
> > 
> > Much of my frustration is that I have not found a way to get it to fail
> > other than waiting a long time :-(
> 
> Ah that sucks. Well, just make a nice shell script that will run those
> continuously (and also 'xl dmesg') and pipe the log to a file.
I was just setting up to run your test.
About 10 minutes after removing irqbalance I lost networking.
I'm remote, and a few minutes later I lost IPMI SOL, so odds are I lost the
serial port interrupt also. So I went to iKVM.

I was able to restore the network with:
ifdown xenbr0
rmmod igb
modprobe igb
ifup xenbr0
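
For repeated use, the four commands above can be wrapped in a small function that stops at the first step that fails. This is only a sketch: the name reload_nic and its two arguments are placeholders chosen for illustration, not something from the thread.

```shell
#!/bin/sh
# Sketch only: reload_nic and its arguments are hypothetical names.
# Usage: reload_nic <bridge> <module>, e.g. reload_nic xenbr0 igb
reload_nic() {
    bridge="$1"
    module="$2"
    ifdown "$bridge"   || return 1   # take the bridge down first
    rmmod "$module"    || return 1   # unload the NIC driver
    modprobe "$module" || return 1   # load the driver again
    ifup "$bridge"                   # bring the bridge back up
}
```

Calling reload_nic xenbr0 igb runs the same sequence as above. As noted below, this brought dom0 networking back but not the running domU's.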

The running domU still had no networking.

Attached are:
dom0.dmesg.gz   dom0 dmesg. I see nothing in particular myself.
xen.dmesg.gz    xl dmesg. I had done the M and i debug keys earlier and ran
them again just before running xl dmesg, so it should have before and after.
I did these before bringing the network back up.

I'll reboot and run as you requested.
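
The capture loop suggested above (run 'xl debug-keys M', 'xl debug-keys i' and 'xl dmesg' continuously and log to a file) could look roughly like this. It is a sketch under assumptions: the variables XL, LOG and INTERVAL, the log path, and the function names are placeholders, and the filter assumes each IRQ entry sits on one line in the format of the example quoted earlier, with the flags in parentheses after domain-list.

```shell
#!/bin/sh
# Sketch only: XL, LOG, INTERVAL and both function names are hypothetical.
XL="${XL:-xl}"                              # xl binary (overridable)
LOG="${LOG:-/var/log/xen-irq-trace.log}"    # assumed log location
INTERVAL="${INTERVAL:-60}"                  # seconds between snapshots

# Take one snapshot: ask Xen to dump MSI and IRQ state into its console
# ring, then append the ring contents to the log under a timestamp header.
snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        "$XL" debug-keys M
        "$XL" debug-keys i
        "$XL" dmesg
    } >> "$LOG" 2>&1
}

# Read 'xl dmesg' text on stdin and print the IRQ lines whose domain-list
# flags are not all dashes, i.e. anything other than "(----)".
unacked_irqs() {
    grep -E 'domain-list=.*\([^)]*[^-)][^)]*\)'
}

# Loop only when asked to, so the file can also be sourced for its functions.
if [ "${1:-}" = "run" ]; then
    while :; do
        snapshot
        sleep "$INTERVAL"
    done
fi
```

Started with the argument "run", it appends one snapshot per interval. After a failure, piping the log through unacked_irqs shows any IRQ whose flags were set. Each snapshot dumps the whole console ring, so the log repeats older output and grows quickly.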

John

Attachment: xen.dmesg.gz
Description: GNU Zip compressed data

Attachment: dom0.dmesg.gz
Description: GNU Zip compressed data
