Re: [Xen-devel] kernel oops/IRQ exception when networking between many d

from [Nils Toedtmann]

To:	Birger Tödtmann <btoedtmann@xxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] kernel oops/IRQ exception when networking between many domUs
From:	Nils Toedtmann <xen-devel@xxxxxxxxxxxxxxxxxx>
Date:	Tue, 07 Jun 2005 18:47:25 +0200
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxx

On Monday, 06.06.2005, at 14:30 +0200, Birger Tödtmann wrote:
> On Monday, 06.06.2005, at 10:26 +0100, Keir Fraser wrote:
> [...]
> > > somewhere around the magic 128 (NR_IRQS problem in 2.0.x!) when the
> > > crash happens - could this hint to something?
> > 
> > The crashes you see with free_mfn removed will be impossible to debug 
> > -- things are very screwed by that point. Even the crash within 
> > free_mfn might be far removed from the cause of the crash, if it's due 
> > to memory corruption.
> > 
> > It's perhaps worth investigating what critical limit you might be 
> > hitting, and what resource it is that's limited. E.g., can you 
> > create a few vifs, but connected together by some very large number of 
> > bridges (daisy chained together)? Or can you create a large number of 
> > vifs if they are connected together by just one bridge?
> 
> This is getting really weird - as I found out, I encounter problems
> with far fewer vifs/bridges than suspected.  I just fired up a network
> with 7 nodes, all with four interfaces each connected to the same four
> bridge interfaces.  The nodes can ping through the network, however
> after a short time, the system (dom0) crashes as well.  This time, it
> dies in net_rx_action() at a slightly different place:
> 
> [...]
>  [<c02b6e15>] kfree_skbmem+0x12/0x29
>  [<c02b6ed1>] __kfree_skb+0xa5/0x13f
>  [<c028c9b3>] net_rx_action+0x23d/0x4df
> [...]
> 
> Funnily, I cannot reproduce this with 5 nodes (domUs) running.  I'm a
> bit unsure where to go from here...  Maybe I should try a different
> machine for further testing.

I can confirm this bug on an AMD Athlon using xen-unstable from June 5th
(latest ChangeSet 1.1677). All testing domains run OSPF daemons which
will start talking via multicast to each other as soon as the network
connections are established.

  * 'xm create' 20 domains with 122 vifs (+ vif0.0), but that Xen
    version does not UP the vifs. Everything is fine.

  * Create 51 transfer bridges, connect some vifs to them (not
    more than two vifs to each), and UP all vifs. Now I have lo + eth0
    + veth0 + 123 vif* + 51 br* = 177 devices, all UP.
    All transfer networks work, OSPF tables grow, everything is fine.

  * Create a 52nd bridge. Connect 20 vifs to it, but take them DOWN first.
    Everything is fine.

  * Now UP all the vifs connected to the 52nd bridge one after the
    other. More and more multicast traffic shows up. After UPing the
    9th vif, dom0 BOOOOOMs (in net_rx_action, too).
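
The steps above could be scripted roughly like this. It is only a dry-run
sketch that prints commands, not the script actually used: the bridge names
(br1..br52), the vif names in the demo, the even pairing of vifs onto
transfer bridges, and the use of `brctl` and `ip link` are all assumptions.

```python
def bridge_commands(bridge, vifs, bring_up=True):
    """Return shell commands that create `bridge` and attach `vifs` to it."""
    cmds = [f"brctl addbr {bridge}", f"ip link set {bridge} up"]
    for vif in vifs:
        # Make sure the vif is down before it is attached.
        cmds.append(f"ip link set {vif} down")
        cmds.append(f"brctl addif {bridge} {vif}")
        if bring_up:
            cmds.append(f"ip link set {vif} up")
    return cmds


def repro_plan(transfer_vifs, big_bridge_vifs):
    """Build the command list for the transfer bridges plus the one big bridge."""
    plan = []
    # Transfer bridges: not more than two vifs each (assumed: consecutive pairs).
    pairs = [transfer_vifs[i:i + 2] for i in range(0, len(transfer_vifs), 2)]
    for n, pair in enumerate(pairs, start=1):
        plan += bridge_commands(f"br{n}", pair)
    # The big bridge: attach its vifs, but leave them DOWN for now.
    big = f"br{len(pairs) + 1}"
    plan += bridge_commands(big, big_bridge_vifs, bring_up=False)
    # Finally bring the big bridge's vifs UP one after the other.
    plan += [f"ip link set {vif} up" for vif in big_bridge_vifs]
    return plan


if __name__ == "__main__":
    # Hypothetical vif names; the real ones look like vif<domid>.<n>.
    transfer = [f"vifT{i}" for i in range(102)]
    big = [f"vifB{i}" for i in range(20)]
    for cmd in repro_plan(transfer, big):
        print(cmd)
```

Printing instead of executing keeps the sketch harmless on a live dom0; the
output can be reviewed and piped to a shell once the names match the real
setup.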

Further experiments show that it seems to be the amount of traffic (and
the number of connected vifs?) which triggers the oops: with all OSPF
daemons stopped, I could UP all bridges & vifs. But when I did a
flood-broadcast ping (ping -f -b $broadcastadr) on the 52nd bridge (the
one with more than two active ports), dom0 OOPSed again.

I could only reproduce that "too-much-traffic-oops" on bridges
connecting more than 10 vifs.
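
To narrow down how many active ports are needed, one could sweep the port
count with the OSPF daemons stopped and flood after each step. Again only a
dry-run sketch: the vif names, the broadcast address and the packet count
(`-c`) are placeholders, not values from my setup.

```python
def flood_sweep(vifs, broadcast_addr, packets=10000):
    """Return (active_ports, commands) steps: UP one more vif, then flood."""
    steps = []
    for n, vif in enumerate(vifs, start=1):
        cmds = [f"ip link set {vif} up"]
        if n >= 2:
            # A flood only makes sense once at least two ports are active.
            cmds.append(f"ping -f -b -c {packets} {broadcast_addr}")
        steps.append((n, cmds))
    return steps


if __name__ == "__main__":
    # Hypothetical names and address, for illustration only.
    for ports, cmds in flood_sweep([f"vifB{i}" for i in range(20)], "10.0.52.255"):
        print(f"# {ports} active port(s)")
        for cmd in cmds:
            print(cmd)
```

The last step that completes before dom0 oopses gives the port count at
which the flood is still survivable.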

It would be interesting to see whether this happens with unicast traffic,
too. I have no time left today; I will test more tomorrow.

/nils.


PS: Shall we continue crossposting to devel+users?


