This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


Re: [Xen-devel] kernel oops/IRQ exception when networking between many d

Am Montag, den 06.06.2005, 14:30 +0200 schrieb Birger Tödtmann: 
> Am Montag, den 06.06.2005, 10:26 +0100 schrieb Keir Fraser:
> [...]
> > > somewhere around the magic 128 (NR_IRQS problem in 2.0.x!) when the
> > > crash happens - could this hint to something?
> > 
> > The crashes you see with free_mfn removed will be impossible to debug 
> > -- things are very screwed by that point. Even the crash within 
> > free_mfn might be far removed from the cause of the crash, if it's due 
> > to memory corruption.
> > 
> > It's perhaps worth investigating what critical limit you might be 
> > hitting, and what resource it is that's limited. e.g., can you can 
> > create a few vifs, but connected together by some very large number of 
> > bridges (daisy chained together)? Or can you create a large number of 
> > vifs if they are connected together by just one bridge?
> This is getting really weird - as I found out I'll enounter problems
> with far fewer vifs/bridges that suspected.  I just fired up a network
> with 7 nodes, all with four interfaces each connected to the same four
> bridge interfaces.  The nodes can ping through the network, however
> after a short time, the system (dom0) crashes as well.  This time, it
> dies in net_rx_action() at a slightly different place:
> [...]
>  [<c02b6e15>] kfree_skbmem+0x12/0x29
>  [<c02b6ed1>] __kfree_skb+0xa5/0x13f
>  [<c028c9b3>] net_rx_action+0x23d/0x4df
> [...]
> Funnily, I cannot reproduce this with 5 nodes (domUs) running.  I'm a
> bit unsure where to go from here...  Maybe I should try a different
> machine for further testing.

I can confirm this bug on AMD Athlon using xen-unstable from june 5th
(latest ChangeSet 1.1677). All testing domains run OSPF daemons which
will start talking via multicast to each other as soon as the network
connections are established.

  * 'xm create' 20 domains with 122 vifs (+ vif0.0), but that xen-
    version does not UP the vifs. Everything is fine.

  * Create 51 transfer bridges, connect the some vifs to them (not
    more than two vifs to each) UP all vifs. Now i have lo + eth0
    + veth0 + 123 vif* + 51 br* = 177 devices, all UP. 
    All transfer networks work, OSPF tables grow, everything is fine.

  * Create a 52th bridge. Connect 20 vifs to it but DOWN THEM BEFORE.
    Everything ist fine. 

  * Now UP all the vifs connected to the 52th bridge one after the
    other. More and more multicast traffic shows up. After UPing the
    9th vif, dom0 BOOOOOMs (net_rx_action, too).

Further experiments show that its seems to be the amount of traffic (and
the number of connected vifs?) which triggers the oops: with all OSPF
daemons stopped, i could UP all bridges & vifs. But when i did a flood-
broadcast ping (ping -f -b $broadcastadr) on the 52th bridge (that one
with more that two active ports), dom0 OOPSed again.

I could only reproduce that "too-much-traffic-oops" on bridges
connecting more that 10 vifs.

Would be interesting if that happens with unicast traffic, too. Have no
time left, test more tomorrow.


ps: Shall we continue crossporting to devel+users?

Xen-devel mailing list