xen-devel

Re: [Xen-devel] kernel oops/IRQ exception when networking between many d

On Tuesday, 07.06.2005 at 18:47 +0200, Nils Toedtmann wrote:
> On Monday, 06.06.2005 at 14:30 +0200, Birger Tödtmann wrote:
> > On Monday, 06.06.2005 at 10:26 +0100, Keir Fraser wrote:
> > [...]
> > > > somewhere around the magic 128 (NR_IRQS problem in 2.0.x!) when the
> > > > crash happens - could this hint to something?
> > > 
> > > The crashes you see with free_mfn removed will be impossible to debug 
> > > -- things are very screwed by that point. Even the crash within 
> > > free_mfn might be far removed from the cause of the crash, if it's due 
> > > to memory corruption.
> > > 
> > > It's perhaps worth investigating what critical limit you might be 
> > > hitting, and what resource it is that's limited. e.g., can you can 
> > > create a few vifs, but connected together by some very large number of 
> > > bridges (daisy chained together)? Or can you create a large number of 
> > > vifs if they are connected together by just one bridge?
> > 
> > This is getting really weird - as I found out, I encounter problems
> > with far fewer vifs/bridges than suspected.  I just fired up a network
> > with 7 nodes, each with four interfaces connected to the same four
> > bridge interfaces.  The nodes can ping through the network; however,
> > after a short time the system (dom0) crashes as well.  This time, it
> > dies in net_rx_action() at a slightly different place:
> > 
> > [...]
> >  [<c02b6e15>] kfree_skbmem+0x12/0x29
> >  [<c02b6ed1>] __kfree_skb+0xa5/0x13f
> >  [<c028c9b3>] net_rx_action+0x23d/0x4df
> > [...]
> > 
> > Funnily enough, I cannot reproduce this with 5 nodes (domUs) running.  I'm a
> > bit unsure where to go from here...  Maybe I should try a different
> > machine for further testing.
> 
> I can confirm this bug on an AMD Athlon using xen-unstable from June 5th
> (latest ChangeSet 1.1677). 
[...]

errr ... sorry for the dupe.

> Further experiments show that it seems to be the amount of traffic (and
> the number of connected vifs?) which triggers the oops: with all OSPF
> daemons stopped, I could bring up all bridges & vifs. But when I did a
> flood-broadcast ping (ping -f -b $broadcastadr) on the 52nd bridge (the
> one with more than two active ports), dom0 oopsed again.
> 
> I could only reproduce that "too-much-traffic-oops" on bridges
> connecting more than 10 vifs.
> 
> It would be interesting to know whether that happens with unicast traffic,
> too. I have no time left; I'll test more tomorrow.

OK, I reproduced the dom0 kernel panic in a simpler situation (rough
commands are sketched after the list):

* Create some domUs, each having one interface in the same subnet.
* Bridge all the interfaces together (dom0 not having an IP on that
  bridge).
* Trigger as much unicast traffic as you want (e.g. unicast flood
  pings): no problem.
* Now trigger some broadcast traffic between the domUs:

    ping -i 0.1 -b 192.168.0.255

  BOOOM.
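
For reference, the dom0 side of that setup looks roughly like this
(untested sketch; the bridge name "xenbr-test", the vifN.0 interface
names and the domain numbers are just examples, adjust them to your
setup):

    # dom0: bridge all domU vifs together, no IP address on the bridge
    brctl addbr xenbr-test
    for n in $(seq 1 15); do
        ifconfig vif$n.0 up
        brctl addif xenbr-test vif$n.0
    done
    ifconfig xenbr-test up

The broadcast ping above is then started inside one of the domUs.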


Alternatively, you may down all vifs first, start the flood broadcast
ping in the first domU, and bring up one vif after the other (waiting
each time >15 sec until the bridge puts the added port into forwarding
state). After bringing up 10-15 vifs, dom0 panics. A sketch of the dom0
side follows.
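
Untested, and assuming the default vifN.0 interface names for domains
1..16, with the flood broadcast ping already running in domain 1:

    # take all domU ports down first
    for n in $(seq 1 16); do ifconfig vif$n.0 down; done
    ifconfig vif1.0 up                # domain 1 sends the flood broadcast ping

    # bring the remaining vifs up one at a time
    for n in $(seq 2 16); do
        ifconfig vif$n.0 up
        sleep 20                      # >15 sec: wait for forwarding state
    done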

I could _not_ reproduce this with massive unicast traffic. The problem
disappears if I set "net.ipv4.icmp_echo_ignore_broadcasts=1" in all
domains. Maybe the problem arises if too many domUs answer broadcasts at
the same time (collisions?).
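
(For completeness, that is simply, in every domU,

    sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=1

or equivalently "echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts",
so the domUs stop answering the broadcast pings.)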

/nils.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel