WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
[Xen-bugs] [Bug 1486] New: dom0 crashes under heavy network load

To: xen-bugs@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-bugs] [Bug 1486] New: dom0 crashes under heavy network load
From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
Date: Tue, 14 Jul 2009 01:08:16 -0700
Delivery-date: Tue, 14 Jul 2009 01:08:25 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-bugs-request@lists.xensource.com?subject=help>
List-id: Xen Bugzilla <xen-bugs.lists.xensource.com>
List-post: <mailto:xen-bugs@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-bugs>, <mailto:xen-bugs-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-bugs>, <mailto:xen-bugs-request@lists.xensource.com?subject=unsubscribe>
Reply-to: bugs@xxxxxxxxxxxxxxxxxx
Sender: xen-bugs-bounces@xxxxxxxxxxxxxxxxxxx
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1486

           Summary: dom0 crashes under heavy network load
           Product: Xen
           Version: unstable
          Platform: x86-64
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: Hypervisor
        AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
        ReportedBy: uk@xxxxxxxxxxxxx
                CC: uk@xxxxxxxxxxxxx


On a Dell PE-R710 with the bnx2 network driver (also tested with an e1000 card,
which likewise crashes if the onboard bnx2 is disabled, so I do not think this is
a NIC driver issue), dom0 crashes completely under heavy, constant network and
disk load (produced in dom0 and one domU). It is reproducible faster with an
additional rsync, which also causes disk I/O.
In my test scenario, 60 domUs were started, each with 6 disk and 2 network
backend devices, so 8 backend devices in use per guest.

Test scenario, using netcat to produce constant load (zero bytes only, in this
case):
my.dom0 #: nc -l -p 1234 | pv > /dev/null
external.host #: cat /dev/zero | pv | nc ip.of.my.dom0 1234

Then I ran an additional rsync loop in order to produce network and disk I/O:
my.dom0 #:
for i in $(seq 1 1000); do
    echo "============== run $i ============" >> rsync-runs.txt
    rm -rfv /var/spool/test/*
    rsync -avP --numeric-ids --password-file=/etc/rsyncd.secrets \
        user@xxxxxxxxxxxxx::source/* /var/spool/test/
done
...which copies roughly 1 GB of data per run.
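The loop above can also be written as a small reusable script. This is only a sketch: the host (`external.host`), the rsync module name (`source`), and the target path are placeholders standing in for the obfuscated values in the report and would need to be adjusted.

```shell
#!/bin/sh
# Sketch of the rsync-based load generator from the report.
# NOTE: external.host, the "source" module, and /var/spool/test
# are assumptions; substitute your own values before running.

run_marker() {
    # Per-run separator line, appended to rsync-runs.txt between runs.
    echo "============== run $1 ============"
}

stress_rsync() {
    runs=${1:-1000}     # default: 1000 runs, as in the report
    i=1
    while [ "$i" -le "$runs" ]; do
        run_marker "$i" >> rsync-runs.txt
        rm -rf /var/spool/test/*            # clear target before each run
        rsync -avP --numeric-ids \
              --password-file=/etc/rsyncd.secrets \
              "user@external.host::source/*" /var/spool/test/
        i=$((i + 1))
    done
}
```

Running `stress_rsync 84` alongside the netcat stream would reproduce the e1000 test described below.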

The crash occurs within a few minutes or up to several hours; with the e1000 it
took 84 rsync runs (I do not know exactly how long that was, as the machine
crashed overnight). I think I can crash the machine faster using the bnx2 card.

Here, the unstable kernel 2.6.27.5 from xenbits was used, but this issue also
affects older versions.

Stacktrace:
Jul  9 19:34:20 xh132 kernel: ------------[ cut here ]------------
Jul  9 19:34:20 xh132 kernel: WARNING: at net/sched/sch_generic.c:219
dev_watchdog+0x13c/0x1e9()
Jul  9 19:34:20 xh132 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit timed out
Jul  9 19:34:20 xh132 kernel: Modules linked in: iptable_filter(N) ip_tables(N)
x_tables(N) bridge(N) stp(N) llc(N) loop(N) dm_mod(N) 8021q(N) bonding(N)
dcdbas(N)
Jul  9 19:34:20 xh132 kernel: Supported: No
Jul  9 19:34:20 xh132 kernel: Pid: 0, comm: swapper Tainted: G         
2.6.27.5-xen0-he+4 #7
Jul  9 19:34:20 xh132 kernel:
Jul  9 19:34:20 xh132 kernel: Call Trace:
Jul  9 19:34:20 xh132 kernel: <IRQ>  [<ffffffff8022b3d7>]
warn_slowpath+0xb4/0xde
Jul  9 19:34:20 xh132 kernel: [<ffffffff80552b00>] __down_read+0xb6/0x110
Jul  9 19:34:20 xh132 kernel: [<ffffffff804d6999>] neigh_lookup+0xb0/0xc0
Jul  9 19:34:20 xh132 kernel: [<ffffffff804cafd2>] skb_queue_tail+0x17/0x3e
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d6de>] get_nsec_offset+0x9/0x2c
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d7ff>] local_clock+0x48/0x99
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d6de>] get_nsec_offset+0x9/0x2c
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d7ff>] local_clock+0x48/0x99
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d96f>] sched_clock+0x15/0x36
Jul  9 19:34:20 xh132 kernel: [<ffffffff80241ef5>] sched_clock_cpu+0x290/0x2b9
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020dfea>] timer_interrupt+0x409/0x41d
Jul  9 19:34:20 xh132 kernel: [<ffffffff804ded1f>] dev_watchdog+0x13c/0x1e9
Jul  9 19:34:20 xh132 kernel: [<ffffffffa0038b31>] br_fdb_cleanup+0x0/0xd5
[bridge]
Jul  9 19:34:20 xh132 kernel: [<ffffffff802347c8>] __mod_timer+0xc7/0xd5
Jul  9 19:34:20 xh132 kernel: [<ffffffff804debe3>] dev_watchdog+0x0/0x1e9
Jul  9 19:34:20 xh132 kernel: [<ffffffff80234131>]
run_timer_softirq+0x16c/0x211
Jul  9 19:34:20 xh132 kernel: [<ffffffff8024f132>] handle_percpu_irq+0x53/0x6f
Jul  9 19:34:20 xh132 kernel: [<ffffffff8022fee0>] __do_softirq+0x92/0x13b
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020b37c>] call_softirq+0x1c/0x28
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d1c3>] do_softirq+0x55/0xbb
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020ae3e>]
do_hypervisor_callback+0x1e/0x30
Jul  9 19:34:20 xh132 kernel: <EOI>  [<ffffffff8020d6af>]
xen_safe_halt+0xb3/0xd9
Jul  9 19:34:20 xh132 kernel: [<ffffffff802105b3>] xen_idle+0x2e/0x67
Jul  9 19:34:20 xh132 kernel: [<ffffffff80208dfe>] cpu_idle+0x57/0x75
Jul  9 19:34:20 xh132 kernel:
Jul  9 19:34:20 xh132 kernel: ---[ end trace a04b8dccc5213f7d ]---
Jul  9 19:34:20 xh132 kernel: bnx2: eth0 NIC Copper Link is Down
Jul  9 19:34:20 xh132 kernel: bonding: bond0: link status down for active
interface eth0, disabling it in 200 ms.
Jul  9 19:34:20 xh132 kernel: bonding: bond0: link status definitely down for
interface eth0, disabling it
Jul  9 19:34:20 xh132 kernel: device eth0 left promiscuous mode
Jul  9 19:34:20 xh132 kernel: bonding: bond0: now running without any active
interface !

Please let me know if you need further information.
I hope you can help.

Many thanks in advance, 
best regards,
Ulf Kreutzberg


-- 
Configure bugmail: 
http://bugzilla.xensource.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

_______________________________________________
Xen-bugs mailing list
Xen-bugs@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-bugs
