xen-users

[Xen-users] FC6/Xen crash -- isolated to rsnapshot job

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] FC6/Xen crash -- isolated to rsnapshot job
From: master@xxxxxxxxxxxxxxx
Date: Tue, 23 Jan 2007 09:19:42 -0800 (PST)
Delivery-date: Tue, 23 Jan 2007 09:18:12 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
Importance: Normal
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: SquirrelMail/1.4.8-3.fc6

I've isolated my FC6/xen crashing problems to cron backup jobs that were
running in my dom0. I have moved the jobs into a domU and now observe the
crashing behavior in the domU. At least the entire environment doesn't
come down when it's in a domU.

In my crontab, I have a series of rsnapshot backup jobs to back up a
handful of Windows and Linux servers. For the Windows machines, the script
mounts a share on the Windows machine using CIFS (Samba). It seems only
the Windows backup jobs crash the machine, and they only crash when two or
more are scheduled to start at exactly the same time.
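
For reference, the setup looks roughly like this; the share names, paths
and times below are illustrative rather than my exact configuration:

    # /etc/crontab -- two Windows backups scheduled for the same minute
    30 2 * * *  root  /usr/bin/rsnapshot -c /etc/rsnapshot-winbox1.conf daily
    30 2 * * *  root  /usr/bin/rsnapshot -c /etc/rsnapshot-winbox2.conf daily

    # each job mounts the Windows share over CIFS before rsync runs, e.g.
    mount -t cifs //winbox1/backup /mnt/winbox1 -o credentials=/root/.smbcred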

I can replicate the problem by running the crontab commands from the
command line. If I run the commands one at a time, no crash. If I start
them both back to back, the crash occurs within 30 seconds or so.
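
In other words, something like the following (same illustrative config
names as above) reliably triggers it, while running either line by itself
does not:

    /usr/bin/rsnapshot -c /etc/rsnapshot-winbox1.conf daily &
    /usr/bin/rsnapshot -c /etc/rsnapshot-winbox2.conf daily &
    wait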

Under FC4, these scripts/backup jobs ran fine for almost a year without
intervention. I've read that there have been a host of problems with CIFS
in FC6, but I thought they had been resolved. As a workaround, I can
change the job schedule for now, but something is still broken in the
kernel, Samba, or both.
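
Concretely, the workaround is just to stagger the start times so the two
CIFS-backed jobs never overlap; again, the times here are only an example:

    30 2 * * *  root  /usr/bin/rsnapshot -c /etc/rsnapshot-winbox1.conf daily
    30 3 * * *  root  /usr/bin/rsnapshot -c /etc/rsnapshot-winbox2.conf daily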

Here's a trace:

list_del corruption. prev->next should be c2f5c640, but was c2f50080
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:65!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /block/ram0/range
Modules linked in: nls_utf8 cifs ipv6 autofs4 hidp l2cap bluetooth
iptable_raw xt_policy xt_multiport ipt_ULOG ipt_TTL ipt_ttl ipt_TOS
ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner
ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_hashlimit ipt_ECN
ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype ip_nat_tftp ip_nat_snmp_basic
ip_nat_pptp ip_nat_irc ip_nat_ftp ip_nat_amanda ip_conntrack_tftp
ip_conntrack_pptp ip_conntrack_netbios_ns ip_conntrack_irc
ip_conntrack_ftp ts_kmp ip_conntrack_amanda xt_tcpmss xt_pkttype
xt_physdev bridge xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length
xt_helper xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY
xt_tcpudp xt_state iptable_nat ip_nat ip_conntrack iptable_mangle
nfnetlink iptable_filter ip_tables x_tables tun sunrpc xennet parport_pc
lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod raid456 xor ext3
jbd xenblk
CPU:    0
EIP:    0061:[<c04e9d0b>]    Not tainted VLI
EFLAGS: 00010082   (2.6.19-1.2895.fc6xen #1)
EIP is at list_del+0x23/0x6c
eax: 00000048   ebx: c2f5c640   ecx: c0683b30   edx: f5416000
esi: c117a7c0   edi: c32af000   ebp: c117eda0   esp: c0d2def0
ds: 007b   es: 007b   ss: 0069
Process events/0 (pid: 5, ti=c0d2d000 task=c006e030 task.ti=c0d2d000)
Stack: c0646145 c2f5c640 c2f50080 c2f5c640 c0467706 c078afc0 c028c980 c0619b9d
       00000014 00000002 c1176228 c1176220 00000014 c1176200 00000000 c0467809
       00000000 00000000 c117eda0 c117a7e4 c117a7c0 c117eda0 c0d404a0 00000000
Call Trace:
 [<c0467706>] free_block+0x77/0xf0
 [<c0467809>] drain_array+0x8a/0xb5
 [<c0468e22>] cache_reap+0x85/0x117
 [<c042d603>] run_workqueue+0x97/0xdd
 [<c042dfc0>] worker_thread+0xd9/0x10d
 [<c043058c>] kthread+0xc0/0xec
 [<c0405253>] kernel_thread_helper+0x7/0x10
 =======================
Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39 d8 74 1c
89 5c 24 04 89 44 24 08 c7 04 24 45 61 64 c0 e8 9a 4b f3 ff <0f> 0b 41 00
82 61 64 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04
EIP: [<c04e9d0b>] list_del+0x23/0x6c SS:ESP 0069:c0d2def0
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c04056ff>] dump_trace+0x69/0x1b6
 [<c0405864>] show_trace_log_lvl+0x18/0x2c
 [<c0405e4b>] show_trace+0xf/0x11
 [<c0405e7a>] dump_stack+0x15/0x17
 [<c0433252>] down_read+0x12/0x28
 [<c042aca2>] blocking_notifier_call_chain+0xe/0x29
 [<c0420d75>] do_exit+0x1b/0x787
 [<c0405dec>] die+0x2af/0x2d4
 [<c0406262>] do_invalid_op+0xa2/0xab
 [<c0619deb>] error_code+0x2b/0x30
 [<c04e9d0b>] list_del+0x23/0x6c
 [<c0467706>] free_block+0x77/0xf0
 [<c0467809>] drain_array+0x8a/0xb5
 [<c0468e22>] cache_reap+0x85/0x117
 [<c042d603>] run_workqueue+0x97/0xdd
 [<c042dfc0>] worker_thread+0xd9/0x10d
 [<c043058c>] kthread+0xc0/0xec
 [<c0405253>] kernel_thread_helper+0x7/0x10
 =======================
BUG: spinlock lockup on CPU#0, rsync/11148, c117a7e4 (Not tainted)
 [<c04056ff>] dump_trace+0x69/0x1b6
 [<c0405864>] show_trace_log_lvl+0x18/0x2c
 [<c0405e4b>] show_trace+0xf/0x11
 [<c0405e7a>] dump_stack+0x15/0x17
 [<c04e9b6f>] _raw_spin_lock+0xbf/0xdc
 [<c0467a45>] cache_alloc_refill+0x74/0x4dc
 [<c04679b8>] kmem_cache_alloc+0x54/0x6d
 [<c0413ec1>] pgd_alloc+0x54/0x230
 [<c041c020>] mm_init+0x94/0xb9
 [<c047056d>] do_execve+0x6f/0x1f5
 [<c0402e08>] sys_execve+0x2f/0x4f
 [<c0404efb>] syscall_call+0x7/0xb
 [<00b98402>] 0xb98402
 =======================




_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users