|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] Intermittent xenvif_disconnect() hang on domU destroy
I'm seeing the xenwatch kernel thread hang intermittently when
destroying a domU on recent stable xen 4.5, with Linux 4.4.11 + grsec
dom0.
The domU is created with a virtual network interface connected to a
physical interface (ixgbevf) via an openvswitch virtual switch.
Everything works fine until the domain is destroyed. Once in a while,
a few seconds after the domain goes away, xenwatch hangs in
xenvif_disconnect(), calling kthread_stop() on a dealloc task.
I added a warning to xenvif_dealloc_kthread_should_stop() when
kthread_should_stop() is true and queue->inflight_packets > 0,
printing inflight_packets as well as stats.tx_zerocopy_*. Each time
the hang occurs, inflight_packets == 1 and tx_zerocopy_sent ==
tx_zerocopy_success + tx_zerocopy_fail + 1.
I also added a warning to xenvif_skb_zerocopy_complete() when
queue->task is null. If I manually bring down the physical interface
to which the vif was connected (ifconfig down), this somehow causes
the last in-flight packet to be transmitted, and everything is
unblocked.
The following shows xenwatch hung trying to stop vif44.0-q0-dealloc,
waking up again after I bring down the physical interface net0_52.
[xl destroy]
...
[ 2914.510070] net vif44.0: stopping vif44.0-q0-dealloc task (pid 28045)
[ 2914.510224] xen_netback:xenvif_dealloc_kthread_should_stop:
vif44.0-q0-dealloc task (pid 28045) should_stop=1 inflight_packets=1
tx_zerocopy_sent=209494 tx_zerocopy_success=209492 tx_zerocopy_fail=1
...
[ 2933.561404] device net0_52 left promiscuous mode
[ 2933.564813] device vif44.0 left promiscuous mode
[ 3136.324009] INFO: task xenwatch:29 blocked for more than 120 seconds.
[ 3136.324119] Not tainted 4.4.11-grsec-skyport #66
[ 3136.324181] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 3136.324284] xenwatch D ffffc90005d0baf8 12640 29 2 0x00000000
[ 3136.324411] ffffc90005d0baf8 0000000000000000 ffffffff81b70d60
ffff8804ec08c500
[ 3136.324536] ffff8804ec08dbb8 ffffc900115abec0 ffffc900115abeb8
ffff8804ec08c500
[ 3136.324662] 0000000000000005 ffffc90005d0bb10 ffffffff816d6ff7
7fffffffffffffff
[ 3136.324806] Call Trace:
[ 3136.324866] [<ffffffff816d6ff7>] schedule+0x37/0x80
[ 3136.324932] [<ffffffff816dbdec>] schedule_timeout+0x1bc/0x240
[ 3136.325004] [<ffffffff8100a73c>] ? xen_clocksource_read+0x1c/0x30
[ 3136.325078] [<ffffffff810194e3>] ? sched_clock+0x13/0x20
[ 3136.325153] [<ffffffff8109f72c>] ? local_clock+0x1c/0x20
[ 3136.325228] [<ffffffff810bdd69>] ? mark_held_locks+0x79/0xa0
[ 3136.325298] [<ffffffff816dd077>] ? _raw_spin_unlock_irq+0x27/0x50
[ 3136.325367] [<ffffffff810bdecd>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[ 3136.325441] [<ffffffff816d8426>] wait_for_completion+0xd6/0x110
[ 3136.325514] [<ffffffff81099570>] ? wake_up_q+0x70/0x70
[ 3136.325585] [<ffffffff8108f087>] kthread_stop+0x47/0x80
[ 3136.325660] [<ffffffff814f1661>] xenvif_disconnect+0xb1/0x130
[ 3136.325729] [<ffffffff814ef3c6>] set_backend_state+0x116/0xde0
[ 3136.325805] [<ffffffff8143717e>] ? xenbus_gather+0x10e/0x140
[ 3136.325881] [<ffffffff811a5e42>] ? kfree+0x1c2/0x1e0
[ 3136.325960] [<ffffffff8109f72c>] ? local_clock+0x1c/0x20
[ 3136.326026] [<ffffffff814f0577>] frontend_changed+0xb7/0xc0
[ 3136.326095] [<ffffffff81437fb0>] xenbus_otherend_changed+0x80/0x90
[ 3136.330341] [<ffffffff81437410>] ? unregister_xenbus_watch+0x260/0x260
[ 3136.330414] [<ffffffff81438d2b>] frontend_changed+0xb/0x10
[ 3136.330483] [<ffffffff8143744a>] xenwatch_thread+0x3a/0x130
[ 3136.330553] [<ffffffff810b2ba0>] ? wake_up_atomic_t+0x30/0x30
[ 3136.330621] [<ffffffff8108eccc>] kthread+0xfc/0x120
[ 3136.330686] [<ffffffff8108ebd0>] ? kthread_create_on_node+0x240/0x240
[ 3136.330775] [<ffffffff816ddbee>] ret_from_fork+0x3e/0x70
[ 3136.330840] [<ffffffff8108ebd0>] ? kthread_create_on_node+0x240/0x240
[ 3136.330911] 1 lock held by xenwatch/29:
[ 3136.330972] #0: (xenwatch_mutex){+.+.+.}, at: [<ffffffff814374a7>]
ffffffff814374a7
...
[ifconfig net0_52 down]
...
[ 3162.217907] ixgbe 0000:81:00.0 net0: VF Reset msg received from vf 52
[ 3162.228840] ------------[ cut here ]------------
[ 3162.228945] WARNING: CPU: 3 PID: 31978 at
drivers/net/xen-netback/interface.c:71 xenvif_skb_zerocopy_complete+0x79/0x90()
[ 3162.229104] vif44.0: dead queue vif44.0-q0 decremented inflight_packets to 0
[ 3162.229184] Modules linked in: xt_physdev br_netfilter bridge stp llc tun
xen_blkback ip_gre ip_tunnel gre ixgbevf drbg ansi_cprng dm_crypt
algif_skcipher af_alg xen_evtchn xenfs xen_privcmd xen_pciback openvswitch
nf_defrag_ipv6 libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache
sunrpc nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables
iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp
crct10dif_pclmul crc32_pclmul raid1 aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd sb_edac lpc_ich mfd_core mei_me mei i2c_i801
ioatdma ipmi_ssif ipmi_msghandler squashfs lz4_decompress ixgbe mdio vxlan
xhci_pci xhci_hcd ip6_udp_tunnel udp_tunnel ptp pps_core dca ahci libahci
ehci_pci ehci_hcd usbcore usb_common tpm_tis
[ 3162.230580] CPU: 3 PID: 31978 Comm: ifconfig Not tainted
4.4.11-grsec-skyport #66
[ 3162.230681] Hardware name: ABCD, BIOS SE5C610.86B.01.01.0009.C1.060120151350
06/01/2015
[ 3162.230814] 0000000000000000 ffffc9000f7c3a68 ffffffff8137d3f8
0000000000000000
[ 3162.230938] ffffc9000f7c3ab0 ffffffff81aa9060 ffffc9000f7c3aa0
ffffffff81068de8
[ 3162.231062] ffffc90011565000 ffffc9001156f788 0000000000000001
ffffc90011565000
[ 3162.231187] Call Trace:
[ 3162.231251] [<ffffffff8137d3f8>] dump_stack+0x9a/0xe2
[ 3162.231324] [<ffffffff81068de8>] warn_slowpath_common+0x78/0xb0
[ 3162.231392] [<ffffffff81068e67>] warn_slowpath_fmt+0x47/0x50
[ 3162.231469] [<ffffffff810bdecd>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[ 3162.231540] [<ffffffff814f0dc9>] xenvif_skb_zerocopy_complete+0x79/0x90
[ 3162.231612] [<ffffffff814ede0f>] xenvif_zerocopy_callback+0x9f/0xc0
[ 3162.231694] [<ffffffff8156cdb4>] skb_release_data+0xc4/0xe0
[ 3162.231761] [<ffffffff8156cdef>] skb_release_all+0x1f/0x30
[ 3162.231828] [<ffffffff8156cecd>] consume_skb+0x1d/0x40
[ 3162.231900] [<ffffffff81586a74>] __dev_kfree_skb_any+0x34/0x40
[ 3162.231974] [<ffffffffa0bd00d0>]
ixgbevf_unmap_and_free_tx_resource.isra.46+0x20/0x80 [ixgbevf]
[ 3162.232083] [<ffffffffa0bd016c>] ixgbevf_clean_tx_ring+0x3c/0x80 [ixgbevf]
[ 3162.232156] [<ffffffffa0bd418e>] ixgbevf_down+0x2be/0x330 [ixgbevf]
[ 3162.232226] [<ffffffffa0bd5332>] ixgbevf_close+0x22/0xa0 [ixgbevf]
[ 3162.232299] [<ffffffff81581240>] __dev_close_many+0x90/0xe0
[ 3162.232378] [<ffffffff815813be>] __dev_close+0x2e/0x50
[ 3162.232448] [<ffffffff8158cee8>] __dev_change_flags+0x98/0x160
[ 3162.232517] [<ffffffff8158cfd4>] dev_change_flags+0x24/0x60
[ 3162.232590] [<ffffffff8161f534>] devinet_ioctl+0x834/0x8f0
[ 3162.232661] [<ffffffff8162077b>] inet_ioctl+0x4b/0x70
[ 3162.232739] [<ffffffff8155e6c0>] sock_do_ioctl+0x20/0x50
[ 3162.232804] [<ffffffff8155e8d0>] sock_ioctl+0x1e0/0x290
[ 3162.232876] [<ffffffff811da630>] do_vfs_ioctl+0x430/0x7f0
[ 3162.232939] [<ffffffff811daa64>] SyS_ioctl+0x74/0x80
[ 3162.233009] [<ffffffff816dd83a>] entry_SYSCALL_64_fastpath+0x16/0x7e
[ 3162.233129] ---[ end trace 2e4a237dee6f3318 ]---
[ 3162.357009] xen:events: domain 45 does not have 103 anymore
[ 3162.357099] xen:events: domain 45 does not have 102 anymore
...
[xenwatch unblocked]
...
[ 3162.394227] net vif50.0: stopping vif50.0-q0-dealloc task (pid 29413)
[ 3162.667880] net vif42.0: stopping vif42.0-q0-dealloc task (pid 27272)
[ 3162.705464] net vif48.0: stopping vif48.0-q0-dealloc task (pid 28779)
It's not clear to me whether the problem lies in netback, ixgbevf, or
somewhere in between. Is the root cause ixgbevf hanging onto a skb for
so long, and doing nothing with it until I bring the interface down,
or is that a symptom of some other problem? Or is netback supposed to
somehow flush in-flight transmit packets before it gets as far as
xenvif_disconnect()? Or should it forget about the in-flight packets
since the interface is disappearing anyway?
Any clues would be appreciated.
--Ed
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |