|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.
On 2017-10-05 06:20 AM, Konrad Rzeszutek Wilk wrote: On Wed, Oct 04, 2017 at 08:26:27PM +0200, Philipp Hahn wrote:Hello,Adding Ankur to this as I think he saw something similar. But in the meantime - do you see this with the latest version of Linux?with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed several stuck processes: Here is one (truncated) dump of the Linux kernel messages:[<ffffffff8160db6d>] ? __schedule+0x23d/0x6d0 [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90 [<ffffffff8160e032>] ? schedule+0x32/0x80 [<ffffffff8161150c>] ? schedule_timeout+0x1ec/0x360 [<ffffffff8130f277>] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below [<ffffffff8101b7f1>] ? xen_clocksource_get_cycles+0x11/0x20 [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90 [<ffffffff8160d8b4>] ? io_schedule_timeout+0xb4/0x130 [<ffffffff810bb6f7>] ? prepare_to_wait+0x57/0x80 [<ffffffff8160e8b7>] ? bit_wait_io+0x17/0x60 [<ffffffff8160e3ac>] ? __wait_on_bit+0x5c/0x90 [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90 [<ffffffff8160e50e>] ? out_of_line_wait_on_bit+0x7e/0xa0 [<ffffffff810bba20>] ? autoremove_wake_function+0x40/0x40 [<ffffffffc00abd18>] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2] [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720 [<ffffffff810e67fd>] ? try_to_del_timer_sync+0x4d/0x80 [<ffffffffc00b099d>] ? kjournald2+0xdd/0x280 [jbd2] [<ffffffff810bb9e0>] ? wake_up_atomic_t+0x30/0x30 [<ffffffffc00b08c0>] ? commit_timeout+0x10/0x10 [jbd2] [<ffffffff81097660>] ? kthread+0xf0/0x110 [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720 [<ffffffff81097570>] ? kthread_park+0x60/0x60 [<ffffffff81612bb5>] ? ret_from_fork+0x25/0x30 This looks like this race: https://patchwork.kernel.org/patch/9853443/ Can you dump the output of: cat /sys/block/$xen-frontend-device/mq/*/tags If you've hit this bug, one or more of the MQs would be wedged and the nr_free in one or more of the queues would be 0 and will not change. Ankur NMI backtrace for cpu 2 CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 4.9.30-2A~4.2.0.201709271649 0000000000000000 ffffffff81331935 0000000000000000 0000000000000002 ffffffff81335e60 0000000000000002 ffffffff8104cb70 ffff8801f0c90e80 ffffffff81335f6a ffff8801f0c90e80 00000000003fffbc ffffffff81128048 Call Trace: [<ffffffff81331935>] ? dump_stack+0x5c/0x77 [<ffffffff81335e60>] ? nmi_cpu_backtrace+0x90/0xa0 [<ffffffff8104cb70>] ? irq_force_complete_move+0x140/0x140 [<ffffffff81335f6a>] ? nmi_trigger_cpumask_backtrace+0xfa/0x130 [<ffffffff81128048>] ? watchdog+0x2b8/0x330 [<ffffffff81127d90>] ? reset_hung_task_detector+0x10/0x10 [<ffffffff81097660>] ? kthread+0xf0/0x110 [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720 [<ffffffff81097570>] ? kthread_park+0x60/0x60 [<ffffffff81612bb5>] ? ret_from_fork+0x25/0x30 Sending NMI from CPU 2 to CPUs 0-1,3: NMI backtrace for cpu 1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian 4.9.30-2A~4.2.0.201709271649 task: ffff8801f4a02ec0 task.stack: ffffc90040ca4000 RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] xen_hypercall_sched_op+0xa/0x20 RSP: e02b:ffffc90040ca7ed0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff8801f4a02ec0 RCX: ffffffff810013aa RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8801f4a02ec0 R15: ffff8801f4a02ec0 FS: 00007f23ac595700(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000 CS: e033 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00007f52537d6d46 CR3: 00000001bba23000 CR4: 0000000000002660 Stack: ffff8801bb832201 0000000000000001 ffffffff8101b55c ffffffff81611ec8 ffff8801f4a02ec0 0000000000000001 ffffffff810bc280 ffff8801f4a02ec0 ffff8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 0000000000000000 Call Trace: [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20 [<ffffffff81611ec8>] ? default_idle+0x18/0xd0 [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260 Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc NMI backtrace for cpu 3 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian 4.9.30-2A~4.2.0.201709271649 task: ffff8801f4a24f00 task.stack: ffffc90040cb4000 RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] xen_hypercall_sched_op+0xa/0x20 RSP: e02b:ffffc90040cb7ed0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff8801f4a24f00 RCX: ffffffff810013aa RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8801f4a24f00 R15: ffff8801f4a24f00 FS: 00007f1a2af19700(0000) GS:ffff8801f5b80000(0000) knlGS:0000000000000000 CS: e033 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00007f4a5416b000 CR3: 00000001d83ec000 CR4: 0000000000002660 Stack: 0000000000000001 0000000000000001 ffffffff8101b55c ffffffff81611ec8 ffff8801f4a24f00 0000000000000003 ffffffff810bc280 ffff8801f4a24f00 ffff8801f4a24f00 77816deb133b9979 addce6dcadd009c9 0000000000000000 Call Trace: [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20 [<ffffffff81611ec8>] ? default_idle+0x18/0xd0 [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260 Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-ucs105-amd64 #1 Debian 4.9.30-2A~4.2.0.201709271649 task: ffffffff81c0e540 task.stack: ffffffff81c00000 RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] xen_hypercall_sched_op+0xa/0x20 RSP: e02b:ffffffff81c03e90 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff81c0e540 RCX: ffffffff810013aa RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: ffffffff81c0e540 R15: ffffffff81c0e540 FS: 00007f977d3c2700(0000) GS:ffff8801f5a00000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055ae9becabc8 CR3: 00000001d6a40000 CR4: 0000000000002660 Stack: 0000000000000001 0000000000000001 ffffffff8101b55c ffffffff81611ec8 ffffffff81c0e540 0000000000000000 ffffffff810bc280 ffffffff81c0e540 ffffffff81c0e540 ebeac214e330bd3f addce6dcadd009c9 ffffffffffffffff Call Trace: [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20 [<ffffffff81611ec8>] ? default_idle+0x18/0xd0 [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260 [<ffffffff81d4bf84>] ? start_kernel+0x46d/0x48d [<ffffffff81d510e6>] ? xen_start_kernel+0x52e/0x538 Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ccLooking at the dis-assembly of xen_clocksource_get_cycles() in arch/x86/xen/time.c I see no path how that should call __blk_mq_run_hw_queue(): _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |