
Re: [Xen-devel] rcu_sched self-detect stall when disable vif device



On 28/01/15 17:06, David Vrabel wrote:
> On 28/01/15 16:45, Julien Grall wrote:
>> On 27/01/15 16:53, Wei Liu wrote:
>>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>>>> On 27/01/15 16:45, Wei Liu wrote:
>>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> While I'm working on support for 64K pages in netfront, I got
>>>>>> an rcu_sched self-detect message. It happens when netback is
>>>>>> disabling the vif device due to an error.
>>>>>>
>>>>>> I'm using Linux 3.19-rc5 on Seattle (ARM64). Any idea why
>>>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>>>
>>>>>
>>>>> When you try to release an SKB, the core network driver needs to enter
>>>>> some RCU critical regions to clean up. dst_release, for one, calls call_rcu.
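
If I understand correctly, the path Wei means is roughly the one below
(a simplified sketch, not the exact net/core/dst.c code in 3.19, which
has a few more conditions):

    /* Freeing an skb drops its dst reference.  The last release does
     * not free the dst synchronously; it queues an RCU callback, so
     * the actual free only happens once a grace period completes. */
    void dst_release(struct dst_entry *dst)
    {
            if (dst && atomic_dec_and_test(&dst->__refcnt))
                    call_rcu(&dst->rcu_head, dst_destroy_rcu);
    }
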
>>>>
>>>> But this message shouldn't happen under normal conditions or because of
>>>> netfront. Right?
>>>>
>>>
>>> Never saw a report like this before, even in cases where netfront is
>>> buggy.
>>
>> This only happens when preemption is not enabled (i.e.
>> CONFIG_PREEMPT_NONE in the config file) in the backend kernel.
>>
>> When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
>> into an infinite loop. In my case, the code path executed looks like this:
>>
>>
>>  1. for (;;) {
>>  2.  xenvif_wait_for_rx_work(queue);
>>  3.
>>  4.  if (kthread_should_stop())
>>  5.         break;
>>  6.
>>  7.  if (unlikely(vif->disabled && queue->id == 0)) {
>>  8.          xenvif_carrier_off(vif);
>>  9.          xenvif_rx_queue_purge(queue);
>> 10.          continue;
>> 11.  }
>> 12. }
>>
>> The wait on line 2 returns immediately because the vif is disabled
>> (see xenvif_have_rx_work).
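
A heavily simplified sketch of that wait condition (the queued-skb and
stall-detection clauses of xenvif_have_rx_work() are elided here):

    static bool xenvif_have_rx_work(struct xenvif_queue *queue)
    {
            /* ... queued-skb and stall-detection checks elided ... */
            return kthread_should_stop()
                    || queue->vif->disabled;
    }

Once vif->disabled is set the condition is always true, so the wait
never sleeps again.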
>>
>> We are on queue 0, so the condition on line 7 is true. Therefore we hit
>> the continue on line 10 and start the loop again. And so on...
>>
>> On a platform where preemption is not enabled, this thread will never
>> yield the CPU to another thread (unless the domain is destroyed).
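
Purely to illustrate that last point, and not the change tried below: on
a !CONFIG_PREEMPT kernel, a long-running in-kernel loop normally needs an
explicit scheduling point before rcu_sched can see a quiescent state,
e.g. something along these lines (illustration only, not a proposed fix):

            if (unlikely(vif->disabled && queue->id == 0)) {
                    xenvif_carrier_off(vif);
                    xenvif_rx_queue_purge(queue);
                    cond_resched();     /* give the scheduler/RCU a chance */
                    continue;
            }
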
> 
> I'm not sure why we have a continue in the vif->disabled case and not
> just a break.  Can you try that?

So I applied this small patch:

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..9448c6c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -2110,7 +2110,7 @@ int xenvif_kthread_guest_rx(void *data)
                if (unlikely(vif->disabled && queue->id == 0)) {
                        xenvif_carrier_off(vif);
                        xenvif_rx_queue_purge(queue);
-                       continue;
+                       break;
                }
 
                if (!skb_queue_empty(&queue->rx_queue))


While I no longer get the rcu_sched stall message, when I destroy the
guest the backend hits a NULL pointer dereference:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffff800000a50000
[00000000] *pgd=00000083de82a003, *pud=00000083de82b003, *pmd=00000083de82c003, *pte=00600000e1110707
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 4 PID: 34 Comm: xenwatch Not tainted 3.19.0-rc5-xen-seattle+ #13
Hardware name: AMD Seattle (RevA) Development Board (Overdrive) (DT)
task: ffff80001ea39480 ti: ffff80001ea78000 task.ti: ffff80001ea78000
PC is at exit_creds+0x18/0x70
LR is at __put_task_struct+0x3c/0xd4
pc : [<ffff8000000b2d94>] lr : [<ffff800000094990>] pstate: 80000145
sp : ffff80001ea7bc50
x29: ffff80001ea7bc50 x28: 0000000000000000 
x27: 0000000000000000 x26: 0000000000000000 
x25: 0000000000000000 x24: ffff80001eb3c840 
x23: ffff80001eb3c840 x22: 000000000006c560 
x21: ffff0000011f7000 x20: 0000000000000000 
x19: ffff80001ba06680 x18: 0000ffffd2635bd0 
x17: 0000ffff839e4074 x16: 00000000deadbeef 
x15: ffffffffffffffff x14: 0ffffffffffffffe 
x13: 0000000000000028 x12: 0000000000000010 
x11: 0000000000000030 x10: 0101010101010101 
x9 : ffff80001ea7b8e0 x8 : ffff7c01cf6e2740 
x7 : 0000000000000000 x6 : 0000000000002fc9 
x5 : 0000000000000000 x4 : 0000000000000001 
x3 : 0000000000000000 x2 : ffff80001ba06690 
x1 : 0000000000000000 x0 : 0000000000000000 

Process xenwatch (pid: 34, stack limit = 0xffff80001ea78058)
Stack: (0xffff80001ea7bc50 to 0xffff80001ea7c000)
bc40:                                     1ea7bc70 ffff8000 00094990 ffff8000
bc60: 1ba06680 ffff8000 008b45a8 ffff8000 1ea7bc90 ffff8000 000b15f0 ffff8000
bc80: 1ba06680 ffff8000 005bcab8 ffff8000 1ea7bcc0 ffff8000 00541efc ffff8000
bca0: 011ed000 ffff0000 00000000 00000000 011f7000 ffff0000 00000006 00000000
bcc0: 1ea7bd00 ffff8000 00540984 ffff8000 1ce23680 ffff8000 00000006 00000000
bce0: 00752cf0 ffff8000 00000001 00000000 00752e38 ffff8000 1ea7bd98 ffff8000
bd00: 1ea7bd40 ffff8000 00540bcc ffff8000 1ce23680 ffff8000 1cce0c00 ffff8000
bd20: 00000000 00000000 1cce0c00 ffff8000 009b0288 ffff8000 1ea7be20 ffff8000
bd40: 1ea7bd70 ffff8000 0048011c ffff8000 1ce23700 ffff8000 1cf71000 ffff8000
bd60: 009a6258 ffff8000 00a36d38 00000000 1ea7bdb0 ffff8000 00480ea4 ffff8000
bd80: 1b89d800 ffff8000 009a62b0 ffff8000 009a6258 ffff8000 00a36d38 ffff8000
bda0: 00a36e30 ffff8000 0047f7c0 ffff8000 1ea7bdc0 ffff8000 0047f82c ffff8000
bdc0: 1ea7be30 ffff8000 000b1064 ffff8000 1ea48cc0 ffff8000 009dbfe8 ffff8000
bde0: 008552d8 ffff8000 00000000 00000000 0047f778 ffff8000 00000000 00000000
be00: 1ea7be30 ffff8000 00000000 ffff8000 1ea39480 ffff8000 000c75f8 ffff8000
be20: 1ea7be20 ffff8000 1ea7be20 ffff8000 00000000 00000000 00085930 ffff8000
be40: 000b0f88 ffff8000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be60: 00000000 00000000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bea0: 1ea7bea0 ffff8000 1ea7bea0 ffff8000 00000000 ffff8000 00000000 00000000
bec0: 1ea7bec0 ffff8000 1ea7bec0 ffff8000 00000000 00000000 00000000 00000000
bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000
bfe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call trace:
[<ffff8000000b2d94>] exit_creds+0x18/0x70
[<ffff80000009498c>] __put_task_struct+0x38/0xd4
[<ffff8000000b15ec>] kthread_stop+0xc0/0x130
[<ffff800000541ef8>] xenvif_disconnect+0x58/0xd0
[<ffff800000540980>] set_backend_state+0x134/0x278
[<ffff800000540bc8>] frontend_changed+0x8c/0xec
[<ffff800000480118>] xenbus_otherend_changed+0x9c/0xa4
[<ffff800000480ea0>] frontend_changed+0xc/0x18
[<ffff80000047f828>] xenwatch_thread+0xb0/0x140
[<ffff8000000b1060>] kthread+0xd8/0xf0
Code: f9000bf3 aa0003f3 f9422401 f9422000 (b9400021) 
---[ end trace af11d521ee530da8 ]---

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

