
Re: [Xen-devel] rcu_sched self-detect stall when disable vif device



On 28/01/15 17:06, David Vrabel wrote:
> On 28/01/15 16:45, Julien Grall wrote:
>> On 27/01/15 16:53, Wei Liu wrote:
>>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>>>> On 27/01/15 16:45, Wei Liu wrote:
>>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> While I'm working on support for 64K pages in netfront, I got
>>>>>> an rcu_sched self-detect message. It happens when netback is
>>>>>> disabling the vif device due to an error.
>>>>>>
>>>>>> I'm using Linux 3.19-rc5 on Seattle (ARM64). Any idea why
>>>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>>>
>>>>>
>>>>> When you try to release an SKB, the core network driver needs to enter
>>>>> some RCU critical regions to clean up. dst_release, for one, calls call_rcu.
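
If I understand correctly, the path Wei means is roughly the one below
(a simplified sketch, not the exact net/core/dst.c code in 3.19, which
has a few more conditions):

    /* Freeing an skb drops its dst reference.  The last release does
     * not free the dst synchronously; it queues an RCU callback, so
     * the actual free only happens once a grace period completes. */
    void dst_release(struct dst_entry *dst)
    {
            if (dst && atomic_dec_and_test(&dst->__refcnt))
                    call_rcu(&dst->rcu_head, dst_destroy_rcu);
    }
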
>>>>
>>>> But this message shouldn't happen under normal conditions or because of
>>>> netfront. Right?
>>>>
>>>
>>> Never saw a report like this before, even in cases where netfront is
>>> buggy.
>>
>> This only happens when preemption is not enabled (i.e.
>> CONFIG_PREEMPT_NONE in the config file) in the backend kernel.
>>
>> When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
>> into an infinite loop. In my case, the code path executed looks like this:
>>
>>
>>  1. for (;;) {
>>  2.  xenvif_wait_for_rx_work(queue);
>>  3.
>>  4.  if (kthread_should_stop())
>>  5.         break;
>>  6.
>>  7.  if (unlikely(vif->disabled && queue->id == 0)) {
>>  8.          xenvif_carrier_off(vif);
>>  9.          xenvif_rx_queue_purge(queue);
>> 10.          continue;
>> 11.  }
>> 12. }
>>
>> The wait on line 2 returns immediately because the vif is disabled
>> (see xenvif_have_rx_work).
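
A heavily simplified sketch of that wait condition (the queued-skb and
stall-detection clauses of xenvif_have_rx_work() are elided here):

    static bool xenvif_have_rx_work(struct xenvif_queue *queue)
    {
            /* ... queued-skb and stall-detection checks elided ... */
            return kthread_should_stop()
                    || queue->vif->disabled;
    }

Once vif->disabled is set the condition is always true, so the wait
never sleeps again.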
>>
>> We are on queue 0, so the condition on line 7 is true. Therefore we hit
>> the continue on line 10 and start the loop again. And so on...
>>
>> On a platform where preemption is not enabled, this thread will never
>> yield the CPU to another thread (unless the domain is destroyed).
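
Purely to illustrate that last point, and not the change tried below: on
a !CONFIG_PREEMPT kernel, a long-running in-kernel loop normally needs an
explicit scheduling point before rcu_sched can see a quiescent state,
e.g. something along these lines (illustration only, not a proposed fix):

            if (unlikely(vif->disabled && queue->id == 0)) {
                    xenvif_carrier_off(vif);
                    xenvif_rx_queue_purge(queue);
                    cond_resched();     /* give the scheduler/RCU a chance */
                    continue;
            }
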
> 
> I'm not sure why we have a continue in the vif->disabled case and not
> just a break.  Can you try that?

So I applied this small patch:

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..9448c6c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -2110,7 +2110,7 @@ int xenvif_kthread_guest_rx(void *data)
                if (unlikely(vif->disabled && queue->id == 0)) {
                        xenvif_carrier_off(vif);
                        xenvif_rx_queue_purge(queue);
-                       continue;
+                       break;
                }
 
                if (!skb_queue_empty(&queue->rx_queue))


While I no longer get the rcu_sched stall message, when I destroy the
guest the backend hits a NULL pointer dereference:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffff800000a50000
[00000000] *pgd=00000083de82a003, *pud=00000083de82b003, *pmd=00000083de82c003, *pte=00600000e1110707
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 4 PID: 34 Comm: xenwatch Not tainted 3.19.0-rc5-xen-seattle+ #13
Hardware name: AMD Seattle (RevA) Development Board (Overdrive) (DT)
task: ffff80001ea39480 ti: ffff80001ea78000 task.ti: ffff80001ea78000
PC is at exit_creds+0x18/0x70
LR is at __put_task_struct+0x3c/0xd4
pc : [<ffff8000000b2d94>] lr : [<ffff800000094990>] pstate: 80000145
sp : ffff80001ea7bc50
x29: ffff80001ea7bc50 x28: 0000000000000000 
x27: 0000000000000000 x26: 0000000000000000 
x25: 0000000000000000 x24: ffff80001eb3c840 
x23: ffff80001eb3c840 x22: 000000000006c560 
x21: ffff0000011f7000 x20: 0000000000000000 
x19: ffff80001ba06680 x18: 0000ffffd2635bd0 
x17: 0000ffff839e4074 x16: 00000000deadbeef 
x15: ffffffffffffffff x14: 0ffffffffffffffe 
x13: 0000000000000028 x12: 0000000000000010 
x11: 0000000000000030 x10: 0101010101010101 
x9 : ffff80001ea7b8e0 x8 : ffff7c01cf6e2740 
x7 : 0000000000000000 x6 : 0000000000002fc9 
x5 : 0000000000000000 x4 : 0000000000000001 
x3 : 0000000000000000 x2 : ffff80001ba06690 
x1 : 0000000000000000 x0 : 0000000000000000 

Process xenwatch (pid: 34, stack limit = 0xffff80001ea78058)
Stack: (0xffff80001ea7bc50 to 0xffff80001ea7c000)
bc40:                                     1ea7bc70 ffff8000 00094990 ffff8000
bc60: 1ba06680 ffff8000 008b45a8 ffff8000 1ea7bc90 ffff8000 000b15f0 ffff8000
bc80: 1ba06680 ffff8000 005bcab8 ffff8000 1ea7bcc0 ffff8000 00541efc ffff8000
bca0: 011ed000 ffff0000 00000000 00000000 011f7000 ffff0000 00000006 00000000
bcc0: 1ea7bd00 ffff8000 00540984 ffff8000 1ce23680 ffff8000 00000006 00000000
bce0: 00752cf0 ffff8000 00000001 00000000 00752e38 ffff8000 1ea7bd98 ffff8000
bd00: 1ea7bd40 ffff8000 00540bcc ffff8000 1ce23680 ffff8000 1cce0c00 ffff8000
bd20: 00000000 00000000 1cce0c00 ffff8000 009b0288 ffff8000 1ea7be20 ffff8000
bd40: 1ea7bd70 ffff8000 0048011c ffff8000 1ce23700 ffff8000 1cf71000 ffff8000
bd60: 009a6258 ffff8000 00a36d38 00000000 1ea7bdb0 ffff8000 00480ea4 ffff8000
bd80: 1b89d800 ffff8000 009a62b0 ffff8000 009a6258 ffff8000 00a36d38 ffff8000
bda0: 00a36e30 ffff8000 0047f7c0 ffff8000 1ea7bdc0 ffff8000 0047f82c ffff8000
bdc0: 1ea7be30 ffff8000 000b1064 ffff8000 1ea48cc0 ffff8000 009dbfe8 ffff8000
bde0: 008552d8 ffff8000 00000000 00000000 0047f778 ffff8000 00000000 00000000
be00: 1ea7be30 ffff8000 00000000 ffff8000 1ea39480 ffff8000 000c75f8 ffff8000
be20: 1ea7be20 ffff8000 1ea7be20 ffff8000 00000000 00000000 00085930 ffff8000
be40: 000b0f88 ffff8000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be60: 00000000 00000000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bea0: 1ea7bea0 ffff8000 1ea7bea0 ffff8000 00000000 ffff8000 00000000 00000000
bec0: 1ea7bec0 ffff8000 1ea7bec0 ffff8000 00000000 00000000 00000000 00000000
bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000
bfe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call trace:
[<ffff8000000b2d94>] exit_creds+0x18/0x70
[<ffff80000009498c>] __put_task_struct+0x38/0xd4
[<ffff8000000b15ec>] kthread_stop+0xc0/0x130
[<ffff800000541ef8>] xenvif_disconnect+0x58/0xd0
[<ffff800000540980>] set_backend_state+0x134/0x278
[<ffff800000540bc8>] frontend_changed+0x8c/0xec
[<ffff800000480118>] xenbus_otherend_changed+0x9c/0xa4
[<ffff800000480ea0>] frontend_changed+0xc/0x18
[<ffff80000047f828>] xenwatch_thread+0xb0/0x140
[<ffff8000000b1060>] kthread+0xd8/0xf0
Code: f9000bf3 aa0003f3 f9422401 f9422000 (b9400021) 
---[ end trace af11d521ee530da8 ]---

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

