[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] rcu_sched self-detect stall when disable vif device
On 28/01/15 17:27, Julien Grall wrote: > On 28/01/15 17:06, David Vrabel wrote: >> On 28/01/15 16:45, Julien Grall wrote: >>> On 27/01/15 16:53, Wei Liu wrote: >>>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote: >>>>> On 27/01/15 16:45, Wei Liu wrote: >>>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote: >>>>>>> Hi, >>>>>>> >>>>>>> While I'm working on support for 64K page in netfront, I got >>>>>>> an rcu_sced self-detect message. It happens when netback is >>>>>>> disabling the vif device due to an error. >>>>>>> >>>>>>> I'm using Linux 3.19-rc5 on seattle (ARM64). Any idea why >>>>>>> the processor is stucked in xenvif_rx_queue_purge? >>>>>>> >>>>>> >>>>>> When you try to release a SKB, core network driver need to enter some >>>>>> RCU cirital region to clean up. dst_release for one, calls call_rcu. >>>>> >>>>> But this message shouldn't happen in normal condition or because of >>>>> netfront. Right? >>>>> >>>> >>>> Never saw report like this before, even in the case that netfront is >>>> buggy. >>> >>> This is only happening when preemption is not enabled (i.e >>> CONFIG_PREEMPT_NONE in the config file) in the backend kernel. >>> >>> When the vif is disabled, the loop in xenvif_kthread_guest_rx turned >>> into an infinite loop. In my case, the code executed looks like: >>> >>> >>> 1. for (;;) { >>> 2. xenvif_wait_for_rx_work(queue); >>> 3. >>> 4. if (kthread_should_stop()) >>> 5. break; >>> 6. >>> 7. if (unlikely(vif->disabled && queue->id == 0) { >>> 8. xenvif_carrier_off(vif); >>> 9. xenvif_rx_queue_purge(queue); >>> 10. continue; >>> 11. } >>> 12. } >>> >>> The wait on line 2 will return directly because the vif is disabled >>> (see xenvif_have_rx_work) >>> >>> We are on queue 0, so the condition on line 7 is true. Therefore we will >>> loop on line 10. And so on... >>> >>> On platform where preemption is not enabled, this thread will never >>> yield/give the hand to another thread (unless the domain is destroyed). >> >> I'm not sure why we have a continue in the vif->disabled case and not >> just a break. Can you try that? > > So I applied this small patches: > > diff --git a/drivers/net/xen-netback/netback.c > b/drivers/net/xen-netback/netback.c > index 908e65e..9448c6c 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -2110,7 +2110,7 @@ int xenvif_kthread_guest_rx(void *data) > if (unlikely(vif->disabled && queue->id == 0)) { > xenvif_carrier_off(vif); > xenvif_rx_queue_purge(queue); > - continue; > + break; > } > > if (!skb_queue_empty(&queue->rx_queue)) How about this? 8<------------------------------------------ xen-netback: stop the guest rx thread after a fatal error After commit e9d8b2c2968499c1f96563e6522c56958d5a1d0d (xen-netback: disable rogue vif in kthread context), a fatal (protocol) error would leave the guest Rx thread spinning, wasting CPU time. Commit ecf08d2dbb96d5a4b4bcc53a39e8d29cc8fef02e (xen-netback: reintroduce guest Rx stall detection) made this even worse by removing a cond_resched() from this path. A fatal error is non-recoverable so just allow the guest Rx thread to exit. This requires taking additional refs to the task so the thread exiting early is handled safely. Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 9259a73..037f74f 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -578,6 +578,7 @@ int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref, goto err_rx_unbind; } queue->task = task; + get_task_struct(task); task = kthread_create(xenvif_dealloc_kthread, (void *)queue, "%s-dealloc", queue->name); @@ -634,6 +635,7 @@ void xenvif_disconnect(struct xenvif *vif) if (queue->task) { kthread_stop(queue->task); + put_task_struct(queue->task); queue->task = NULL; } diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 908e65e..c8ce701 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -2109,8 +2109,7 @@ int xenvif_kthread_guest_rx(void *data) */ if (unlikely(vif->disabled && queue->id == 0)) { xenvif_carrier_off(vif); - xenvif_rx_queue_purge(queue); - continue; + break; } if (!skb_queue_empty(&queue->rx_queue)) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |