Re: [Xen-devel] [PATCH] xen-netback: fix race between napi_complete() and interrupt handler
From: Wei Liu <wei.liu2@xxxxxxxxxx>
Date: Tue, 25 Mar 2014 14:50:21 +0000
> You forgot to target this patch at the "net" tree in the subject line.
>
> On Tue, Mar 25, 2014 at 02:08:25PM +0000, David Vrabel wrote:
>> When the NAPI budget was not all used, xenvif_poll() would call
>> napi_complete() /after/ enabling the interrupt. This resulted in a
>> race between the napi_complete() and the napi_schedule() in the
>> interrupt handler. The use of local_irq_save/restore() avoided the
>> race iff the handler was running on the same CPU, but not if it was
>> running on a different CPU.
>>
>
> OK, I understand this issue now. You mentioned it in the other email
> which made me a bit confused.
>
> Just curious, how do you trigger this? By re-binding the interrupt to
> another CPU when xenvif_poll is running? I used to run irqbalance (the
> one that works with xen virtual interrupt) but could not trigger a race.
> Probably the race window is too small to trigger?
>
>> Fix this properly by calling napi_complete() before reenabling
>> interrupts (in the xenvif_check_rx_xenvif() call).
>>
>> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
>> ---
>> drivers/net/xen-netback/interface.c | 28 ++--------------------------
>> 1 files changed, 2 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
>> index 7669d49..ee322d9 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>>          work_done = xenvif_tx_action(vif, budget);
>>
>>          if (work_done < budget) {
>> -                int more_to_do = 0;
>> -                unsigned long flags;
>> -
>> -                /* It is necessary to disable IRQ before calling
>> -                 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
>> -                 * lose event from the frontend.
>> -                 *
>> -                 * Consider:
>> -                 *   RING_HAS_UNCONSUMED_REQUESTS
>> -                 *   <frontend generates event to trigger napi_schedule>
>> -                 *   __napi_complete
>> -                 *
>> -                 * This handler is still in scheduled state so the
>> -                 * event has no effect at all. After __napi_complete
>> -                 * this handler is descheduled and cannot get
>> -                 * scheduled again. We lose event in this case and the ring
>> -                 * will be completely stalled.
>> -                 */
>> -
>> -                local_irq_save(flags);
>> -
>> -                RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
>> -                if (!more_to_do)
>> -                        __napi_complete(napi);
>> -
>> -                local_irq_restore(flags);
>> +                napi_complete(napi);
>
> You need to add a comment here saying that the interrupt is in fact
> "disabled" before this point and is "enabled" by xenvif_check_rx_xenvif().
Agreed.
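
For anyone reading this in the archive, the ordering problem can be
modelled in plain userspace C. This is only a rough sketch, not driver
code: every name in it (model_vif, model_napi_schedule() and so on) is
made up for illustration, the "interrupt handler on another CPU" is
simulated by an ordinary function call at the racy point, and the check
for pending requests when events are enabled is an assumption about
what xenvif_check_rx_xenvif() does. The only property carried over from
NAPI is that scheduling an instance that is already scheduled is a
no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_vif {
    bool napi_scheduled;   /* models the "already scheduled" NAPI state */
    bool irq_enabled;      /* models frontend notifications being armed */
    int  pending_requests; /* requests waiting in the TX ring */
};

/* Models napi_schedule(): does nothing if already scheduled. */
static bool model_napi_schedule(struct model_vif *vif)
{
    if (vif->napi_scheduled)
        return false;
    vif->napi_scheduled = true;
    return true;
}

/* Models napi_complete(): the instance leaves the scheduled state. */
static void model_napi_complete(struct model_vif *vif)
{
    assert(vif->napi_scheduled);
    vif->napi_scheduled = false;
}

/* Models the frontend posting a request; the handler runs only if armed. */
static void model_frontend_event(struct model_vif *vif)
{
    vif->pending_requests++;
    if (vif->irq_enabled)
        model_napi_schedule(vif);
}

/* Models re-arming notifications, then rescheduling if work is pending. */
static void model_enable_events_and_check(struct model_vif *vif)
{
    vif->irq_enabled = true;
    if (vif->pending_requests > 0)
        model_napi_schedule(vif);
}

/* Old order: enable events, event arrives on another CPU, then complete. */
static bool old_order_stalls(void)
{
    struct model_vif vif = { .napi_scheduled = true };

    model_enable_events_and_check(&vif); /* ring looks empty */
    model_frontend_event(&vif);          /* schedule is a no-op here */
    model_napi_complete(&vif);           /* scheduled state is dropped */
    return vif.pending_requests > 0 && !vif.napi_scheduled;
}

/* New order: complete first, then enable events. */
static bool new_order_stalls(bool event_before_enable)
{
    struct model_vif vif = { .napi_scheduled = true };

    model_napi_complete(&vif);
    if (event_before_enable)
        model_frontend_event(&vif);      /* not armed yet, only queued */
    model_enable_events_and_check(&vif); /* picks up anything queued */
    if (!event_before_enable)
        model_frontend_event(&vif);      /* armed, schedule succeeds */
    return vif.pending_requests > 0 && !vif.napi_scheduled;
}
```

In this model old_order_stalls() ends with a request pending and
nothing scheduled to process it, which is the stalled ring described in
the commit message, while new_order_stalls() leaves the instance
scheduled whichever side of the enable the event lands on.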
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel