
Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"



On Wed, Mar 19, 2014 at 12:11:04AM +0100, Sander Eikelenboom wrote:
[...]
> 
> >>> Before Paul's change, we always reserve very large room for an incoming
> >>> SKB. After Paul's change, we only reserve just enough room. Probably
> >>> some extra room prevents this bug being triggered.
> 
> >> [  599.970745] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req 
> >> npo->meta_prod:37 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506387 
> >> vif->rx.sring->req_event:504174
> >> [  599.972487] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> >> min_slots_needed:1 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506387 
> >> vif->rx.sring->req_event:506388
> >> [  600.044322] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req 
> >> npo->meta_prod:37 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 
> >> req->gref:165543936 req->id:19 vif->rx.sring->req_event:506388
> >> [  600.081167] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 2  
> >> npo->meta_prod:38 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 
> >> npo->copy_gref:165543936  npo->copy_off:0  MAX_BUFFER_OFFSET:4096 
> >> bytes:1168 size:1168 i:6 vif->rx.sring->req_event:506388 
> >> estimated_slots_needed:8
> >> [  600.118268] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here end 
> >> npo->meta_prod:38 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 
> >> npo->copy_gref:165543936 npo->copy_off:1168  MAX_BUFFER_OFFSET:4096 
> >> bytes:1168 size:0 i:7 vif->rx.sring->req_event:506388 
> >> estimated_slots_needed:8
> >> [  600.155378] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 4 
> >> npo->meta_prod:38 old_meta_prod:30 vif->rx.sring->req_prod:506387 
> >> vif->rx.req_cons:506388 gso_type:1 gso_size:1448 nr_frags:1 req->gref:570 
> >> req->id:11 estimated_slots_needed:8 i(frag): 0
> >> [  600.192438] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 
> >> npo->meta_prod:38 old_meta_prod:30 vif->rx.sring->req_prod:506387 
> >> vif->rx.req_cons:506388 gso_type:1 gso_size:1448 nr_frags:1 req->gref:570 
> >> req->id:11 estimated_slots_needed:8
> >> [  600.229395] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 ..  
> >> vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 
> >> sco->meta_slots_used:8 max_upped_gso:1 skb_is_gso(skb):1 
> >> max_slots_needed:8 j:3 is_gso:1 nr_frags:1 firstpart:1 secondpart:6 
> >> min_slots_needed:3
> >> [  600.266518] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 1 ..  
> >> vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388                     
> >>     max_upped_gso:1 skb_is_gso(skb):0 max_slots_needed:1 j:4 is_gso:0 
> >> nr_frags:0 firstpart:1 secondpart:0 min_slots_needed:1
> 
> >> It seems to estimate 8 slots and need 8 slots ... however .. shouldn't the 
> >> queue have been stopped before getting here ..
> 
> > *hrmm please just ignore* .. got to get some sleep i guess ..
> 
> Just went the empirical way .. and unconditionally upped the calculated 
> "max_slots_needed" by one in "xenvif_rx_action"  just before checking the 
> "xenvif_rx_ring_slots_available",
> this has prevented all non-fatal and fatal occurrences of "cons > prod" so 
> far. I will leave my tests running overnight, see if it survives the pounding.
> 
> From other net drivers I see different calculations .. it seems kind of 
> voodoo to determine the value .. one of which 
> (drivers/net/ethernet/marvell/sky2.c)
> seems to unconditionally reserve a slot for both GSO and CKSUM. Don't know if 
> that makes any sense though:
> 
> /* This is the worst case number of transmit list elements for a single skb:
>    VLAN:GSO + CKSUM + Data + skb_frags * DMA */
> #define MAX_SKB_TX_LE   (2 + 
> (sizeof(dma_addr_t)/sizeof(u32))*(MAX_SKB_FRAGS+1))
> 

The Xen network "wire" format doesn't have a CKSUM metadata type, though.

> 
> 
> >>> Wei.
> 
> >>>> 
> >>>> >> The second time it does get to the code after the RING_GET_REQUEST in 
> >>>> >> 'get_next_rx_buffer' and that leads to mayhem ...
> >>>> >> 
> >>>> >> So added a netdev_warn to xenvif_start_xmit for the "stop queue" case 
> >>>> >> .. unfortunately it now triggers net_ratelimit at the end:
> >>>> >> 
> >>>> >> [  402.909693] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping 
> >>>> >> queue !  min_slots_needed:7 vif->rx.sring->req_prod:189228 
> >>>> >> vif->rx.req_cons:189222
> >>>> 
> >>>> > I think xenvif_rx_ring_slots_available is doing its job. If req_prod -
> >>>> > req_cons < needed, the queue is stopped.
> 
> >> So it seems .. most of the time .. but if I look at the calculation of 
> >> "min_slots_needed" in this function it seems completely different from the 
> >> one in
> >> xenvif_rx_action for max_slots_needed .. though both seem to be used for 
> >> the same thing .. to calculate how many slots the broken-down SKB would 
> >> need to fit in ..
> >> So I added the calculation method from xenvif_start_xmit to 
> >> xenvif_rx_action .. in the error case you see min_slots_needed was 3 .. 
> >> but max_slots_needed and max_slots_used both were 8.
> 

Those are different estimations. max_slots_needed estimates the worst
case while min_slots_needed estimates the best case. When
min_slots_needed is met, netback puts the SKB into its internal queue.
xenvif_rx_action then dequeues that packet, checks max_slots_needed
and, if it is met, breaks the SKB down.

What I would expect is that, even if they yield different values,
checking ring availability before breaking the SKB down is sufficient.

In your case above, max_slots_needed was met and the SKB was broken
down. Since, as you said, bumping max_slots_needed by 1 in your
empirical test prevented the issues, we might have a problem
calculating max_slots_needed. If you can work out what went wrong,
that would be very useful.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel



