Re: [Xen-devel] netback BUG_ON when using copy_skb=1

Hi Jan,

Thanks for your reply.
Yes, as you assumed, I am using the SLE11 kernel 3.0.58, which is not up to date.
I found a related patch named xen-netback-generalize, which was committed on 
Aug 7 and has been applied to the SLE11 kernel 3.0.98.
The BUG_ON(netbk->mmap_pages[idx] != page) has been removed by this patch.

But there may still be concurrency problems in my test.
If the page replacement in copy_pending_req() happens after netif_get_page_ext() 
in netbk_gop_frag(), copy_gop->flags is wrongly marked with GNTCOPY_source_gref.
By then the page in the skb has been replaced with Dom0 local memory, 
so the later HYPERVISOR_multicall() with GNTTABOP_copy in netbk_rx_actions() 
will fail.
The message is shown as:

(XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)

Do you have any thoughts on this?

On 2013/10/16 19:10, Jan Beulich wrote:
>>>> On 16.10.13 at 06:13, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> Hi Wei Liu,
>> I am doing some network performance testing on Xen 4.1.2 and kernel 3.0, and 
>> accidentally got a crash with BUG_ON(netbk->mmap_pages[idx] != page) in 
>> netbk_gop_frag().
>> By analyzing the module drivers/xen/netback,
> You aren't looking at the upstream driver, are you? If so, Wei is
> very likely the wrong addressee.
> Assuming that you instead talk of the SLE11 kernel, I can only
> point out that a problem in that code was found and fixed a
> couple of months ago (resulting in the BUG_ON() you quoted not
> being there anymore), so you're simply not looking at up-to-date
> code.
> Jan
>> I think the reason is as 
>> follows when sending packets from VM1 to VM2:
>> 1) The two netback threads (the first for VM1 sending, the second for VM2 
>> receiving) run concurrently.
>> 2) The first netback thread does a delayed copy from a foreign granted 
>> page to local memory when some outstanding packets have been pending too 
>> long (more than half of one HZ).
>>    Then netbk->mmap_pages[idx] is replaced with a newly allocated page.
>> 3) If the packets are forwarded to VM2 by the virtual switch, 
>> netbk_gop_frag() is called in the second netback thread.
>>    That function checks whether the pages in the skb frags[] are foreign 
>> in order to decide how to do the grant copy.
>> 4) If the page replacement happens after the foreign-page check in 
>> netbk_gop_frag(), the BUG is triggered because the page from the skb 
>> frags[] differs from mmap_pages[idx].
>> I tried using a spin_lock to protect the page access, but found no 
>> appropriate solution.
>> How can this problem be fixed?  Would you like to share some opinions?
>> In addition, I have tried turning off copy_skb. Then the vif netdevice may 
>> not be released after shutting down the VM,
>> because outstanding packets hold the reference count of the device 
>> too long for some unknown reason.
>> The reason may be that the NIC does not release packets after DMA.
>> Has anyone met such problems? Thanks.
>> Best regards,
>> Jerry
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx 
>> http://lists.xen.org/xen-devel 
