
Re: [Xen-devel] netback BUG_ON when using copy_skb=1

>>> On 17.10.13 at 12:26, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
> Hi Jan,

please don't top post.

> In my test, the grant table copy error may cause the VM to crash.
> The stack is as follows:
> kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
> ...
> The BUG() code in xennet_tx_buf_gc() in xen-netfront.c is:
>                       if (unlikely(gnttab_query_foreign_access(
>                               np->grant_tx_ref[id]) != 0)) {
>                               printk(KERN_ALERT "xennet_tx_buf_gc: warning "
>                                      "-- grant still in use by backend "
>                                      "domain.\n");
>                               BUG();
> My guess is that the reason is as follows:
> 1) Xen: _set_status(), called during the __gnttab_copy() hypercall from
> __acquire_grant_for_copy(), fails and the grant reference is never
> released, so the GTF_reading bit is not cleared.
> 2) Netfront: the driver calls BUG() when it finds the GTF_reading bit
> still set.
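As a rough illustration of the check in point 2, here is a user-space model of the netfront side. It assumes the v1 grant-entry layout and flag bits from Xen's public grant_table.h (GTF_reading is bit 3, GTF_writing is bit 4). The struct and the model_* functions are hypothetical stand-ins named after, not copied from, gnttab_query_foreign_access() and xennet_tx_buf_gc(), and assert() plays the role of BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match Xen's public grant_table.h (v1 grant entries). */
#define GTF_permit_access (1U)
#define GTF_reading       (1U << 3)
#define GTF_writing       (1U << 4)

/* Hypothetical stand-in for a v1 shared grant entry. */
struct model_grant_entry {
    uint16_t flags;
    uint16_t domid;
    uint32_t frame;
};

/* Non-zero while the granted page is still being read or written on the
 * backend's behalf (the idea behind gnttab_query_foreign_access()). */
static unsigned model_query_foreign_access(const struct model_grant_entry *e)
{
    return e->flags & (GTF_reading | GTF_writing);
}

/* The xennet_tx_buf_gc() decision: true where netfront would hit BUG(). */
static bool model_tx_gc_would_bug(const struct model_grant_entry *e)
{
    return model_query_foreign_access(e) != 0;
}

/* Ending access is only safe once the grant is idle; assert() models BUG(). */
static void model_end_foreign_access(struct model_grant_entry *e)
{
    assert(!model_tx_gc_would_bug(e));
    e->flags = 0;
}
```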

If that was the case, this would be a hypervisor bug: a grant copy
operation is supposed to hold the grant active only for as long as
the copy operation takes. You'll in particular notice that
__acquire_grant_for_copy() in its error path clears GTF_reading
(and GTF_writing, as appropriate) again. You'd likely need to
instrument the code to demonstrate (via a couple of extra log
messages) what you think is not working properly here.
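The lifecycle described above can be sketched as a small user-space model: validate the entry and mark it as being read (as _set_status() does), undo the mark if a later step of the acquire fails, and clear it again once the copy is done, plus the kind of extra log message suggested for catching a leaked GTF_reading bit. Everything here is an assumption for illustration: the model_* names are hypothetical, later_step_fails merely injects an error, the first log line only imitates the hypervisor message quoted in this thread, and the hypervisor has to update the shared flags atomically (the guest can modify the same entry concurrently) rather than with the plain stores used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match Xen's public grant_table.h (v1 grant entries). */
#define GTF_permit_access (1U)
#define GTF_type_mask     (3U)
#define GTF_reading       (1U << 3)

struct model_grant_entry {
    uint16_t flags;
    uint16_t domid;
    uint32_t frame;
};

/* Like _set_status(): check the entry first, and only then mark it. */
static int model_set_status(struct model_grant_entry *e, uint16_t expected_dom)
{
    if ((e->flags & GTF_type_mask) != GTF_permit_access ||
        e->domid != expected_dom) {
        fprintf(stderr, "model: Bad flags (%x) or dom (%u). (expected dom %u)\n",
                (unsigned)e->flags, (unsigned)e->domid, (unsigned)expected_dom);
        return -1;                        /* nothing set, nothing to undo */
    }
    e->flags |= GTF_reading;
    return 0;
}

/* Like __acquire_grant_for_copy(): any failure after the status was set
 * must clear GTF_reading again before returning. */
static int model_acquire_for_copy(struct model_grant_entry *e,
                                  uint16_t expected_dom, bool later_step_fails)
{
    if (model_set_status(e, expected_dom) != 0)
        return -1;
    if (later_step_fails) {               /* injected failure after marking */
        e->flags &= (uint16_t)~GTF_reading;
        return -1;
    }
    return 0;
}

static void model_release_for_copy(struct model_grant_entry *e)
{
    assert(e->flags & GTF_reading);       /* only release what was acquired */
    e->flags &= (uint16_t)~GTF_reading;
}

/* One grant copy: the grant is held only for the duration of the copy. */
static int model_grant_copy(struct model_grant_entry *e, uint16_t expected_dom,
                            bool later_step_fails)
{
    int rc = model_acquire_for_copy(e, expected_dom, later_step_fails);
    if (rc == 0) {
        /* ... the data copy itself would happen here ... */
        model_release_for_copy(e);
    }
    /* Extra instrumentation: report any path that leaks GTF_reading. */
    if (e->flags & GTF_reading)
        fprintf(stderr, "model: GTF_reading still set after copy, rc=%d\n", rc);
    return rc;
}
```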


> On 2013/10/17 16:00, Jan Beulich wrote:
>>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>>> But there may still be concurrency problems in my test.
>>> If the page replacement in copy_pending_req() happens after
>>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly
>>> marked with GNTCOPY_source_gref.
>>> At that point the page in the skb has already been replaced with
>>> Dom0-local memory, so the later HYPERVISOR_multicall() with
>>> GNTTABOP_copy in netbk_rx_actions() fails.
>>> The message is:
>>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>> What is your opinion?
>> At first glance that seems possible, but the question is: does it
>> cause any problems other than the quoted message to be issued
>> (and the problematic packet getting re-transmitted)? I'm asking
>> mainly because fixing this would appear to imply adding locking to
>> these paths - with the risk of adversely affecting performance.
>> Jan
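The interleaving described in this exchange is a check-then-act race: the decision to set GNTCOPY_source_gref is taken from a snapshot of the page, and the page can be replaced before the batched copy is issued. The following is a user-space sketch under stated assumptions: the struct and model_* functions are hypothetical stand-ins named after, not taken from, netback; GNTCOPY_source_gref is assumed to be bit 0 as in Xen's public grant_table.h; and a pthread mutex stands in for whatever kernel lock a real fix would use. It shows the stale decision deterministically, then a locked variant in which building and issuing the op happen in one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match Xen's public grant_table.h. */
#define GNTCOPY_source_gref (1U << 0)

/* Hypothetical frag page: either still backed by a guest grant (foreign)
 * or already replaced with a Dom0-local copy. */
struct model_page {
    pthread_mutex_t lock;   /* hypothetical; only the locked path uses it */
    bool foreign;
    uint32_t gref;          /* meaningful only while foreign */
};

struct model_copy_op {
    uint16_t flags;
    uint32_t source_gref;
};

/* Like netbk_gop_frag(): describe the copy source from a snapshot. */
static void model_build_copy_op(const struct model_page *pg,
                                struct model_copy_op *op)
{
    op->flags = pg->foreign ? GNTCOPY_source_gref : 0;
    op->source_gref = pg->foreign ? pg->gref : 0;
}

/* Like copy_pending_req(): swap in local memory; the old gref is stale. */
static void model_replace_with_local(struct model_page *pg)
{
    pg->foreign = false;
    pg->gref = 0;
}

/* Like the later GNTTABOP_copy: valid only if the op still matches the page. */
static bool model_copy_op_valid(const struct model_page *pg,
                                const struct model_copy_op *op)
{
    return ((op->flags & GNTCOPY_source_gref) != 0) == pg->foreign;
}

/* Locked variant: build and issue under one lock, so no replacement can
 * land in between. Returns false if the op no longer matched the page. */
static bool model_rx_copy_locked(struct model_page *pg)
{
    struct model_copy_op op;
    pthread_mutex_lock(&pg->lock);
    model_build_copy_op(pg, &op);
    bool ok = model_copy_op_valid(pg, &op);  /* copy would be issued here */
    pthread_mutex_unlock(&pg->lock);
    return ok;
}

struct model_worker { struct model_page *pg; int iters; int mismatches; };

/* Flips the page between foreign and local, always under the lock. */
static void *model_toggler(void *arg)
{
    struct model_worker *w = arg;
    for (int i = 0; i < w->iters; i++) {
        pthread_mutex_lock(&w->pg->lock);
        w->pg->foreign = !w->pg->foreign;
        w->pg->gref = w->pg->foreign ? (uint32_t)i : 0;
        pthread_mutex_unlock(&w->pg->lock);
    }
    return NULL;
}

/* Repeatedly issues locked copies and counts inconsistent ops. */
static void *model_receiver(void *arg)
{
    struct model_worker *w = arg;
    for (int i = 0; i < w->iters; i++)
        if (!model_rx_copy_locked(w->pg))
            w->mismatches++;
    return NULL;
}

/* Runs toggler and receiver concurrently; returns mismatched copy ops. */
static int model_run_locked(int iters)
{
    struct model_page pg = { .foreign = true, .gref = 1 };
    struct model_worker tog = { &pg, iters, 0 }, rx = { &pg, iters, 0 };
    pthread_t t1, t2;
    int rc = pthread_mutex_init(&pg.lock, NULL);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, model_toggler, &tog);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, model_receiver, &rx);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&pg.lock);
    return rx.mismatches;
}
```

In real netback the copy ops are batched and only issued later by the multicall in netbk_rx_actions(), so a lock of this kind would presumably have to cover the span from netbk_gop_frag() until that multicall returns, with copy_pending_req() contending on it. That longer critical section is the kind of performance cost raised above.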

Xen-devel mailing list
