
Re: [Xen-devel] netback BUG_ON when using copy_skb=1



Hi Jan,

In my test, a grant table copy error can cause the VM to crash.
The stack trace is as follows:
kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
Pid: 2658, comm: iperf Not tainted 2.6.32-220.el6.x86_64 #1 Xen HVM domU
RIP: 0010:[<ffffffffa01166ca>]  [<ffffffffa01166ca>] 
xennet_tx_buf_gc+0x18a/0x1f0 [xen_netfront]
RSP: 0018:ffff880004403df8  EFLAGS: 00010096
RAX: 0000000000000049 RBX: ffff8800821986e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff880004403e48 R08: ffffffff81c00690 R09: 0000000000000080
R10: 0000000000013030 R11: 0000000000000000 R12: 000000000000003b
R13: 000000000000023d R14: 0000000000000011 R15: 0000000000000011
FS:  00007fd8fd97e700(0000) GS:ffff880004400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000030270aab70 CR3: 0000000080cf4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iperf (pid: 2658, threadinfo ffff8800813ba000, task ffff880080d0eb00)
Stack:
 ffff880082198020 ffff880082198f90 ffff88007f8d00c0 0000003f04415fc0
<0> ffff880004403e28 ffff880082198768 ffff880082198020 ffff8800821986e0
<0> 0000000000000282 0000000000000100 ffff880004403e78 ffffffffa0117d4c
Call Trace:
 <IRQ>
 [<ffffffffa0117d4c>] xennet_interrupt+0x4c/0xb0 [xen_netfront]
 [<ffffffff810d94f0>] handle_IRQ_event+0x60/0x170
 [<ffffffff8109b8a3>] ? ktime_get+0x63/0xe0
 [<ffffffff810dbc2e>] handle_edge_irq+0xde/0x180
 [<ffffffff812fe809>] __xen_evtchn_do_upcall+0x1b9/0x1f0
 [<ffffffff812fedbf>] xen_evtchn_do_upcall+0x2f/0x50
 [<ffffffff8100c373>] xen_hvm_callback_vector+0x13/0x20

The BUG() in xennet_tx_buf_gc() (xen-netfront.c) is:
                        if (unlikely(gnttab_query_foreign_access(
                                np->grant_tx_ref[id]) != 0)) {
                                printk(KERN_ALERT "xennet_tx_buf_gc: warning "
                                       "-- grant still in use by backend "
                                       "domain.\n");
                                BUG();
                        }
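To make the check concrete, here is a simplified userspace model of the condition that triggers the BUG(). It assumes the v1 grant entry layout, with GTF_reading and GTF_writing as bits 3 and 4 as in Xen's public grant_table.h. The names struct grant_entry_model, query_foreign_access() and tx_buf_gc_check() are hypothetical, and the model returns -1 where netfront calls BUG().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Flag bits of a v1 grant entry, as in Xen's public grant_table.h. */
#define GTF_permit_access (1U << 0)
#define GTF_reading       (1U << 3)
#define GTF_writing       (1U << 4)

/* Hypothetical, simplified stand-in for a shared grant entry. */
struct grant_entry_model {
    uint16_t flags;   /* written by both the guest and the hypervisor */
    uint16_t domid;   /* domain the grant was issued to */
};

/* Models gnttab_query_foreign_access(): non-zero while the backend
 * still has the page mapped or a copy from/to it is in progress. */
unsigned query_foreign_access(const struct grant_entry_model *e)
{
    return e->flags & (GTF_reading | GTF_writing);
}

/* Models the check in xennet_tx_buf_gc(), but reports -1 instead of
 * calling BUG() so the condition can be observed. */
int tx_buf_gc_check(const struct grant_entry_model *e)
{
    assert(e != NULL);
    if (query_foreign_access(e) != 0) {
        fprintf(stderr, "xennet_tx_buf_gc: warning -- grant still in use "
                        "by backend domain.\n");
        return -1; /* the real netfront calls BUG() here */
    }
    return 0;
}
```

A grant whose GTF_reading bit was never cleared is reported as still in use, which is exactly the state that leads to the crash above.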

My guess is that the cause is as follows:
1) Xen: _set_status(), called from the __gnttab_copy() hypercall and from
__acquire_grant_for_copy(), fails, and the grant reference is never ended,
so the GTF_reading bit is never cleared.
2) Netfront: xennet_tx_buf_gc() calls BUG() when it finds the GTF_reading bit
still set.

Regards,
Jerry

On 2013/10/17 16:00, Jan Beulich wrote:
>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> But there may still be concurrency problems in my test.
>> If the page replacement in copy_pending_req() happens after
>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly marked
>> with GNTCOPY_source_gref.
>> By then the page in the skb has already been replaced with Dom0-local
>> memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in
>> netbk_rx_actions() fails.
>> The message is:
>>
>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>
>> What do you think?
> 
> At first glance that seems possible, but the question is: does it
> cause any problems other than the quoted message to be issued
> (and the problematic packet getting re-transmitted)? I'm asking
> mainly because fixing this would appear to imply adding locking to
> these paths - with the risk of adversely affecting performance.
> 
> Jan
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
