
[Xen-devel] RE: Crash in netbk_gop_frag_copy?



Ian Campbell wrote:
> Hi Dongxiao,
> 
> We're seeing a crash in netbk_gop_frag_copy and I was wondering if it
> might be related to the multithreaded/tasklet netback patches. This is
> in a 2.6.27 traditional Xen kernel but the most recent netback change
> made in that tree was the backport of the multithread patches.
> 
> The crash appears to correspond to the "copy_gop->source.domid =
> src_pend->netif->domid" line of netbk_gop_frag_copy, specifically
> src_pend->netif seems to be NULL.
> 
> Have you seen anything like this?

No, I haven't seen this kind of error before.
Did you hit this issue while doing inter-domain communication?
I am a bit suspicious about how the group number is obtained in
netbk_gop_frag_copy(); could you paste your rebased patch?

> 
> One thing I did notice is that 020ba906 "xen/netback: Multiple
> tasklets support." includes this hunk:
> @@ -321,7 +331,8 @@ static u16 netbk_gop_frag(struct xen_netif *netif, struct netbk_rx_meta *meta,
>
>         copy_gop = npo->copy + npo->copy_prod++;
>         copy_gop->flags = GNTCOPY_dest_gref;
> -       if (idx > -1) {
> +       if (PageForeign(page)) {
> +               struct xen_netbk *netbk = &xen_netbk[group];
>                 struct pending_tx_info *src_pend = &netbk->pending_tx_info[idx];
>                 copy_gop->source.domid = src_pend->netif->domid;
>                 copy_gop->source.u.ref = src_pend->req.gref;
> 
> I'm not sure it is guaranteed that all foreign pages which reach this
> point are netback pages, is it? gnttab_copy_grant_page also sets
> PageForeign for example.

Here the page is guaranteed to be a netback page.
See the callers of netbk_gop_frag_copy().

> 
> Does this change relate to the removal of the
>        if ((idx >= MAX_PENDING_REQS) || (netbk->mmap_pages[idx] != pg))
>                return -1;
> check from netif_page_index in a3031942 "xen/netback: Introduce a new
> struct type page_ext."?

Actually this logic still exists in the code, see lines:

+       BUG_ON(idx < 0 || idx >= MAX_PENDING_REQS);
+       BUG_ON(netbk->mmap_pages[idx] != page);

Thanks,
Dongxiao

> 
> Do you think we perhaps need to reinstate a similar check to this as
> well as first checking that group is a sensible offset into the
> xen_netbk array?
> 
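
To make the kind of check proposed above concrete, here is a minimal
user-space sketch. It is not the netback code: struct page_model,
struct netbk_model, netbk_groups, NR_GROUPS_MODEL, netbk_model_register()
and netif_page_lookup() are hypothetical stand-ins, and the value given to
MAX_PENDING_REQS is an arbitrary assumption. Only the names
MAX_PENDING_REQS and mmap_pages come from this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING_REQS 256  /* assumption: illustrative value only */
#define NR_GROUPS_MODEL  4    /* hypothetical number of netback groups */

/* Hypothetical stand-in for struct page plus its group/index tag. */
struct page_model {
    int foreign;          /* models PageForeign(page) */
    unsigned int group;   /* claimed netback group */
    unsigned int idx;     /* claimed pending-request index */
};

/* Hypothetical stand-in for the per-group netback state. */
struct netbk_model {
    struct page_model *mmap_pages[MAX_PENDING_REQS];
};

static struct netbk_model netbk_groups[NR_GROUPS_MODEL];

/* Record a page as owned by netback; callers must pass in-range values. */
static void netbk_model_register(struct page_model *pg,
                                 unsigned int group, unsigned int idx)
{
    assert(group < NR_GROUPS_MODEL && idx < MAX_PENDING_REQS);
    pg->foreign = 1;
    pg->group = group;
    pg->idx = idx;
    netbk_groups[group].mmap_pages[idx] = pg;
}

/*
 * Return 0 and fill in group/idx only if the page is foreign, both values
 * are in range, and the slot really points back at this page. Return -1
 * otherwise, so the caller can treat the page as not owned by netback.
 */
static int netif_page_lookup(const struct page_model *pg,
                             unsigned int *group, unsigned int *idx)
{
    if (pg == NULL || !pg->foreign)
        return -1;
    if (pg->group >= NR_GROUPS_MODEL || pg->idx >= MAX_PENDING_REQS)
        return -1;
    if (netbk_groups[pg->group].mmap_pages[pg->idx] != pg)
        return -1;
    *group = pg->group;
    *idx = pg->idx;
    return 0;
}
```

In this sketch a foreign page that was never registered fails the
mmap_pages comparison and is rejected with -1 instead of being used as an
index into per-group state. Whether returning an error or hitting BUG_ON()
is the better policy for netback is a separate question.
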
> Thanks,
> Ian.
> 
> [2010-07-11 03:50:28 UTC] BUG: unable to handle kernel paging request at f0052dac
> [2010-07-11 03:50:28 UTC] IP: [<c0284ed8>] netbk_gop_frag_copy+0x78/0x200
> [2010-07-11 03:50:28 UTC] Oops: 0000 [#1] SMP
> [2010-07-11 03:50:28 UTC] last sysfs file: /sys/devices/xen-backend/vbd-1669-51712/statistics/rd_usecs
> [2010-07-11 03:50:28 UTC] Modules linked in: tun nfs nfs_acl dm_round_robin scsi_dh_emc dm_multipath scsi_dh bonding hfsplus lockd sunrpc bridge stp llc(N) ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables binfmt_misc nls_utf8 isofs(N) sbs sbshc fan battery ac parport_pc lp parport nvram sg evdev(N) usb_storage libusual(N) container bnx2 zlib_inflate(N) usbhid thermal ff_memless qla2xxx processor button thermal_sys hpilo scsi_transport_fc e1000e piix serio_raw 8250_pnp ide_cd_mod 8250 cdrom serial_core ata_piix libata dock rtc_cmos rtc_core rtc_lib pcspkr ide_generic dm_snapshot dm_zero dm_mirror dm_log dm_mod ide_disk cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore fbcon(N) font(N) tileblit(N) bitblit(N) softcursor(N)
> [2010-07-11 03:50:28 UTC] Supported: No, Unsupported modules are loaded
> [2010-07-11 03:50:28 UTC]
> [2010-07-11 03:50:28 UTC] Pid: 1173, comm: netback/2 Tainted: G          (2.6.27.45-0.1.1.xs5.6.900.128.111247xen #1)
> [2010-07-11 03:50:28 UTC] EIP: 0061:[<c0284ed8>] EFLAGS: 00010246 CPU: 0
> [2010-07-11 03:50:28 UTC] EIP is at netbk_gop_frag_copy+0x78/0x200
> [2010-07-11 03:50:28 UTC] EAX: f0052d9c EBX: f0087384 ECX: 00000000 EDX: f0052cfc
> [2010-07-11 03:50:28 UTC] ESI: 00000006 EDI: eccfbf44 EBP: eccfbee0 ESP: eccfbec0
> [2010-07-11 03:50:28 UTC]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> [2010-07-11 03:50:28 UTC] Process netback/2 (pid: 1173, ti=eccfa000 task=ed81d030 task.ti=eccfa000)
> [2010-07-11 03:50:28 UTC] Stack: c15a8ba0 cdfb2480 ffff1cfc f008ad3c 00000000 00000006 0000000c 00000001
> [2010-07-11 03:50:28 UTC]        eccfbfa4 c02852b4 00000006 000000d2 00000000 c16bdb34 eccfbf88 eccfbf20
> [2010-07-11 03:50:28 UTC]        f008730c f008630c eca73bdc f008530c 00000001 f007d630 eea80200 ed81d030
> [2010-07-11 03:50:28 UTC] Call Trace:
> [2010-07-11 03:50:28 UTC]  [<c02852b4>] ? net_rx_action+0x254/0x920
> [2010-07-11 03:50:28 UTC]  [<c0287407>] ? netbk_action_thread+0x97/0x170
> [2010-07-11 03:50:28 UTC]  [<c013de00>] ? autoremove_wake_function+0x0/0x50
> [2010-07-11 03:50:28 UTC]  [<c0287370>] ? netbk_action_thread+0x0/0x170
> [2010-07-11 03:50:28 UTC]  [<c013daa2>] ? kthread+0x42/0x70
> [2010-07-11 03:50:28 UTC]  [<c013da60>] ? kthread+0x0/0x70
> [2010-07-11 03:50:28 UTC]  [<c010569b>] ? kernel_thread_helper+0x7/0x10
> [2010-07-11 03:50:28 UTC]  =======================
> [2010-07-11 03:50:28 UTC] Code: 69 db 04 e3 00 00 8d 04 40 8d 54 82 f4 89 55 ec 89 5d e8 eb 65 8b 45 f0 8b 55 e8 03 15 a8 94 53 c0 c1 e0 04 8d 84 10 a0 00 00 00 <8b> 50 10 0f b7 12 66 89 53 04 8b 40 04 66 c7 43 12 03 00 89 03
> [2010-07-11 03:50:28 UTC] EIP: [<c0284ed8>] netbk_gop_frag_copy+0x78/0x200 SS:ESP 0069:eccfbec0
> [2010-07-11 03:50:28 UTC] ---[ end trace cf7f02bf1fe43242 ]---
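
As a cross-check on the trace above, the bytes marked in the Code: line
(<8b> 50 10) can be read as a 32-bit load from EAX + 0x10 into EDX. The
small sketch below recomputes that effective address from the register
dump. The helper name modrm_disp8_address() and the register enum are
hypothetical, and the sketch only handles the one ModRM form needed here
(mod == 01, no SIB byte); it says nothing about which structure field
lives at that offset.

```c
#include <assert.h>
#include <stdint.h>

/* x86 register numbers as used in the ModRM r/m field. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3,
       REG_ESP = 4, REG_EBP = 5, REG_ESI = 6, REG_EDI = 7 };

/*
 * Effective address for a ModRM byte with mod == 01 ([reg + disp8]) and
 * r/m != 4 (no SIB byte). Other encodings are out of scope for this sketch.
 */
static uint32_t modrm_disp8_address(uint8_t modrm, int8_t disp8,
                                    const uint32_t regs[8])
{
    uint8_t mod = modrm >> 6;
    uint8_t rm = modrm & 7;

    assert(mod == 1 && rm != 4);
    /* disp8 is sign-extended before being added to the base register. */
    return regs[rm] + (uint32_t)(int32_t)disp8;
}
```

Fed with EAX = f0052d9c from the dump, ModRM byte 0x50 and displacement
0x10, this gives f0052dac, which is the address reported in the
"unable to handle kernel paging request" line.
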




 

