Re: [Xen-devel] The strange case of xen_netback not returning ARP replies
On Wed, May 16, 2012 at 02:18:27PM +0200, Joanna Rutkowska wrote:
> Hello,
>
> I'm facing a rather strange problem with the netback interface. My setup
> involves a netvm, which has some physical network interfaces assigned,
> and a client VM where a net front is running (exposed as eth0) and which
> is connected to that netvm (via vif42.0 interface, as seen in the netvm
> on the dumps below).
>
> Now, the netvm has two physical network interfaces assigned:
> 1) A standard Intel AGN (iwlwifi module, interface wlan0) -- this is
> just a PCI devices assigned
>
> 2) A USB 3G modem (cdc_ncm module, usb0 interface) -- this has been made
> available to the netvm by assigning a whole USB controller, where the 3G
> modem is connected to. This works fine.

There are some patches posted about netback and SKB slots that might
apply to the problem you guys are seeing.

>
> We do NAT in netvm for the traffic coming on vif* and send it out
> through the default outgoing interface, e.g. wlan0. Now, as long as I
> use the wlan0 for networking all works great. I've been using this setup
> for years, no problem here.
>
> However, when I switch to usb0 as a default outgoing interface in the
> netvm, something strange happens. The networking works fine via usb0 for
> some time (a few minutes typically), yet suddenly, after enough packets
> got exchanged, the networking stops working.
>
> When I run tcpdump on the vif* interface I can see that suddenly there
> is nobody (in the netvm) to reply for the ARP requests from the client
> VM (the client vm has Xen ID = 42 in this dump, and IP = .5, and gateway
> = .1):
>
> [root@netvm user]# tcpdump -ni vif42.0 arp
> tcpdump: WARNING: vif42.0: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on vif42.0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 13:41:55.031819 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:41:56.031860 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:41:57.031794 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:41:59.287308 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:42:00.283853 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:42:01.283816 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:42:03.231324 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length
>
> ... and this now continues until no end.
>
> For comparison, this is how it looks when I use networking via wlan0:
>
> [root@netvm user]# tcpdump -ni vif42.0 arp
> tcpdump: WARNING: vif42.0: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on vif42.0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 13:39:00.215883 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:39:00.215911 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28
> 13:39:21.799844 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:39:21.799869 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28
>
> We can see that every once in a while an ARP request for 10.137.1.1
> appears (a gateway for clientvm, so the netvm), yet this is immediately
> being answered (by netvm, as I understand).
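
To make the difference between the two captures easier to spot in a longer
log, something like the following could be run over saved tcpdump output.
This is only a rough sketch: the function name unanswered_requests and the
regular expression are made up for illustration, and it assumes the
"tcpdump -n ... arp" text format shown in the dumps above, with any leading
"> " quote markers already stripped. A request counts as unanswered if no
reply for the same IP is seen before the next request for that IP, or
before the end of the log.

```python
import re

# Matches "tcpdump -n" ARP lines in the format shown in the dumps above.
ARP_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) ARP, "
    r"(?:Request who-has (?P<req_ip>\S+) tell (?P<asker>[^\s,]+)"
    r"|Reply (?P<rep_ip>\S+) is-at (?P<mac>[^\s,]+))"
)


def unanswered_requests(lines):
    """Return timestamps of ARP requests that never got a reply."""
    pending = {}      # target IP -> timestamp of the outstanding request
    unanswered = []
    for line in lines:
        m = ARP_RE.match(line.strip())
        if not m:
            continue  # banner lines, warnings, non-ARP traffic
        if m.group("req_ip"):
            ip = m.group("req_ip")
            if ip in pending:
                # A new request arrived while the previous one was open.
                unanswered.append(pending[ip])
            pending[ip] = m.group("ts")
        else:
            # A reply closes the outstanding request for that IP, if any.
            pending.pop(m.group("rep_ip"), None)
    # Whatever is still open at the end of the log was never answered.
    unanswered.extend(pending.values())
    return unanswered
```

Fed the wlan0 capture it should return an empty list, and fed the usb0
capture it should return one timestamp per request. It says nothing about
why the replies stop, it only helps pin down when they do.
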
>
> Now, this behavior seems really strange, because:
>
> 1) AFAIU, the ARP replies are/should be generated by the netback
> interface in the netvm (vif*).
>
> 2) It shouldn't matter, for the netback code, how the packets are later
> routed (via wlan0 vs. usb0) to provide this (dummy) arp response?
>
> 3) ...yet, for some reason, in the case when packets are later routed
> through usb0, the netback is not willing to generate arp response???
>
> Or am I misunderstanding this, and it is somebody else who is generating
> the arp responses? The final NIC?
>
> Some additional notes:
> 1) We make sure to set /proc/sys/net/ipv4/conf/vif*/proxy_arp to 1
>
> 2) When this "arp hang" happens, the networking (via usb0) is still
> working fine in the netvm (i.e. I can do ping google.com from the netvm)
>
> 3) This has been tested on various VM kernels (in the netvm): 3.0.4,
> 3.2.7, and 3.3.5 -- all exhibit the same behavior.
>
> 4) Nothing spectacular in the logs of the netvm, however, I can often
> see this crash in the *client* VM:
>
> [ 1257.228761] ------------[ cut here ]------------
> [ 1257.228767] WARNING: at
> /home/user/qubes-src/kernel/kernel-3.3.5/linux-3.3.5/fs/sysfs/file.c:498
> sysfs_attr_ns+0x93/0xa0()
> [ 1257.228776] sysfs: kobject eth0 without dirent
> [ 1257.228780] Modules linked in: iptable_raw bnep bluetooth rfkill
> ipt_MASQUERADE ipt_REJECT xt_state xt_tcpudp xen_netback iptable_filter
> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> ip_tables x_tables xen_netfront microcode pcspkr u2mfn(O) xen_blkback
> xen_evtchn autofs4 ext4 jbd2 crc16 dm_snapshot xen_blkfront [last
> unloaded: scsi_wait_scan]
> [ 1257.228819] Pid: 11, comm: xenwatch Tainted: G W O
> 3.3.5-1.pvops.qubes.x86_64 #1
> [ 1257.228825] Call Trace:
> [ 1257.228830] [<ffffffff810495aa>] warn_slowpath_common+0x7a/0xb0
> [ 1257.228836] [<ffffffff81049681>] warn_slowpath_fmt+0x41/0x50
> [ 1257.228842] [<ffffffff81057ba7>] ? lock_timer_base+0x37/0x70
> [ 1257.228850] [<ffffffff811a7433>] sysfs_attr_ns+0x93/0xa0
> [ 1257.228856] [<ffffffff811a7aef>] sysfs_remove_file+0x1f/0x40
> [ 1257.228862] [<ffffffff812e5622>] device_remove_file+0x12/0x20
> [ 1257.228870] [<ffffffffa00faf5a>] xennet_remove+0x84/0xac [xen_netfront]
> [ 1257.228875] [<ffffffff812b5c82>] xenbus_dev_remove+0x42/0xa0
> [ 1257.228881] [<ffffffff812e85a7>] __device_release_driver+0x77/0xd0
> [ 1257.228887] [<ffffffff812e86e8>] device_release_driver+0x28/0x40
> [ 1257.228895] [<ffffffff812e790f>] bus_remove_device+0x10f/0x180
> [ 1257.228901] [<ffffffff812e5808>] device_del+0x118/0x1c0
> [ 1257.228906] [<ffffffff812e58cd>] device_unregister+0x1d/0x60
> [ 1257.228914] [<ffffffff812b5a46>] xenbus_dev_changed+0x96/0x1b0
> [ 1257.228920] [<ffffffff812b74b4>] frontend_changed+0x24/0x50
> [ 1257.228926] [<ffffffff812b4221>] xenwatch_thread+0xb1/0x170
> [ 1257.228933] [<ffffffff8106aea0>] ? wake_up_bit+0x40/0x40
> [ 1257.228939] [<ffffffff812b4170>] ? xenbus_thread+0x40/0x40
> [ 1257.228944] [<ffffffff8106a9a6>] kthread+0x96/0xa0
> [ 1257.228951] [<ffffffff81465724>] kernel_thread_helper+0x4/0x10
> [ 1257.228959] [<ffffffff8145c7fc>] ? retint_restore_args+0x5/0x6
> [ 1257.228964] [<ffffffff81465720>] ? gs_change+0x13/0x13
> [ 1257.228968] ---[ end trace 75286ef58ce0391f ]---
>
> But this seems rather irrelevant, as it seems like it is the netvm that
> is failing here, i.e. it doesn't generate ARP responses?
>
> I would appreciate any help with this issue!
>
> Thanks,
> joanna.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel