
[Xen-devel] VM spontaneously losing network on 10gig interface


  • To: xen-devel@xxxxxxxxxxxxx
  • From: Nathan March <nathan@xxxxxx>
  • Date: Mon, 17 Sep 2012 16:00:29 -0700
  • Delivery-date: Mon, 17 Sep 2012 23:01:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi All,

I'm having a very strange problem where a VM's bridge port spontaneously stops passing traffic. It only seems to occur on our 10gig interfaces (Intel X540 on the ixgbe driver, MTU 9000). These are two links bonded into bond0, which is then split into VLAN interfaces (pvlan462/pvlan463/etc.) before being bridged with the domUs. Everything works at first, but several hours into a large rsync, traffic stops crossing the bridge. Once it breaks, only that single VM on that single interface is affected. Other VMs on the same dom0 still have access to the same VLAN.

Layout is Nexenta NFS --> 2x Arista 10gig switches --> Intel X540-T2 (ixgbe) on dom0 --802.3ad--> bond0 --vconfig--> pvlan462 --bridged in vlan462--> vif4.1 / vif6.1.
Dom0 runs kernel 3.2.28 with Xen 4.1.3; the domU runs kernel 2.6.32.27.
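For anyone trying to reproduce this layout, the VLAN and bridge part of the stack could be recreated roughly as below. This is a minimal sketch, not the exact commands used on this box: it assumes bond0 already exists (802.3ad, MTU 9000), uses `ip link ... type vlan` in place of vconfig, and the helpers `run` and `setup_vlan_bridge` are hypothetical. It only prints the commands unless DRY_RUN=0 is set. The vifN.M ports are not added here; Xen's hotplug scripts attach them to the bridge when the domU starts.

```shell
#!/bin/sh
# Sketch: recreate the bond0 -> pvlanN -> bridge vlanN layout.
# Assumes bond0 already exists and is up (802.3ad, MTU 9000).
# Prints commands by default; set DRY_RUN=0 to execute them (needs root).
set -eu

: "${DRY_RUN:=1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: tag VLAN $2 on parent $1 as pvlan$2 and add it
# to a bridge named vlan$2, matching the names in the brctl output.
setup_vlan_bridge() {
    parent=$1
    vid=$2
    vlan_if="pvlan$vid"
    bridge="vlan$vid"

    # iproute2 equivalent of vconfig, with an explicit interface name.
    run ip link add link "$parent" name "$vlan_if" type vlan id "$vid"
    run ip link set dev "$vlan_if" mtu 9000 up

    # New bridges have STP disabled by default, as in the brctl output.
    run brctl addbr "$bridge"
    run brctl addif "$bridge" "$vlan_if"
    run ip link set dev "$bridge" up
}

setup_vlan_bridge bond0 462
setup_vlan_bridge bond0 463
```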

xen3 ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
vlan462         8000.a0369f0eac2c       no              pvlan462
                                                        vif4.1
                                                        vif6.1
vlan463         8000.a0369f0eac2c       no              pvlan463
                                                        vif5.1

Once it breaks, running tcpdump either inside the VM or on the dom0 against its vif shows the same ARP traffic from the VM (looking for the NFS server), but nothing at all coming in to the VM. Running tcpdump on the parent bridge shows traffic as normal, and other VMs on this bridge still have regular access; only the single vif is affected.
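The comparison above can be scripted so the same capture is taken on the bridge and on each vif. A minimal sketch, assuming the interface names from the brctl output and that coreutils `timeout` is available; `run` and `capture` are hypothetical helpers, and nothing is executed unless DRY_RUN=0 is set (root required). The captures run one after another, so they cover different time windows; running them in parallel to separate files would give a tighter comparison.

```shell
#!/bin/sh
# Sketch: capture ARP on the bridge and on each vif for comparison.
set -eu

: "${DRY_RUN:=1}"
BRIDGE=vlan462
VIFS="vif4.1 vif6.1"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: up to 50 ARP packets or 10 s on interface $1.
# -n: no name resolution; -e: print link-level headers (MAC addresses).
capture() {
    run timeout 10 tcpdump -n -e -i "$1" -c 50 arp
}

for ifc in "$BRIDGE" $VIFS; do
    echo "== $ifc =="
    # timeout exits non-zero when it stops tcpdump; keep going anyway.
    capture "$ifc" || true
done
```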

I've tried toggling net.bridge.bridge-nf-call-(arp|ip|ip6)tables off and it didn't seem to make a difference (also flushed all ip/eb/arptables rules just in case).
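For reference, the toggling and flushing described above amounts to something like the following. A minimal sketch with hypothetical helper names (`run`, `disable_bridge_netfilter`); it prints the commands unless DRY_RUN=0 is set, and running it for real needs root. It flushes only each tool's default (filter) table; which tables were flushed originally isn't stated.

```shell
#!/bin/sh
# Sketch: stop bridged frames from being handed to arptables/iptables/
# ip6tables, then flush each tool's default table.
set -eu

: "${DRY_RUN:=1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_bridge_netfilter() {
    for proto in arp ip ip6; do
        run sysctl -w "net.bridge.bridge-nf-call-${proto}tables=0"
    done
    # -F without -t flushes only the default (filter) table;
    # other tables (e.g. "-t nat") would need their own -F.
    for tool in iptables ip6tables ebtables arptables; do
        run "$tool" -F
    done
}

disable_bridge_netfilter
```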

It takes several hours to reproduce just by copying data, and I haven't yet managed to find a small test case or figure out what triggers the break. Given that I've already found one bug in ixgbe (reported and fixed!), I suspect the 10gig driver, but it seems like this problem would have to come from either Xen or the bridging layer. Could this be a Xen netback/netfront issue?

Any ideas? Or suggestions on where to start looking?

Thanks!

- Nathan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

