
Re: [Xen-devel] VM spontaneously losing network on 10gig interface



On 18/09/2012 00:00, Nathan March wrote:
> Hi All,
>
> Having a very strange problem where a VM's bridge will spontaneously 
> stop bridging traffic. This only seems to occur on our 10gig interfaces 
> (intel x540 on ixgbe driver, mtu 9000), which are 2x links bonded into 
> bond0, then broken down into pvlan462/pvlan463/etc before being bridged 
> with the domUs. Everything works great at first, but several hours after
> starting a large rsync, traffic stops crossing the bridge. Once it has
> stopped working, it only affects that single VM on that single interface.
> Other VMs on the same dom0 still have access to the same affected vlan.
>
> Layout is Nexenta NFS ---> 2x arista 10gig switches --> intel x540-t2 
> (ixgbe) on dom0 --802.3ad--> bond0 --vconfig--> vlan 462 --bridged--> 
> pvlan 462 / vif4.1 / vif6.1.
> Dom0 is running kernel 3.2.28 w/ xen 4.1.3, domU is kernel 2.6.32.27
>
> xen3 ~ # brctl show
> bridge name     bridge id               STP enabled     interfaces
> vlan462         8000.a0369f0eac2c       no              pvlan462
>                                                          vif4.1
>                                                          vif6.1
> vlan463         8000.a0369f0eac2c       no              pvlan463
>                                                          vif5.1
>
> Once it breaks, a tcpdump inside the VM or on the dom0 against the
> vif shows the same ARP traffic from the VM (looking for the NFS server),
> but nothing incoming to the VM at all. A tcpdump on the parent bridge
> shows the traffic as normal, and other VMs on this bridge still have
> regular access; only the single vif is affected.
>
> I've tried toggling net.bridge.bridge-nf-call-(arp|ip|ip6)tables off and 
> it didn't seem to make a difference (also flushed all ip/eb/arptables 
> rules just in case).
>
> It takes me several hours to reproduce just by copying data and I 
> haven't managed to figure out a nice small test case yet or what 
> triggers the break. Considering I've found one bug in ixgbe already 
> (reported + fixed!) I suspect the 10gig driver, but it seems like this
> problem would come from either Xen or bridging. This feels like a Xen
> netback/netfront issue?
>
> Any ideas? Or suggestions on where to start looking?

What happens if you detach the vif from the bridge and reattach it -
does the problem go away?
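
As a rough sketch of what that could look like in dom0 (the bridge and
vif names are just taken from the brctl output above; which vif is
actually affected is not stated, so vif4.1 here is an assumption, and
the rebridge_vif helper is a made-up name for illustration). By default
it only prints the commands; set DRY_RUN=0 and run as root to apply them:

```shell
#!/bin/sh
# Sketch: detach a vif from its bridge, then reattach it.
# BRIDGE/VIF default to names from the quoted brctl output; vif4.1 is an
# assumed example, not necessarily the affected interface.
# DRY_RUN=1 (the default) prints the commands instead of running them.

rebridge_vif() {
    bridge=$1
    vif=$2
    for cmd in "brctl delif $bridge $vif" "brctl addif $bridge $vif"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Intentionally unquoted so the command string splits into words.
            $cmd || return 1
        fi
    done
}

rebridge_vif "${BRIDGE:-vlan462}" "${VIF:-vif4.1}"
```

If traffic resumes right after the reattach, that would point more at
the bridge port state than at the vif itself, though that is only one
possible reading.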

~Andrew

>
> Thanks!
>
> - Nathan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

