
Re: [Xen-devel] lost gARP after live migration

On Tue, 2011-06-28 at 14:01 +0100, Laszlo Ersek wrote:
> Hi,
> with reference to RHBZ#713585:
> It seems when a RHEL-6.1 or F-15 Xen PV guest is live migrated, the 
> gratuitous ARP packet is not forwarded to the affected "networking 
> equipment". The netback vif is added to a routed bridge in the host(s) 
> and external hosts are expected to have connection to the guest at all 
> times, no matter the current Xen host.
> I experimented a bit with tcpdump, and the gARP does appear on the 
> netfront interface. It also appears on the host bridge if sufficient 
> time passes between completing the xenbus handshake and sending the gARP.
> When the guest queues e.g. three gARPs in rapid succession, a variable 
> number of them gets lost. (When all such packets disappear, then the 
> migrated guest becomes invisible to the outside world, until it 
> initiates network traffic on its own.)
> When the guest waits for about half a second before sending (queueing), 
> the very first gARP packet successfully appears on the host bridge.
> I suspect it's a timing race against the netback vif being added to the 
> host bridge. What would be a good countermeasure?
> - Adding two modparams to xen-netfront (gARP requeue count & number of 
> msecs to wait between queueing the gARPs).
> - (Paolo's idea:) watching the "hotplug-status" xenstore node and 
> sending a single gARP when the watch fires with "connected". This node 
> belongs to the backend xenstore subtree, thus watching it from the guest 
> doesn't please the architecture astronaut in me.

netback already waits (or should...) for hotplug-status to fire with
"connected" before moving to state XenbusStateConnected. See
hotplug_status_changed in drivers/net/xen-netback/xenbus.c. You need
either the netback in upstream or something newer than 43223efd9bfd
(circa Feb 2010) if you are using e.g. xen.git#xen/next-2.6.32. That commit
fixes pretty much the issue you describe.
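
To make that ordering concrete, here is a toy model of the gating, written in Python for brevity. It is not the code in drivers/net/xen-netback/xenbus.c: the class ToyBackend, its attributes, and the method frontend_connected are hypothetical names, and only the two state names, their numeric values, and the "connected" string come from Xen itself.

```python
# Toy model only: this is NOT the xen-netback implementation.
# The numeric values match enum xenbus_state in Xen's public xenbus.h.
XENBUS_STATE_INIT_WAIT = 2
XENBUS_STATE_CONNECTED = 4


class ToyBackend:
    """Hypothetical backend that gates Connected on hotplug-status."""

    def __init__(self):
        self.state = XENBUS_STATE_INIT_WAIT
        self.rings_mapped = False   # frontend has published its rings
        self.hotplug_status = None  # last value seen on the xenstore node

    def frontend_connected(self):
        # The frontend side of the handshake is done, but the hotplug
        # script may not have added the vif to the bridge yet.
        self.rings_mapped = True
        self._maybe_connect()

    def hotplug_status_changed(self, value):
        # Called when the watch on "hotplug-status" fires.
        self.hotplug_status = value
        self._maybe_connect()

    def _maybe_connect(self):
        # Advertise Connected only once both conditions hold.
        if self.rings_mapped and self.hotplug_status == "connected":
            self.state = XENBUS_STATE_CONNECTED
```

In this model a backend whose rings are mapped stays in InitWait until hotplug_status_changed("connected") is called, so a frontend that waits for Connected cannot transmit before the hotplug script has finished.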

I expected that netfront waited for the backend to hit
XenbusStateConnected before sending the grat ARP but instead I find it
happens when the backend hits XenbusStateInitWait. I'm not sure if that
is a problem -- it appears to have been done this way since forever
(even back in the classic Xen kernels) and I've never noticed a gARP go
missing in the way you describe, but perhaps something isn't quite
matching up any more.
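
The suspected race can be sketched as a simple timeline, again in Python and again only as an illustration. The Timeline fields, the helper names, and every millisecond value below are made up for the example; the one assumption that matters is that a frame sent before the vif is on the bridge is dropped.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Hypothetical event times after migration, in milliseconds."""
    init_wait_ms: int      # backend reaches XenbusStateInitWait
    vif_on_bridge_ms: int  # hotplug script has added the vif to the bridge
    connected_ms: int      # backend reaches XenbusStateConnected


def garp_reaches_bridge(send_ms, timeline):
    # Assumption: anything sent before the vif joins the bridge is lost.
    return send_ms >= timeline.vif_on_bridge_ms


def burst(start_ms, count, gap_ms):
    # Send times for `count` gARPs queued `gap_ms` apart.
    return [start_ms + i * gap_ms for i in range(count)]


# Illustrative numbers only, not measurements.
timeline = Timeline(init_wait_ms=0, vif_on_bridge_ms=300, connected_ms=310)

# Three gARPs queued back to back as soon as the backend hits InitWait.
at_init_wait = [garp_reaches_bridge(t, timeline)
                for t in burst(timeline.init_wait_ms, 3, 1)]

# A single gARP sent only once the backend reports Connected.
at_connected = garp_reaches_bridge(timeline.connected_ms, timeline)
```

With these numbers every gARP in the InitWait burst is lost and the one sent at Connected gets through. The model is deterministic, so it does not reproduce the variable loss count reported above; it only shows why triggering on InitWait can be too early.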


> - Something else.
> Sorry for the naivety / verbiage.
> Thanks,
> lacos
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
