[Xen-devel] lost gARP after live migration


With reference to RHBZ#713585:

It seems that when a RHEL-6.1 or F-15 Xen PV guest is live migrated, the gratuitous ARP packet is not forwarded to the affected "networking equipment". The netback vif is added to a routed bridge in the host(s), and external hosts are expected to stay connected to the guest at all times, no matter which Xen host it currently runs on.

I experimented a bit with tcpdump, and the gARP does appear on the netfront interface. It also appears on the host bridge if sufficient time passes between completing the xenbus handshake and sending the gARP.

When the guest queues e.g. three gARPs in rapid succession, a variable number of them get lost. (When all such packets disappear, the migrated guest becomes invisible to the outside world until it initiates network traffic on its own.)

When the guest waits for about half a second before sending (queueing), the very first gARP packet successfully appears on the host bridge.
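For anyone who wants to match the tcpdump output against the expected packet, here is a minimal sketch of a gratuitous ARP frame in its common form: a broadcast ARP request whose sender and target protocol addresses are both the guest's own IP. The function name build_garp and the example MAC/IP values are made up for illustration; this is not what netfront itself runs.

```python
import struct


def build_garp(mac: bytes, ip: bytes) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request.

    mac is the 6-byte hardware address and ip the 4-byte IPv4 address
    of the announcing interface (hypothetical helper, for illustration).
    """
    if len(mac) != 6 or len(ip) != 4:
        raise ValueError("mac must be 6 bytes and ip must be 4 bytes")

    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP header: htype 1 (Ethernet), ptype 0x0800 (IPv4),
    # hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip              # sender hardware / protocol address
    arp += b"\x00" * 6 + ip      # target hardware unset, target IP == own IP
    return eth + arp


if __name__ == "__main__":
    # Example values only: a Xen-style MAC and a documentation-range IP.
    frame = build_garp(bytes.fromhex("00163e000001"), bytes([192, 0, 2, 10]))
    print(len(frame), frame.hex())
```

The frame is 42 bytes before any padding to the Ethernet minimum, so on the wire it may show up as 60 bytes in a capture.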

I suspect it's a timing race against the netback vif being added to the host bridge. What would be a good countermeasure?

- Adding two modparams to xen-netfront (gARP requeue count & number of msecs to wait between queueing the gARPs).
- (Paolo's idea:) watching the "hotplug-status" xenstore node and sending a single gARP when the watch fires with "connected". This node belongs to the backend xenstore subtree, thus watching it from the guest doesn't please the architecture astronaut in me.
- Something else.
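
To make the first option concrete, here is a rough user-space sketch of the requeue logic only. It assumes two knobs mirroring the proposed modparams (called count and interval_ms here, names invented) and takes the actual "send one gARP" step as a callback, so it says nothing about how netfront would queue the packet. The defaults of 3 and 500 just echo the numbers from my experiment above; they are not tuned values.

```python
import time
from typing import Callable


def announce(send: Callable[[], None],
             count: int = 3,
             interval_ms: int = 500,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() count times, waiting interval_ms before each call.

    The wait comes first on purpose: in the experiment, the first gARP
    only got through when the guest delayed before sending it.
    Returns the number of announcements issued.
    """
    if count < 0 or interval_ms < 0:
        raise ValueError("count and interval_ms must be non-negative")

    for _ in range(count):
        sleep(interval_ms / 1000.0)  # give the backend time to join the bridge
        send()
    return count


if __name__ == "__main__":
    # Stand-in for the real transmit step; prints instead of sending.
    announce(lambda: print("gARP queued"), count=3, interval_ms=500)
```

In the kernel this would presumably be some form of delayed work rather than a sleeping loop; the sketch is only meant to pin down what the two parameters would mean.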

Sorry for the naivety / verbiage.


