Re: [Xen-users] Live migration: 2500ms downtime
Hi again,
On 8/10/07, mail4dla@xxxxxxxxxxxxxx <mail4dla@xxxxxxxxxxxxxx> wrote:
Hi,
On 8/10/07, Marconi Rivello <marconirivello@xxxxxxxxx> wrote:
I did ping from a third physical machine. The result doesn't vary much.
I followed your advice on analyzing the traffic. But I don't see why I should look for ICMPs, since the DomU does answer the ping; it just has a 2.5s gap after stopping on one machine and starting on the other.
Well, I asked you to do this (in your original test setup, where the ping was performed from the source Dom0) in order to see whether the packets are actually sent out of the machine, or whether the Dom0 tries to send them through the bridge and the no-longer-existing virtual interface.
Oh, I get it. Sorry for not making it clear. You know, I have to give enough info for people to be able to help, but not so much that the email gets too long and drives people away. :)
Here follow two scenarios:
That happens when the physical machines are connected to the switch. I started tcpdump on both Dom0's to see if the DomU would send the unsolicited arp reply to update the switch's tables. And there is none. So, unless there is already traffic going out from the domU, there isn't anything to tell the switch the machine changed from one port to another.
This is exactly the anticipated behaviour. AFAIR, it is a known issue and more recent builds of Xen do send the unsolicited ARP reply after migration. With a switch, you are quite lucky to have only 2.5s downtime. Depending on the switch and its ageing algorithm, this can be significantly higher, e.g., 30s or so.
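To make the "unsolicited ARP reply" concrete, here is a minimal Python sketch of what such a frame looks like on the wire. It is only an illustration under assumptions: the function names, the MAC and the IP address are made up, and the exact frame Xen emits after migration may differ (for instance, it may be a request rather than a reply). Sending it requires Linux and root, so the send helper is defined but not called.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an unsolicited ARP reply.

    mac and ip are the migrated guest's own addresses (hypothetical values
    in the example below). Sender and target IP are both set to the guest's
    IP, which is what makes the reply "gratuitous".
    """
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth = broadcast + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # Sender MAC/IP, then target MAC/IP.
    arp += hw + addr + broadcast + addr
    return eth + arp


def send_frame(iface: str, frame: bytes) -> None:
    """Send a raw frame on iface (Linux only, needs root). Not called here."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((iface, 0))
        s.send(frame)


frame = build_gratuitous_arp("00:16:3e:00:00:01", "10.0.0.5")
```

Because the source MAC of this broadcast frame is the guest's, a switch that sees it arrive on the new port can relearn where the guest lives without waiting for its ageing timer.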
Do you also have the 2.5s downtime when pinging from a third machine with the Xen machines connected by a hub (actually, it's sufficient to connect the Xen machines via a hub to the same switch port)?
One test I tried was exactly that: connecting the Xen machines to a hub (a 10 Mbps one, though, the only one available) and the hub to the switch. Because the hub is so slow, I set up a crossover connection on the second port, so the migration could be done quickly. The NICs and switch are 10/100/1000.
The average downtime of 2 to 3 seconds still occurs with the hub.
Another issue that I described in a previous email (which, unfortunately, didn't get any replies) is that this downtime increases to more than 20 seconds if I set the domU's memory to 512MB (the maxmem set is 1024MB). I repeated the test successively, from one side to the other, with mem set to 512 and 1024, and the result was always the same: around 3s with mem = maxmem, and around 24s with mem=512 and maxmem=1024.
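For anyone wanting to compare these downtime numbers, here is a rough Python sketch of how the gap can be derived from the arrival times of ping replies. It is an assumption about the measurement, not the script actually used: the function name, the sample timestamps and the 0.1s probe interval are all hypothetical.

```python
def longest_gap(reply_times, interval):
    """Return (gap_seconds, estimated_lost_probes) for the largest gap.

    reply_times: sorted arrival times, in seconds, of the ping replies.
    interval: time between probes, in seconds (e.g. 0.1 for `ping -i 0.1`).
    """
    if len(reply_times) < 2:
        return 0.0, 0
    # Largest distance between two consecutive replies.
    gap = max(b - a for a, b in zip(reply_times, reply_times[1:]))
    # A gap of n intervals means roughly n - 1 probes went unanswered.
    lost = max(0, round(gap / interval) - 1)
    return gap, lost


# Hypothetical trace: replies every 0.1s, with a pause during migration.
times = [0.0, 0.1, 0.2, 2.7, 2.8]
gap, lost = longest_gap(times, interval=0.1)
```

The resolution of this estimate is the probe interval, so a short interval is needed to tell a few hundred milliseconds apart from 2 to 3 seconds.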
Just to make it clear: the domU is running only an apache server. The cpu, mem, and net loads are really low and shouldn't be interfering.
Just to emphasize: I'm running CentOS 5, with Xen 3.0.3 (which comes with it), and applied the Xen related official CentOS (same as redhat's) updates.
My experience is based on the Xen that is shipped with Ubuntu 7.04, which is also 3.0.3. When the machines are connected through a hub, the unavailability period is of the same order as in the paper that you quoted in your first email.
hth dla
I'm starting to consider that it might be a problem with this distribution specifically, although I really don't see why it should be.
Thanks for the help. Still, any other suggestions or insights are most welcome.
Marconi.
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users