Re: [Xen-devel] Live Migration Error

Hi Ian,
I got a fresh code image this morning. Live migration works fine, even after restoring the timer to its default value. I have tested it, though not necessarily thoroughly, and I haven't run into trouble yet. I guess this closes this chapter.

For whatever it may be worth, I have some comments regarding the "previous" (Friday May 13) xfrd version:

- Even though increasing the timeout would allow live migration to complete successfully, this was not always the case; there was actually a 50% chance of success.
- On all successful migrations, the number of skipped pages after the last iteration and before domain suspend was always zero:

Saving memory pages: iter 3   0%
3: sent 0, skipped 0,
3: sent 0, skipped 0, [DEBUG] Conn_sxpr>
(AndresNfsDomain 8)[DEBUG] Conn_sxpr< err=0
[1116255361.997192] SUSPEND flags 00020004 shinfo 00000beb eip c01068fe esi 0002de60

- On all failed migrations, the number of skipped pages at that same point was nonzero (sometimes 12, sometimes 4).
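
For anyone who wants to check their own xfrd output for the same pattern, here is a minimal sketch in Python. It assumes the iteration lines look like the excerpt above ("<iter>: sent N, skipped M,") and that the suspend is marked by a line containing "SUSPEND"; the function name is made up for illustration and is not part of xfrd or xend.

```python
import re

# Matches iteration summaries such as "3: sent 0, skipped 0,"
# (format assumed from the log excerpt in this message).
ITER_RE = re.compile(r"(\d+): sent (\d+), skipped (\d+)")


def last_skipped_before_suspend(log_text):
    """Return the skipped-page count from the last iteration line seen
    before the first SUSPEND line (or before the end of the log).
    Returns None if no iteration line was seen."""
    last = None
    for line in log_text.splitlines():
        if "SUSPEND" in line:
            break
        match = ITER_RE.search(line)
        if match:
            last = int(match.group(3))
    return last
```

A result of 0 would correspond to what I saw on successful migrations, and a nonzero result to what I saw on failed ones. This is only a correlation from my runs, not an explanation.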

Hope this somehow helps.
Keep up the excellent work

Ian Pratt wrote:

Teemu saves the day!!!
I actually set the timeout to 100 for no particular reason (originally it was 10; 20 didn't work either). Thanks, Ian, for your suggestion as well.

I'd be really surprised if increasing the timeout actually made a difference. 
Are you sure you're not just using the shadow mode fix that was checked in a 
couple of hours ago?


At 02:45 PM 5/13/2005, Teemu Koponen wrote:
On May 13, 2005, at 20:07, Andres Lagar Cavilla wrote:


I am trying to do a live migration on the same physical host, i.e. xm migrate --live 'whatever' localhost. It fails with 'Error: errors: suspend, failed, Callbak timed out'. It seems like the transfer of memory pages works until the point when the domain needs to be suspended to do the final transfer. Funny thing is, it used to work before, gloriously, and I haven't made any software/hardware changes. At some point an xm save command failed with a timeout, and from there on live migration fails with this message. Non-live migration works perfectly, also between different physical hosts. save/restore also works flawlessly.

I had similar timeout errors previously, when I was using somewhat slower servers. I overcame the problem by slightly increasing the timeout value in controller.py. It seemed to provide a remedy.



Xen-devel mailing list
