
[Xen-devel] Re: live-migration restore failed error



Hi all,

I have run another round of live migration and found that the restore fails while pages are being allocated for the guest.

The hypervisor logs are as follows:
(XEN) page_alloc.c:1114:d0 Over-allocation for domain 1779: 132097 > 132096
(XEN) memory.c:149:d0 Could not allocate order=0 extent: id=1779 memflags=0 (319 of 320)
The matching xend logs are:
[2014-09-17 13:43:44 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:Failed allocation for dom 1779: 320 extents of order 0
[2014-09-17 13:43:44 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:ERROR Internal error: Failed to allocate memory for batch.!
[2014-09-17 13:43:44 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:
[2014-09-17 13:43:45 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:Restore exit with rc=1
[2014-09-17 13:43:45 7256 1157011776] DEBUG (XendCheckpoint:462) /usr/lib64/xen/bin/xc_restore 4 1779 3 4 1 1 1 0 failed status 256
[2014-09-17 13:43:45 7256 1157011776] DEBUG (XendDomainInfo:3846) XendDomainInfo.destroy: domid=1779

It seems that the hypervisor is trying to populate too many pages (one more than the domain's max_pages), so the domain restore fails. I have also noticed that, as the migrations go on, the total number of pages populated increases about once every few hundred migrations. When that total exceeds max_pages (132096 in our case), the error occurs. As you might have noticed, our migration is based on xen-4.0.1. Is this a previously unknown issue, or is it fixed by patch 65c9792df60051b5f5eaadbc47a118cfba7edd49?
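The first hypervisor message quoted above can be read as a simple bounds check. The sketch below is a hypothetical Python model, not Xen source: `parse_over_allocation` and `would_over_allocate` are names invented here, the regular expression is derived only from the single log line above, and the check assumes the logged pair is the would-be page count against max_pages.

```python
import re

# Pattern derived from the one hypervisor line quoted in this mail;
# it is an assumption about the message format, not taken from Xen source.
OVER_ALLOC_RE = re.compile(r"Over-allocation for domain (\d+): (\d+) > (\d+)")

def parse_over_allocation(line):
    """Return (domid, requested_total, max_pages), or None if no match."""
    m = OVER_ALLOC_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

def would_over_allocate(tot_pages, max_pages, order=0):
    """Assumed model: an extent of 2**order pages is refused when it
    would push the domain's page count past max_pages."""
    return tot_pages + (1 << order) > max_pages

line = ("(XEN) page_alloc.c:1114:d0 Over-allocation for domain 1779: "
        "132097 > 132096")
domid, requested, max_pages = parse_over_allocation(line)
overshoot = requested - max_pages  # pages beyond the limit
print(domid, overshoot)
```

Under this model a domain sitting at 132087 pages still has room for single-page extents, while one already at 132096 does not.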

Still, when I printed the domain's total page count (tot_pages) between two migrations, the result was surprisingly 132087 and never changed. But when this error happens, tot_pages exceeds max_pages. I don't know whether this is expected. What am I missing here?

Thanks,
Huaixin Chang
------------------------------------------------------------------
From: Huaixin Chang <huaixin.chx@xxxxxxxxxxxxxxx>
Sent: 2014-09-16 (Tue) 00:15
To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, keir <keir@xxxxxxx>, Ian.Campbell <Ian.Campbell@xxxxxxxxxx>, stefano.stabellini <stefano.stabellini@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian.Jackson <Ian.Jackson@xxxxxxxxxxxxx>, george.dunlap <george.dunlap@xxxxxxxxxxxxx>
Cc: <jinsong.liu@xxxxxxxxxxxxxxx>
Subject: Re: live-migration restore failed error


------------------------------------------------------------------
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Sent: 2014-09-15 (Mon) 22:01
To: Huaixin Chang <huaixin.chx@xxxxxxxxxxxxxxx>, keir <keir@xxxxxxx>, Ian.Campbell <Ian.Campbell@xxxxxxxxxx>, stefano.stabellini <stefano.stabellini@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian.Jackson <Ian.Jackson@xxxxxxxxxxxxx>, george.dunlap <george.dunlap@xxxxxxxxxxxxx>
Cc: <jinsong.liu@xxxxxxxxxxxxxxx>
Subject: Re: live-migration restore failed error

On 15/09/2014 10:41, Huaixin Chang wrote:
We are working on live migration based on Xen 4.0.1 (for historical reasons; meanwhile we are upgrading our Xen to the latest version). The restore failed when live migrating ubuntu12.04 on xen-4.0.1. To be more specific, the error occurred while populating memory. The error messages are as follows:

[2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:307) [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0
[2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:428) Thread-40188
[2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:xc_domain_restore start: p2m_size = fefff
[2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Reloading memory pages:   0%
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Failed allocation for dom 2763: 128 extents of order 0
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:ERROR Internal error: Failed to allocate memory for batch.!
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Restore exit with rc=1
[2014-09-12 22:40:50 7331 1189091648] DEBUG (XendCheckpoint:462) /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0 failed status 256
[2014-09-12 22:40:50 7331 1189091648] DEBUG (XendDomainInfo:3845) XendDomainInfo.destroy: domid=2763

In this case, populate_physmap terminated with nr_done = 127, so xc_memory_op returned 127 while nr_extents was 128.
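The relationship between nr_done and nr_extents can be sketched as follows. This is a hypothetical Python model, not Xen code: `populate_batch` and `batch_ok` are invented names, and the model assumes the allocation stops at a per-domain page limit, which is only one possible cause (host memory exhaustion, suggested in the reply below, would produce the same partial count). The starting page count in the example is illustrative, not a measured value.

```python
def populate_batch(tot_pages, max_pages, nr_extents, order=0):
    """Model a batch allocation that stops at the first extent that
    would exceed max_pages. Returns (nr_done, new_tot_pages)."""
    pages_per_extent = 1 << order
    nr_done = 0
    for _ in range(nr_extents):
        if tot_pages + pages_per_extent > max_pages:
            break  # this extent cannot be allocated; stop the batch
        tot_pages += pages_per_extent
        nr_done += 1
    return nr_done, tot_pages

def batch_ok(nr_done, nr_extents):
    """The caller treats anything short of a full batch as a failure."""
    return nr_done == nr_extents

# Illustrative numbers only: a domain 127 pages below its limit
# asks for 128 single-page extents.
max_pages = 132096
nr_done, tot_pages = populate_batch(max_pages - 127, max_pages, 128)
print(nr_done, batch_ok(nr_done, 128))
```

A single missing page is enough to fail the whole batch, which matches the "Failed to allocate memory for batch" message in the xend log.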

This problem happens about once every 1770 live migrations. I am still debugging it and am sending this email to ask for suggestions.

Thanks,
Huaixin Chang

Xen is unable to fulfil the allocation request.  You have run out of host memory.

~Andrew


Here are some more clues.

I'm migrating ubuntu12.04 guests (with 1G or 512M of memory) back and forth between two machines with around 96G of memory. The issue occurs after around 1770 migrations every time, whether the guest memory is 512M or 1G.

In the pasted xend log, a request for 128 pages of non-contiguous memory failed. I am currently running another round of migration tests, which has completed 230 migrations so far and will hopefully terminate after about one day. So far, I do not see a major decrease in hypervisor memory. I will check whether there are memory issues when the problem shows up.
total_memory           : 98276
free_memory            : 84454
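As a rough check of the out-of-memory hypothesis against the figures above, here is a small sketch. It assumes both values are reported in MiB and that the guest needs roughly its configured memory size on the destination host (overheads are ignored); `host_headroom_mib` is a name invented for this example.

```python
def host_headroom_mib(free_mib, guest_mib):
    """Free host memory left after placing one guest (overheads ignored)."""
    return free_mib - guest_mib

total_mib = 98276  # total_memory from the figures above
free_mib = 84454   # free_memory from the figures above

for guest_mib in (512, 1024):
    headroom = host_headroom_mib(free_mib, guest_mib)
    print(guest_mib, headroom, headroom > 0)
```

With these numbers the headroom is tens of GiB for either guest size, so a plain shortage of host memory does not obviously explain the failure at this point in the test.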

Sorry for not being able to provide a hypervisor log at the moment. Previously I printed too many messages; most of them were suppressed and no helpful message could be found. I will also check whether this round helps.

Thanks,
Huaixin Chang

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

