
Re: [Xen-devel] xen: mm.c MFN errors



On Fri, Feb 25, 2011 at 1:48 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Thu, 2011-02-24 at 22:01 +0000, Shriram Rajagopalan wrote:
>> I had this problem in 4.0.1 (still not resolved) and it persists in
>> 4.1.0-rc6-pre.
>>  And I am not the only one facing this issue apparently.
>> http://lists.xensource.com/archives/html/xen-users/2011-02/msg00362.html
>>  also reports the same issue, on xen 4.0.2-rc2
>>
>> My workload was simple 2.6.18 domU (512M) with just 2 threads constantly
>> mallocing, touching and freeing memory.
>>
>> I enabled remus on the domain (just memory replication) which basically
>> exercises xc_domain_save/xc_domain_restore paths.
>
> What is your dom0 kernel version?
>
> Does a basic save/restore or live migrate work in the same
> configuration? e.g. "xm save"+"xm restore" or "xm migrate --live".
>
>> Issue 1:
>>  On primary during replication, xm dmesg logs are flooded with messages like
>> ........
>> (XEN) mm.c:889:d0 Error getting mfn 468900 (pfn 1fdd1) from L1 entry
>> 8000000468900625 for l1e_owner=0, pg_owner=17
>
> Dom0 failing to map a page owned by dom17? I'm not sure why that might
> happen. I'd hazard that DOMID_SELF was getting used somewhere that the
> foreign domain id was required, perhaps due to using an incorrect
> interface (e.g. perhaps emulated pt updates instead of a hypercall).
>
> My first instinct would be to dig into the MMAP_BATCH ioctl interfaces
> in the privcmd driver.
>
>> Issue 2:
>>  VM fails to recover on secondary when I destroy it on primary. xm
>> dmesg on secondary again shows issues wrt pagetable pinning
> [...]
>
> If dom0 is failing to map guest pages on the source domain then I think
> all bets are off wrt getting something sane on the other end.
>
> Ian.
>
>> I wager this has something to do with the
>> canonicalization/uncanonicalization code, but I cannot pinpoint
>> where exactly, atm.
>> shriram
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>
>
>

Nailed it. My dom0 is 2.6.27. Closer analysis revealed that the
primary-side MFN mapping errors occurred only during xm destroy, and were
followed by "ignoring paging op on dying domain..".

But the actual bug is in how xc_domain_restore scans the pagebuf:
a simple array offset issue. I will send out a patch.
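
To illustrate the general class of bug only (the actual patch is not part of this message), here is a minimal sketch of an offset error when scanning one batch out of a flat buffer. All names here (`types`, `start`, `n`, `skip`, and both functions) are hypothetical and are not taken from libxc; the assumption is simply that per-page type entries for several batches sit in one array and each batch must be indexed relative to its starting offset.

```python
def count_pages(types, start, n, skip):
    """Count entries in types[start:start+n] that carry page data."""
    return sum(1 for i in range(n) if types[start + i] not in skip)


def count_pages_buggy(types, start, n, skip):
    """Same scan, but indexes from 0 and so reads the wrong batch."""
    return sum(1 for i in range(n) if types[i] not in skip)


# Hypothetical buffer: two batches of three entries each.
types = ["NORMAL", "XTAB", "NORMAL", "NORMAL", "XTAB", "XTAB"]
skip = {"XTAB"}

print(count_pages(types, 3, 3, skip))        # second batch, correct offset
print(count_pages_buggy(types, 3, 3, skip))  # silently rescans the first batch
```

With a mismatch like this, the restore side would expect a different number of pages than the sender actually supplied for that batch, which is the kind of error that can surface later as corrupted page tables.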

shriram
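
For anyone reading the "(XEN) mm.c:889" line quoted earlier in the thread, the L1 entry can be decoded by hand. The sketch below assumes the standard x86-64 page-table entry layout (flags in bits 0-11, frame number in bits 12-51, NX in bit 63); the function name and flag labels are illustrative and are not Xen identifiers.

```python
PAGE_SHIFT = 12
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51 hold the frame number

# Architectural flag bits of an x86-64 L1 (PTE) entry.
FLAG_BITS = {
    0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
    5: "ACCESSED", 6: "DIRTY", 7: "PAT", 8: "GLOBAL", 63: "NX",
}


def decode_l1e(l1e):
    """Split an L1 entry into (mfn, set flag names, software-available bits 9-11)."""
    mfn = (l1e & ADDR_MASK) >> PAGE_SHIFT
    flags = [name for bit, name in sorted(FLAG_BITS.items()) if l1e & (1 << bit)]
    avail = (l1e >> 9) & 0x7
    return mfn, flags, avail


mfn, flags, avail = decode_l1e(0x8000000468900625)
print(hex(mfn), flags, avail)
```

The decoded frame number is 0x468900, matching the "mfn 468900" in the log message, and the RW bit is clear, so the entry describes a read-only, non-executable mapping.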
