On Mon, Sep 6, 2010 at 10:56 PM, Shriram Rajagopalan <
rshriram@xxxxxxxxx> wrote:
>
> Hardware: Dell PowerEdge R510 (32G RAM, 8 CPUs, Xeon)
>
> 64bit - xen 4.0.1 stable
>
> 64bit - 2.6.32.18 dom0 (.config attached) running Ubuntu 10.04
> 32bit - 2.6.18.8 domU (.config attached) running Ubuntu 8.04
>
> domU has 3 tap2 disks, on LVM snapshots.
> domU has 2G mem, 2 VCPUs
>
> Workload on domU: ssh + top running, then destroy the domain. This works.
>
> But if I run a heavier workload, say a postgres db (just starting the db,
> no queries), Remus fails to recover. Note that this is not a spurious
> timeout error.
> On destroying the VM on the primary, the backup fails to recover the VM
> with the following error in xm dmesg:
>
> (XEN) mm.c:779:d0 Bad L1 flags 98
> (XEN) mm.c:1186:d0 Failure in alloc_l1_table: entry 1
> (XEN) mm.c:2117:d0 Error while validating mfn 4101af (pfn 2cc08) for type
> 1000000000000000: caf=8000000000000003 taf=1000000000000001
> (XEN) mm.c:868:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1330:d0 Failure in alloc_l2_table: entry 113
> (XEN) mm.c:2117:d0 Error while validating mfn 40fc4c (pfn 2d1ce) for type
> 2000000000000000: caf=8000000000000003 taf=2000000000000001
> (XEN) mm.c:1440:d0 Failure in alloc_l3_table: entry 2
> (XEN) mm.c:2117:d0 Error while validating mfn 40fcdf (pfn 2d08d) for type
> 3000000000000000: caf=8000000000000003 taf=3000000000000001
> (XEN) mm.c:2733:d0 Error while pinning mfn 40fcdf
> ============
>
> Error in xend.log @ backup
> -----------------------------
> [2010-09-06 21:38:16 2392] DEBUG (XendDomainInfo:1804) Storing domain
> details: {'image/entry': '3222274048', 'console/port': '2',
> 'image/loader': 'generic',
> 'vm': '/vm/7be5f9bf-da53-6c10-d4e5-330940210966',
> 'control/platform-feature-multiprocessor-suspend': '1',
> 'image/hv-start-low': '4118806528', 'image/guest-os': 'linux',
> 'cpu/1/availability': 'online',
> 'image/features/writable-descriptor-tables': '1',
> 'image/virt-base': '3221225472', 'memory/target': '2048000',
> 'image/guest-version': '2.6',
> 'image/features/supervisor-mode-kernel': '1',
> 'image/pae-mode': 'yes', 'description': '', 'console/limit': '1048576',
> 'image/paddr-offset': '3221225472', 'image/hypercall-page': '3222278144',
> 'image/suspend-cancel': '1', 'cpu/0/availability': 'online',
> 'image/features/pae-pgdir-above-4gb': '1',
> 'image/features/writable-page-tables': '1', 'console/type': 'xenconsoled',
> 'image/features/auto-translated-physmap': '1', 'name': 'tpccExpt-remus',
> 'domid': '6', 'image/xen-version': 'xen-3.0', 'store/port': '1'}
> [2010-09-06 21:38:16 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0,
> _static_max=0x7d000000, _static_min=0x0,
> [2010-09-06 21:38:16 2392] DEBUG (XendCheckpoint:305) [xc_restore]:
> /usr/lib/xen/bin/xc_restore 4 6 1 2 0 0 0 0
> [2010-09-06 21:38:16 2392] INFO (XendCheckpoint:423) xc_domain_restore
> start: p2m_size = 7d000
> [2010-09-06 21:38:16 2392] INFO (XendCheckpoint:423) Reloading memory
> pages: 0%
> [2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
> Error when reading batch size
> [2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
> error when buffering batch, finishing
> [2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423)
> [2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
> Failed to pin batch of 18 page tables (22 = Invalid argument)
> [2010-09-06 21:40:25 2392] INFO (XendCheckpoint:423) Restore exit with rc=1
>
> The number of page tables that fail to pin also varies (16, 18, 20, ...).
> =============
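
As a reading aid for the first hypervisor message quoted above ("Bad L1 flags 98"), here is a minimal sketch that decodes the value into x86 page-table-entry flag names. It assumes the number is printed in hexadecimal and refers to the low architectural flag bits of a level-1 (4 KiB) entry. The table and the helper name decode_l1_flags are illustrative only and are not taken from the Xen source. It only names the bits. It does not explain why the restore rejects them.

```python
# Architectural x86 flag bits for a level-1 (4 KiB) page table entry.
# Assumption: the value in "Bad L1 flags 98" is hexadecimal and uses
# these low bits. This table is a reading aid, not Xen code.
L1_FLAG_BITS = {
    0x001: "PRESENT",
    0x002: "RW",
    0x004: "USER",
    0x008: "PWT",
    0x010: "PCD",
    0x020: "ACCESSED",
    0x040: "DIRTY",
    0x080: "PAT",
    0x100: "GLOBAL",
}


def decode_l1_flags(hex_text):
    """Return the names of the flag bits set in a hex string such as '98'."""
    value = int(hex_text, 16)
    return [name for bit, name in sorted(L1_FLAG_BITS.items()) if value & bit]


if __name__ == "__main__":
    print(decode_l1_flags("98"))  # prints ['PWT', 'PCD', 'PAT']
```

Under those assumptions, 0x98 corresponds to the three cache-attribute bits (PWT, PCD, PAT) of the entry.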