
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!


  • To: Christophe Saout <christophe@xxxxxxxx>
  • From: Teck Choon Giam <giamteckchoon@xxxxxxxxx>
  • Date: Wed, 5 Jan 2011 03:32:58 +0800
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • Delivery-date: Tue, 04 Jan 2011 11:34:28 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>



On Wed, Jan 5, 2011 at 2:40 AM, Christophe Saout <christophe@xxxxxxxx> wrote:
Hi once more,


> > It doesn't look like this has been resolved yet.  Somewhere I saw a
> > request for the hypervisor message related to the pinning failure.
> >
> > Here it is:
> >
> > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> >
> > I have a bit of experience in debugging things, so if I can help someone
> > with more information...
>  [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
>  [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
>  [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
>  [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
>  [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
>  [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
>  [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
>  [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
>  [<ffffffff8144ff65>] page_fault+0x25/0x30

> Additional information: This happened with a number of commands now.
> However, I am running a multipath setup and every time the crash
> seemed to be caused in the process context of the multipath daemon.
> I think the daemon listens to events from the device-mapper subsystem
> to watch for changes and the problem somehow arises from there, since
> on another machine with the same XEN/Dom0 version without such a
> daemon I never had any troubles with LVM.

On further investigation it seems that most of the time the issue is not
caused by the daemon, but by the "multipath" tool, which is used a lot
by udev to identify properties of block devices.

When I start stracing udevd (following forks), I'm not able to reproduce
the crash anymore.  So I was hoping to find out what the process was
doing before the crash occurs, but since my attempts to trace the
process mask the bug, I can't. :(

(without strace, the bug is very common, about every third "lvcreate"
command.  Every lvcreate command triggers about 20 multipath
invocations)


I have been able to avoid that bug for 8 days (so far) by running sync every 5 seconds for 60 seconds after each LVM snapshot backup, for 10 domUs. That is, for each domU:

1. lvm snapshot domU (lvcreate)
2. mount lvm snapshot domU
3. rsync to backup domU
4. umount lvm snapshot domU
5. remove lvm snapshot domU (lvremove)
6. sync (start a 60-second countdown, running sync every 5 seconds)
7. sleep 5
8. sync
9. sleep 5
10. sync
11. sleep 5
12. sync
.... and so on until the countdown reaches 0 seconds.
Then the cycle repeats for the next domU.
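
For illustration, the cycle above could be sketched in Python roughly as follows. This is only a sketch, not the script actually in use: the function names, the "-snap" suffix, the 1G snapshot size, and the example volume group, mount point and backup paths are all assumptions. It defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import time


def settle_plan(total_seconds=60, interval=5):
    """Steps 6 onward: one sync, then (sleep, sync) pairs until the countdown reaches 0."""
    plan = ["sync"]
    remaining = total_seconds
    while remaining > 0:
        plan.append(f"sleep {interval}")
        plan.append("sync")
        remaining -= interval
    return plan


def backup_commands(vg, lv, mountpoint, dest, snap_size="1G"):
    """Steps 1-5 for one domU. Snapshot name and size are assumed values."""
    snap = f"{lv}-snap"
    return [
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"/dev/{vg}/{lv}"],
        ["mount", f"/dev/{vg}/{snap}", mountpoint],
        ["rsync", "-a", f"{mountpoint}/", dest],
        ["umount", mountpoint],
        ["lvremove", "-f", f"/dev/{vg}/{snap}"],
    ]


def run_cycle(vg, lv, mountpoint, dest, dry_run=True):
    """Run (or, by default, just print) the full cycle for one domU."""
    for cmd in backup_commands(vg, lv, mountpoint, dest):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    for step in settle_plan():
        if dry_run:
            print(step)
        elif step == "sync":
            subprocess.run(["sync"], check=True)
        else:
            # Step has the form "sleep N"; wait N seconds before the next sync.
            time.sleep(int(step.split()[1]))


if __name__ == "__main__":
    # Hypothetical names; replace with the real volume group, LV and paths.
    run_cycle("vg0", "domu1", "/mnt/snap", "/backup/domu1/")
```

With the default 60-second countdown and 5-second interval, the plan is 13 sync calls separated by 12 sleeps before the next domU is started.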

With the above, the crash I posted about in this thread has not occurred for 8 days (8 daily LVM snapshot backups of all domUs).

Thanks.

Kindest regards,
Giam Teck Choon
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

