
RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT



With the BUG_ON code placed inside the if statement, the server starts.
I must have made another stupid mistake while copying the patch by hand; I apologize.

Anyway, I have one server running the first patch, with the BUG_ON checks moved into the if statement.
If it panics, I will collect the page address and other information.

Meanwhile, I'll have another server run the second patch.
I'll keep you updated, thanks.

 
> Date: Wed, 1 Sep 2010 10:58:54 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; jbeulich@xxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx
>
> Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
> merging across node boundaries. Nonetheless the code is simpler and more
> obvious if we put a further merging constraint in free_heap_pages() instead.
> It's also correcter, since I'm not sure that the
> phys_to_nid(page_to_maddr(pg-1)) in init_heap_pages() won't possibly BUG out
> if pg-1 is not a RAM page and is not in a known NUMA node range.
>
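To illustrate the extra merging constraint described above, here is a minimal sketch of a buddy-style free routine that refuses to merge chunks belonging to different NUMA nodes. It is not the attached patch and not Xen's real free_heap_pages(); struct page, pages[], NR_PAGES and free_chunk() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 16u   /* hypothetical toy heap size (power of two) */

/* Hypothetical, much simplified stand-in for a per-page descriptor. */
struct page {
    bool free;           /* true on the head page of a free chunk */
    unsigned int order;  /* chunk covers 2^order pages (valid on free heads) */
    unsigned int node;   /* NUMA node that owns this page */
};

static struct page pages[NR_PAGES];

/*
 * Free the 2^order chunk starting at pfn, merging with its buddy
 * while the buddy is free, has the same order, and is on the same node.
 * Returns the order of the resulting free chunk.
 */
static unsigned int free_chunk(unsigned int pfn, unsigned int order)
{
    assert(pfn < NR_PAGES && pfn % (1u << order) == 0);

    while ((1u << (order + 1)) <= NR_PAGES) {
        unsigned int buddy = pfn ^ (1u << order);

        if (!pages[buddy].free || pages[buddy].order != order)
            break;
        if (pages[buddy].node != pages[pfn].node)
            break;               /* the constraint: never merge across nodes */

        pages[buddy].free = false;   /* take the buddy off the free set */
        if (buddy < pfn)
            pfn = buddy;             /* merged chunk starts at the lower half */
        order++;
    }

    pages[pfn].free = true;
    pages[pfn].order = order;
    return order;
}
```

With pages 0-7 on node 0 and pages 8-15 on node 1, freeing the two order-3 halves leaves two separate order-3 chunks instead of one order-4 chunk spanning both nodes.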
> Please give the attached patch a spin. (You should revert the previous
> patch, of course).
>
> Thanks,
> Keir
>
> On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Well. It did crash on every startup.
> >
> > below is what I got.
> > ---------------------------------------------------
> > root (hd0,0)
> > Filesystem type is ext2fs, partition type 0x83
> > kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M
> > dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
> > [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000]
> > module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0
> > [Multiboot-module @ 0x39b000, 0x3214d0 bytes]
> >
> >
> > ? __ __ _ _
> > ___ ___
> > \ \/ /___ _ __ | || | / _ \ / _ \ *
> > \ // _ \ '_ \ | || |_| | | | | | | *
> > / \ __/ | | | |__ _| |_| | |_| | * *
> > /_/\_\___|_| |_| |_|(_)___(_)___/ **************************************
> > (XEN) Xen version 4.0.0 (root@xxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20080704
> > (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010
> > (XEN) Latest ChangeSet: unavailable
> > (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M
> > dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax
> > noreboot
> > (XEN) Video information:
> > (XEN) VGA is text mode 80x25, font 8x16
> > (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
> > (XEN) EDID info not retrieved because no DDC retrieval method detected
> > (XEN) Disc information:
> > (XEN) Found 6 MBR signatures
> > (XEN) Found 6 EDD information structures
> > (XEN) Xen-e820 RAM map:
> > (XEN) 0000000000000000 - 000000000009a800 (usable)
> > (XEN) 000000000009a800 - 00000000000a0000 (reserved)
> > (XEN) 00000000000e4bb0 - 0000000000100000 (reserved)
> > (XEN) 0000000000100000 - 00000000bf790000 (usable)
> > (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data)
> > (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
> > (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved)
> > (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved)
> > (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
> > (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> > (XEN) 00000000fff00000 - 0000000100000000 (reserved)
> > (XEN) 0000000100000000 - 0000000640000000 (usable)
> > (XEN) --------------849
> > (XEN) --------------849
> > (XEN) --------------849
> > (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
> > (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
> > (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
> > (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
> > (XEN) ACPI: FACS BF79E000, 0040
> > (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
> > (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
> > (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
> > (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
> > (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
> > (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
> > (XEN) --------------847
> > (XEN) ---------srat enter
> > (XEN) ---------prepare enter into pfn
> > (XEN) -------in pfn
> > (XEN) -------hole shift returned
> > (XEN) --------------849
> > (XEN) System RAM: 24542MB (25131224kB)
> > (XEN) Unknown interrupt (cr2=0000000000000000)
> > (XEN) 00000000000000ab 0000000000000000 ffff82f600004020
> > 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000
> > 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008
> > 0000000000000000 00000000000001ff 00000000000001ff 0000000000000000
> > ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18
> > 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000
> > 0000000000000163 0000000900000000 00000000000000ab 0000000000000201
> > 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff
> > 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020
> > 0000000000001000 0000000000000004 0000000000000080 0000000000000001
> > ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000
> > 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc
> > 0000000000540000 00000000005fde36 0000000000540000 0000000000100000
> > 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630
> > 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0
> > 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000
> > 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000
> > 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
> > 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 00000000fffff000
> >
> >> Date: Wed, 1 Sep 2010 09:49:18 +0100
> >> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >> From: keir.fraser@xxxxxxxxxxxxx
> >> To: JBeulich@xxxxxxxxxx
> >> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>
> >> On 01/09/2010 09:02, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
> >>
> >>>> Well I agree with your logic anyway. So I don't see that this can be the
> >>>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped as
> >>>> to
> >>>> why the page arithmetic and checks in free_heap_pages are (apparently)
> >>>> resulting in a page pointer way outside the frame-table region and actually
> >>>> in the directmap region.
> >>>
> >>> There must be some unchecked use of PAGE_LIST_NULL, i.e.
> >>> running off a list end without taking notice (0xffff8315ffffffe4
> >>> exactly corresponds with that).
> >>
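The arithmetic behind that correspondence can be sketched as follows. All three constants are assumptions for illustration and are not stated in this thread: a frame table starting at 0xffff82f600000000 (consistent with the ffff82f6... addresses in the stack dump quoted earlier), a 32-byte page descriptor, and a 32-bit all-ones value as the list terminator. The byte offset 4 is likewise assumed to be the field being accessed when the fault occurred. entry_addr() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not taken from the thread. */
#define FRAMETABLE_BASE  0xffff82f600000000ULL  /* assumed frame-table start */
#define PAGE_INFO_SIZE   32ULL                  /* assumed descriptor size in bytes */
#define LIST_NULL_INDEX  0xffffffffULL          /* assumed all-ones 32-bit terminator */

/* Address of byte `offset` inside the descriptor at frame-table index `idx`. */
static uint64_t entry_addr(uint64_t idx, uint64_t offset)
{
    assert(offset < PAGE_INFO_SIZE);
    return FRAMETABLE_BASE + idx * PAGE_INFO_SIZE + offset;
}
```

Under these assumptions, entry_addr(LIST_NULL_INDEX, 4) evaluates to 0xffff8315ffffffe4: treating the terminator as a real index lands far beyond the end of the frame table.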
> >> Okay, my next guess then is that we are deleting a chunk from the wrong list
> >> head. I don't see any check that the adjacent chunks we are considering to
> >> merge are from the same node and zone. I suppose the zone logic does just
> >> work as we're dealing with 2**x aligned and sized regions. But, shouldn't
> >> the merging logic in free_heap_pages be checking that the merging candidate
> >> is from the same NUMA node? I see I have an ASSERTion later in the same
> >> function, but it's too weak and wishful I suspect.
> >>
> >> MaoXiaoyun: can you please test with the attached patch? If I'm right, you
> >> will crash on one of the BUG_ON checks that I added, rather than crashing on
> >> a pointer dereference. You may even crash during boot. Anyhow, what is
> >> interesting is whether this patch always makes you crash on BUG_ON before
> >> you would normally crash on pointer dereference. If so this is trivial to
> >> fix.
> >>
> >> Thanks,
> >> Keir
> >>
> >
>

 

