WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To: <keir.fraser@xxxxxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Wed, 1 Sep 2010 11:17:43 +0800
Cc:
Delivery-date: Tue, 31 Aug 2010 20:18:46 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
Importance: Normal
In-reply-to: <C8A2D509.218B6%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <BAY121-W50277AB5A3C8DF48290B8DDA8A0@xxxxxxx>, <C8A2D509.218B6%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thank you for the details.
 
There is no "PFN compression on bits" line in the Xen boot output. I added some extra
logging and found that pfn_pdx_hole_setup() returns early at line 183 of
xen/arch/x86/x86_64/mm.c. Please refer to the boot log below.
 
I may be able to add some assertions on the page addresses after chunk merging.
Thank you for the mails you forwarded. I will go through all of them later.
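A minimal sketch of what such an assertion could check, based on the invariant Keir states in the quoted reply (page_to_mfn(mfn_to_page(x)+y) == x+y). This is not Xen code: struct page_info here is a toy that stores its own MFN, and chunk_is_contiguous(), buddy_of() and buddy_mfn_matches() are hypothetical helpers. It also assumes, as buddy allocators commonly do, that the buddy of a 2^order chunk is reached by stepping 2^order entries back or forward depending on bit 'order' of the chunk's MFN.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page_info (NOT Xen's layout): each entry simply
   records the MFN it describes, so the invariant can be checked directly. */
struct page_info {
    unsigned long mfn;
};

static unsigned long toy_page_to_mfn(const struct page_info *pg)
{
    return pg->mfn;
}

/* True if the 2^order entries starting at pg describe 2^order consecutive
   MFNs, i.e. toy_page_to_mfn(pg + i) == toy_page_to_mfn(pg) + i for all i. */
static bool chunk_is_contiguous(const struct page_info *pg, unsigned int order)
{
    unsigned long base = toy_page_to_mfn(pg);
    for (unsigned long i = 1; i < (1UL << order); i++)
        if (toy_page_to_mfn(pg + i) != base + i)
            return false;
    return true;
}

/* Buddy found by pointer arithmetic: step back 2^order entries if bit
   'order' of the MFN is set, otherwise step forward. The caller must
   make sure the result stays inside the frame table. */
static const struct page_info *buddy_of(const struct page_info *pg,
                                        unsigned int order)
{
    unsigned long mask = 1UL << order;
    return (toy_page_to_mfn(pg) & mask) ? pg - mask : pg + mask;
}

/* The check worth asserting before a merge: the entry reached by pointer
   arithmetic must describe the buddy MFN, i.e. the chunk's MFN with bit
   'order' flipped. */
static bool buddy_mfn_matches(const struct page_info *pg, unsigned int order)
{
    return toy_page_to_mfn(buddy_of(pg, order)) ==
           (toy_page_to_mfn(pg) ^ (1UL << order));
}
```

For example, with a toy frame table whose entries jump from MFN 3 to MFN 8, the order-1 buddy lookup from entry 0 still lands on MFN 2, but the order-2 lookup lands on MFN 8 instead of MFN 4, which is exactly the kind of mismatch such an assertion would catch.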
 
--------------------------pfn_pdx_hole_setup-----------------
164 void __init pfn_pdx_hole_setup(unsigned long mask)
 165 {
 166     unsigned int i, j, bottom_shift, hole_shift;
 167     printk("-------in pfn\n");
 168
 169     for ( hole_shift = bottom_shift = j = 0; ; )
 170     {
 171         i = find_next_zero_bit(&mask, BITS_PER_LONG, j);
 172         j = find_next_bit(&mask, BITS_PER_LONG, i);
 173         if ( j >= BITS_PER_LONG )
 174             break;
 175         if ( j - i > hole_shift )
 176         {
 177             hole_shift = j - i;
 178             bottom_shift = i;
 179         }
 180     }
 181     if ( !hole_shift ){
 182         printk("-------hole shift returned\n");
 183         return;
 184     }
 185     printk("-------in pfn middle \n");
 186
 187     printk(KERN_INFO "PFN compression on bits %u...%u\n",
 188            bottom_shift, bottom_shift + hole_shift - 1);
 189     printk("----PFN compression on bits %u...%u\n",
 190            bottom_shift, bottom_shift + hole_shift - 1);
 191
 192     pfn_pdx_hole_shift  = hole_shift;
 193     pfn_pdx_bottom_mask = (1UL << bottom_shift) - 1;
 194     ma_va_bottom_mask   = (PAGE_SIZE << bottom_shift) - 1;
 195     pfn_hole_mask       = ((1UL << hole_shift) - 1) << bottom_shift;
 196     pfn_top_mask        = ~(pfn_pdx_bottom_mask | pfn_hole_mask);
 197     ma_top_mask         = pfn_top_mask << PAGE_SHIFT;
 198 }
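To illustrate what the early return at line 183 means, here is a standalone sketch of the same loop and mask set-up. Assumptions: next_bit() is a simplified single-word stand-in for find_next_zero_bit()/find_next_bit(); struct pdx_masks and to_pdx() are hypothetical names; and the PFN-to-PDX formula (keep the bits below the hole, shift the bits above it down by hole_shift) is an assumed reading of how the masks are used. When the mask has no run of zero bits with a set bit above it, hole_shift stays 0 and, in this sketch, the default masks make every PDX equal to its PFN.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Single-word stand-in for the bitmap helpers: index of the first bit at or
   after 'start' whose value is 'want_set', or BITS_PER_LONG if none. */
static unsigned int next_bit(unsigned long word, unsigned int start, int want_set)
{
    for (unsigned int b = start; b < BITS_PER_LONG; b++)
        if ((int)((word >> b) & 1UL) == want_set)
            return b;
    return BITS_PER_LONG;
}

struct pdx_masks {
    unsigned int hole_shift;    /* width of the squeezed-out bit range */
    unsigned long bottom_mask;  /* PFN bits kept as-is */
    unsigned long hole_mask;    /* PFN bits dropped */
    unsigned long top_mask;     /* PFN bits shifted down by hole_shift */
};

/* Same search as pfn_pdx_hole_setup(): the longest run of zero bits that has
   a set bit somewhere above it. */
static struct pdx_masks hole_setup(unsigned long mask)
{
    struct pdx_masks m = { 0, ~0UL, 0, 0 };  /* defaults: no compression */
    unsigned int i, j, bottom_shift = 0, hole_shift = 0;

    for (j = 0; ; ) {
        i = next_bit(mask, j, 0);
        j = next_bit(mask, i, 1);
        if (j >= BITS_PER_LONG)
            break;
        if (j - i > hole_shift) {
            hole_shift = j - i;
            bottom_shift = i;
        }
    }
    if (!hole_shift)
        return m;  /* corresponds to the early return at line 183 */

    m.hole_shift = hole_shift;
    m.bottom_mask = (1UL << bottom_shift) - 1;
    m.hole_mask = ((1UL << hole_shift) - 1) << bottom_shift;
    m.top_mask = ~(m.bottom_mask | m.hole_mask);
    return m;
}

static unsigned long to_pdx(const struct pdx_masks *m, unsigned long pfn)
{
    return (pfn & m->bottom_mask) | ((pfn & m->top_mask) >> m->hole_shift);
}
```

For example, a mask of 0x10000F has a 16-bit hole at bits 4..19, so PFN 0x100003 compresses to PDX 0x13, while a contiguous mask such as 0xFFFFF finds no hole and leaves PFNs unchanged.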
 
------------------------------------------xen boot log---------------------
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e4bb0 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000bf790000 (usable)
(XEN)  00000000bf790000 - 00000000bf79e000 (ACPI data)
(XEN)  00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
(XEN)  00000000bf7d0000 - 00000000bf7e0000 (reserved)
(XEN)  00000000bf7ec000 - 00000000c0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000640000000 (usable)
(XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT       97)
(XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT       97)
(XEN) ACPI: DSDT BF7904B0, 4D6A (r2  CTSAV CTSAV122      122 INTL 20051117)
(XEN) ACPI: FACS BF79E000, 0040
(XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT       97)
(XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG  20091123 MSFT       97)
(XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT       97)
(XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT         1 INTL        1)
(XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET  20091123 MSFT       97)
(XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm    CpuPm       12 INTL 20051117)
(XEN) --------------844
(XEN) ---------srat enter
(XEN) ---------prepare enter into pfn
(XEN) -------in pfn
(XEN) -------hole shift returned
(XEN) --------------849
(XEN) System RAM: 24542MB (25131224kB)
(XEN) Domain heap initialised DMA width 31 bits
 
 
> Date: Tue, 31 Aug 2010 15:49:29 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> CC: JBeulich@xxxxxxxxxx
>
> Do you have a line in Xen boot output that starts "PFN compression on bits"?
> If so what does it say?
>
> My suspicion is that Jan Beulich's patches implementing a consolidated page
> array for sparse memory maps have broken the assumption in some Xen code
> that:
> page_to_mfn(mfn_to_page(x)+y) == x+y, for all valid mfns x, and all y up to
> some pretty big limit.
>
> Looking in free_heap_pages(), I see we do a whole bunch of chunk merging in
> our buddy allocator, doing arithmetic on variable 'pg' to find neighbouring
> chunks. It's a bit dodgy, I suspect.
>
> I'm cc'ing Jan to see what we can get away with in doing arithmetic on
> page_info pointers. What is the guaranteed smallest aligned contiguous range
> of MFNs in the frame_table now, Jan? (i.e., ranges in which adjacent
> page_info structs relate to adjacent MFNs)
>
> If this is the problem I'm pretty sure we can come up with a patch quite
> easily, but depending on the answer to my above question to Jan, we may need
> to do some code auditing.
>
> -- Keir
>
> On 31/08/2010 14:49, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Hi Keir:
> >
> > Thank you for correcting my mistakes.
> > Here is the latest panic and its objdump.
> > I am not familiar with assembly language and register usage, so
> > I will try to spend some more time getting a better understanding.
> > What's your opinion?
> > BTW, memtest is still running; so far so good, thanks.
> >
> > ------------------objdump----------------------------------------------------------------
> > ffff82c480115396:  48 c1 e1 04             shl    $0x4,%rcx
> > ffff82c48011539a:  4a 03 0c f8             add    (%rax,%r15,8),%rcx
> > }
> > static inline void
> > page_list_del(struct page_info *page, struct page_list_head *head)
> > {
> > struct page_info *next = pdx_to_page(page->list.next);
> > ffff82c48011539e:  8b 03                   mov    (%rbx),%eax
> > ffff82c4801153a0:  48 c1 e0 05             shl    $0x5,%rax
> > ffff82c4801153a4:  48 29 e8                sub    %rbp,%rax
> > ffff82c4801153a7:  48 3b 19                cmp    (%rcx),%rbx
> > ffff82c4801153aa:  0f 84 95 01 00 00       je     ffff82c480115545 <free_heap_pages+0x405>
> > struct page_info *prev = pdx_to_page(page->list.prev);
> > ffff82c4801153b0:  89 f2                   mov    %esi,%edx
> > ffff82c4801153b2:  48 c1 e2 05             shl    $0x5,%rdx
> > ffff82c4801153b6:  48 29 ea                sub    %rbp,%rdx
> > ffff82c4801153b9:  48 3b 59 08             cmp    0x8(%rcx),%rbx
> > ffff82c4801153bd:  0f 84 bd 01 00 00       je     ffff82c480115580 <free_heap_pages+0x440>
> >
> > if ( !__page_list_del_head(page, head, next, prev) )
> > {
> > next->list.prev = page->list.prev;
> > ffff82c4801153c3:  89 70 04                mov    %esi,0x4(%rax)
> > prev->list.next = page->list.next;
> > ffff82c4801153c6:  8b 03                   mov    (%rbx),%eax
> > ffff82c4801153c8:  89 02                   mov    %eax,(%rdx)
> > ffff82c4801153ca:  49 89 dd                mov    %rbx,%r13
> > ffff82c4801153cd:  41 83 c4 01             add    $0x1,%r12d
> > ffff82c4801153d1:  41 83 fc 12             cmp    $0x12,%r12d
> > ffff82c4801153d5:  0f 84 e3 00 00 00       je     ffff82c4801154be <free_heap_pages+0x37e>
> > ffff82c4801153db:  48 bd 00 00 00 00 0a    mov    $0x7d0a00000000,%rbp
> > ffff82c4801153e2:  7d 00 00
> > ffff82c4801153e5:  44 89 e1                mov    %r12d,%ecx
> > ffff82c4801153e8:  be 01 00 00 00          mov    $0x1,%esi
> >
> >
> > -----------------------------------------------------------------------------------------
> > blktap_sysfs_create: adding attributes for dev ffff880239496c00
> > (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> > (XEN) CPU: 2
> > (XEN) RIP: e008:[<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> > (XEN) rax: ffff8315ffffffe0 rbx: ffff82f6093b0040 rcx: ffff83063fc01a20
> > (XEN) rdx: ffff8315ffffffe0 rsi: 00000000ffffffff rdi: 000000000049d802
> > (XEN) rbp: 00007d0a00000000 rsp: ffff83023ff37cb8 r8: 0000000000000000
> > (XEN) r9: ffffffffffffffff r10: ffff83060a3c0018 r11: 0000000000000282
> > (XEN) r12: 0000000000000000 r13: ffff82f6093b0060 r14: 00000000000001a2
> > (XEN) r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000008da54000 cr2: ffff8315ffffffe4
> > (XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff83023ff37cb8:
> > (XEN) ffff82f6093b7f80 00000000ffffffe0 00000000000001a2 ffff83060a3c0000
> > (XEN) 0000000000000000 0000000000000001 ffff82f6093b0060 0000000000000000
> > (XEN) ffff82f6093b0080 ffff82c480115732 00000001093b7cc0 ffff82f6093b0060
> > (XEN) ffff83060a3c0018 0000000000000000 ffff83060a3c0000 ffff83060a3c0fa8
> > (XEN) 0000000000000000 ffff82c48014aaa6 ffff83060a3c0fa8 ffff83060a3c0fa8
> > (XEN) ffff83060a3c0014 4000000000000000 ffff83023ff37f28 ffff83060a3c0018
> > (XEN) 0000000000000000 ffff83060a3c0000 0000000000305000 0000000000000009
> > (XEN) 0000000000000009 ffff82c48014b2fd 00ffffffffffffff ffff83060a3c0000
> > (XEN) 0000000000000000 ffff83023ff37e28 0000000000305000 ffff82c480105fe0
> > (XEN) ffff82c480255240 fffffffffffffff3 0000000002599000 ffff82c4801043ce
> > (XEN) ffff82c4801447da 0000000000000080 ffff83023ff37f28 0000000000000096
> > (XEN) ffff83023ff37f28 00000000000000fc 0000000600000002 00000000023c0031
> > (XEN) 0000000000000001 00000039890a8e2a 0000003000000018 000000004523af30
> > (XEN) 000000004523ae70 0000000000000000 00007fc608ea8a70 000000398903c8a4
> > (XEN) 000000004523af44 0000000000000000 000000004523b158 0000000000000000
> > (XEN) 0000007f024f6d20 00007fc60a094750 000000000255ff40 00007fc607be5ea8
> > (XEN) fffffffffffffff5 0000000000000246 00000039880cc557 0000000000000100
> > (XEN) 00000039880cc557 0000000000000033 0000000000000246 ffff8300bf562000
> > (XEN) ffff8801db8d3e78 000000004523aec0 0000000000305000 0000000000000009
> > (XEN) 0000000000000009 ffff82c4801e3169 0000000000000009 0000000000000009
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> > (XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
> > (XEN) [<ffff82c48014aaa6>] relinquish_memory+0x186/0x530
> > (XEN) [<ffff82c48014b2fd>] domain_relinquish_resources+0x1ad/0x280
> > (XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
> > (XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN) Pagetable walk from ffff8315ffffffe4:
> > (XEN) L4[0x106] = 00000000bf569027 5555555555555555
> > (XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 2:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0002]
> > (XEN) Faulting linear address: ffff8315ffffffe4
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > -----------------------------------------------------------------------------------------
> >> Date: Mon, 30 Aug 2010 14:16:09 +0100
> >> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >> From: keir.fraser@xxxxxxxxxxxxx
> >> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>
> >> On 30/08/2010 14:03, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>
> >>> I appreciate the quick response.
> >>>
> >>> Actually, I did some decoding of the backtrace last Friday.
> >>> Based on the RIP ffff82c4801153c3, I cut out part of the "objdump -dS xen-syms"
> >>> output (please see below). It looks like the bug happened on the domain page list
> >>
> >> ffff82c4801153c3 isn't the start of an instruction in your below
> >> disassembly. Hence you didn't disassemble exactly the build of Xen which
> >> crashed. It needs to be exactly the same image.
> >>
> >> -- keir
> >>
> >>> traversal, which is beyond my understanding. As I understand it,
> >>> those domain pages come from the kernel memory zone, so they always
> >>> reside in physical memory and their addresses shouldn't change,
> >>> right?
> >>> If so, what is the relationship between all these panics and free_heap_pages()?
> >>> Several servers (at least 3) hit the same panic in the same test.
> >>> These servers have identical hardware, kernel and Xen configurations.
> >>> Memtest is currently running on one server (24G memory) and should
> >>> finish in a few hours.
> >>>
> >>> -------------------------------------------------------------------------------------
> >>> static inline void
> >>> page_list_del(struct page_info *page, struct page_list_head *head)
> >>> {
> >>> struct page_info *next = pdx_to_page(page->list.next);
> >>> struct page_info *prev = pdx_to_page(page->list.prev);
> >>> ffff82c4801153b8:  8b 73 04                mov    0x4(%rbx),%esi
> >>> ffff82c4801153bb:  49 8d 0c 06             lea    (%r14,%rax,1),%rcx
> >>> ffff82c4801153bf:  48 8d 05 fa 10 26 00    lea    2494714(%rip),%rax    # ffff82c4803764c0 <_heap>
> >>> ffff82c4801153c6:  48 c1 e1 04             shl    $0x4,%rcx
> >>> ffff82c4801153ca:  4a 03 0c f8             add    (%rax,%r15,8),%rcx
> >>> }
> >>> static inline void
> >>> page_list_del(struct page_info *page, struct page_list_head *head)
> >>> {
> >>> struct page_info *next = pdx_to_page(page->list.next);
> >>> ffff82c4801153ce:  8b 03                   mov    (%rbx),%eax
> >>> ffff82c4801153d0:  48 c1 e0 05             shl    $0x5,%rax
> >>> ffff82c4801153d4:  48 29 e8                sub    %rbp,%rax
> >>> ffff82c4801153d7:  48 3b 19                cmp    (%rcx),%rbx
> >>> ffff82c4801153da:  0f 84 95 01 00 00       je     ffff82c480115575 <free_heap_pages+0x405>
> >>> struct page_info *prev = pdx_to_page(page->list.prev);
> >>> ffff82c4801153e0:  89 f2                   mov    %esi,%edx
> >>> ffff82c4801153e2:  48 c1 e2 05             shl    $0x5,%rdx
> >>> ffff82c4801153e6:  48 29 ea                sub    %rbp,%rdx
> >>> ffff82c4801153e9:  48 3b 59 08             cmp    0x8(%rcx),%rbx
> >>> ffff82c4801153ed:  0f 84 bd 01 00 00       je     ffff82c4801155b0 <free_heap_pages+0x440>
> >>>
> >>> if ( !__page_list_del_head(page, head, next, prev) )
> >>> {
> >>>
> >>> -------------------------------------------------------------------------------------
> >>>
> >>>> Date: Mon, 30 Aug 2010 10:02:05 +0100
> >>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >>>> From: keir.fraser@xxxxxxxxxxxxx
> >>>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>
> >>>> On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>>>
> >>>>> 3) Every panic points to the same address, ffff8315ffffffe4, which is
> >>>>> not a valid page address.
> >>>>> I printed the domain's pages in assign_pages(); they all look like
> >>>>> ffff82f60bd64000, or at least share the ffff82f60 prefix.
> >>>>
> >>>> Yes, well you may not be crashing on a supposed page address. Certainly the
> >>>> page pointer that relinquish_memory() is working on, and passed to
> >>>> put_page->free_domheap_pages is valid enough to not cause any of those
> >>>> functions to crash when dereferencing it. At the moment you really have no
> >>>> idea what is causing free_heap_pages() to crash.
> >>>>
> >>>>> I'm a bit lost as to which direction to take from here. Thanks.
> >>>>
> >>>> You need to find out which line of code in free_heap_pages() is crashing,
> >>>> and what variable it is trying to dereference when it crashes. You have a
> >>>> nice backtrace with an EIP value, so you can 'objdump -d xen-syms' and
> >>>> search for the EIP in the disassembly. If you have a debug build of Xen you
> >>>> can even do 'objdump -S xen-syms' and have the disassembly annotated with
> >>>> corresponding source lines.
> >>>>
> >>>> Have you seen this on more than one physical machine? If not, have you run
> >>>> memtest on the offending machine?
> >>>>
> >>>> -- Keir
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
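The faulting address in the quoted crash dump can be reproduced from the register values. This sketch assumes, from the quoted disassembly, that pdx_to_page() compiles to (index << 5) - %rbp, i.e. frame_table + index * 32 with a 32-byte struct page_info, and that %rbp holds 0x7d0a00000000 as the mov instruction shows. frame_table_base(), pdx_to_page_addr() and the two macros are hypothetical names used only for this check.

```c
#include <assert.h>
#include <stdint.h>

/* %rbp as loaded by "mov $0x7d0a00000000,%rbp" in the quoted disassembly. */
#define RBP_FROM_DUMP UINT64_C(0x00007d0a00000000)

/* Value seen in %rsi (loaded from list.prev); %rax implies the same value
   was loaded from list.next. */
#define ALL_ONES_INDEX UINT64_C(0x00000000ffffffff)

/* Frame-table base implied by the disassembly: -rbp modulo 2^64. */
static uint64_t frame_table_base(uint64_t rbp)
{
    return (uint64_t)0 - rbp;
}

/* Mirrors "shl $0x5,%reg; sub %rbp,%reg": address of the 32-byte entry for
   'idx'. Unsigned arithmetic wraps modulo 2^64, as it does on the CPU. */
static uint64_t pdx_to_page_addr(uint64_t idx, uint64_t rbp)
{
    return (idx << 5) - rbp;
}
```

Under those assumptions the frame table starts at 0xffff82f600000000, the same formula maps 0x49d802 (the value in rdi) to ffff82f6093b0040 (rbx, the page being unlinked), and an index of 0xffffffff gives ffff8315ffffffe0, the value in both rax and rdx. The faulting instruction is mov %esi,0x4(%rax), which writes to ffff8315ffffffe4, matching cr2. This suggests that both list.next and list.prev of the page being unlinked hold 0xffffffff, an all-ones "no entry" style index rather than a real neighbour.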
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel