WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure

To: "Kay, Allen M" <allen.m.kay@xxxxxxxxx>
Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Fri, 11 Feb 2011 09:06:38 -0800
Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
On 02/10/2011 07:07 PM, Kay, Allen M wrote:
>> That "extra memory" stuff is reserving some physical address space for
>> ballooning.  It should be completely unused (and unbacked by any pages)
>> until the balloon driver populates it; it is reserved memory in the
>> meantime.
> On my system, the entire chunk is marked as usable memory:
>
>     0000000100000000 - 000000023a6f4000 (usable)
>
> When you said it is reserved memory, are you saying it should be marked as
> "reserved" or is there somewhere else in the code that keeps track of which
> portion of this e820 chunk is backed by real memory and which chunk is "extra
> memory"?

Yes, it is marked as usable in the E820 so that the kernel will allocate
page structures for it.  But then the extra part is reserved with
memblock_x86_reserve_range(), which should prevent the kernel from ever
trying to use that memory (i.e., it will never get added to the pools of
memory the allocator allocates from).  The balloon driver later backs
these pseudo-physical pageframes with real memory pages, and then
releases them into the pool for allocation.
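
To make the distinction between "usable in the E820" and "available to
the allocator" concrete, here is a minimal toy model in C.  It is only a
sketch under stated assumptions, not kernel code: struct balloon_model
and all of its helper functions are hypothetical names invented for
illustration, and the real kernel tracks this through memblock and the
balloon driver rather than a single structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */

/*
 * Hypothetical toy model (not kernel code): one "usable" E820 entry
 * whose tail is extra memory reserved for ballooning.
 */
struct balloon_model {
    uint64_t start;        /* start of the usable E820 entry */
    uint64_t backed_end;   /* end of memory backed by real pages at boot */
    uint64_t end;          /* end of the entry after extra memory is added */
    uint64_t released_end; /* allocator may use [start, released_end) */
};

/* Number of pages in the reserved "extra memory" tail. */
static uint64_t extra_pages(const struct balloon_model *m)
{
    assert(m->start <= m->backed_end && m->backed_end <= m->end);
    return (m->end - m->backed_end) >> PAGE_SHIFT;
}

/* The whole E820 entry gets page structures, backed or not. */
static bool has_page_struct(const struct balloon_model *m, uint64_t addr)
{
    return addr >= m->start && addr < m->end;
}

/* Only the part already released to the allocator may be handed out. */
static bool allocatable(const struct balloon_model *m, uint64_t addr)
{
    return addr >= m->start && addr < m->released_end;
}

/*
 * Model the balloon driver backing npages of the reserved tail with
 * real memory and releasing them.  Fails if the tail is too small.
 */
static bool balloon_populate(struct balloon_model *m, uint64_t npages)
{
    uint64_t bytes = npages << PAGE_SHIFT;

    if (bytes > m->end - m->released_end)
        return false;
    m->released_end += bytes;
    return true;
}
```

With the addresses from the quoted report (start 0x100000000, backed
end 0x16b45a000, extended end 0x23a6f4000), extra_pages() gives 848538
pages.  An address such as 0x200000000 has a page structure but is not
allocatable until balloon_populate() has advanced past it.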

    J

> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx] 
> Sent: Thursday, February 10, 2011 6:56 PM
> To: Kay, Allen M
> Cc: Konrad Rzeszutek Wilk; Stefano Stabellini; xen-devel; Keir Fraser
> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>
> On 02/10/2011 05:03 PM, Kay, Allen M wrote:
>> Konrad/Stefano,
>>
>> Getting back to the xen/dom0 boot failure on my Sandybridge SDP I reported a 
>> few weeks ago.
>>
>> I finally got around to narrowing down the problem to the call to
>> xen_add_extra_mem() in arch/x86/xen/setup.c/xen_memory_setup().  This call
>> increases the top of E820 memory in dom0 beyond what is actually available.
>>
>> Before xen_add_extra_mem() is called, the last entry of dom0 e820 table is:
>>
>>     0000000100000000 - 000000016b45a000 (usable)
>>
>> After xen_add_extra_mem() is called, the last entry of dom0 e820 table 
>> becomes:
>>
>>     0000000100000000 - 000000023a6f4000 (usable)
>>
>> This pushes the top of RAM beyond what was reported by Xen's e820 table, 
>> which is:
>>
>> (XEN)  0000000100000000 - 00000001de600000 (usable)
>>
>> AFAICT, the failure is caused by dom0 accessing non-existent physical 
>> memory.  The failure went away after I removed the call to 
>> xen_add_extra_mem().
> That "extra memory" stuff is reserving some physical address space for
> ballooning.  It should be completely unused (and unbacked by any pages)
> until the balloon driver populates it; it is reserved memory in the
> meantime.
>
> How is that memory getting referenced in your case?
>
>> Another potential problem I noticed with e820 processing is that there is a
>> discrepancy between how Xen processes e820 and how dom0 does it.  In Xen
>> (arch/x86/setup.c/start_xen()), e820 entries are aligned on an
>> L2_PAGETABLE_SHIFT boundary, while the dom0 e820 code does not align them.
>> As a result, one of my e820 entries that is 1 page in size got dropped by
>> Xen but got picked up in dom0.  This does not cause a problem in my case,
>> but the inconsistency in how memory is used by Xen and dom0 can potentially
>> be a problem.
> I don't think that matters.  Xen can choose not to use non-2M aligned
> pieces of memory if it wants, but that doesn't really affect the dom0
> kernel's use of the host E820, because dom0 is only looking for possible
> device memory, rather than RAM.
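
The 2M alignment discussed above can be illustrated with a short C
sketch.  This is an assumption-laden illustration, not the code from
start_xen(): struct range and align_to_l2() are hypothetical names, and
L2_PAGETABLE_SHIFT is assumed to be 21 (2 MiB) as on x86-64.  It shows
why a one-page entry disappears once a range is shrunk inward to 2 MiB
boundaries.

```c
#include <assert.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 21  /* assumed: 2 MiB superpages on x86-64 */

/* Half-open physical address range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Shrink a range inward to 2 MiB boundaries: round the start up and
 * the end down.  Returns an empty range (start == end) when no
 * 2 MiB-aligned piece survives.  Assumes start + 2 MiB does not
 * overflow 64 bits.
 */
static struct range align_to_l2(struct range r)
{
    const uint64_t mask = ((uint64_t)1 << L2_PAGETABLE_SHIFT) - 1;
    uint64_t s, e;

    assert(r.start <= r.end);
    s = (r.start + mask) & ~mask;  /* round start up */
    e = r.end & ~mask;             /* round end down */
    if (s >= e)
        s = e = 0;                 /* nothing left after alignment */
    return (struct range){ s, e };
}
```

For example, a single 4 KiB page at a made-up address such as
[0x9e000, 0x9f000) aligns to an empty range, while
[0x100000000, 0x16b45a000) keeps its start and has its end rounded down
to 0x16b400000.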
>
>     J
>> Allen
>>
>> -----Original Message-----
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx] 
>> Sent: Friday, January 28, 2011 7:48 AM
>> To: Kay, Allen M
>> Cc: xen-devel; Stefano Stabellini
>> Subject: Re: [Xen-devel] 2.6.32.27 dom0 + latest xen staging boot failure
>>
>> On Fri, Jan 28, 2011 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jan 27, 2011 at 10:51:42AM -0800, Kay, Allen M wrote:
>>>> Following are the brief error messages from the serial console log.  I 
>>>> have also attached the full serial console log and dom0 system map.
>>>>
>>>> (XEN) mm.c:802:d0 Bad L1 flags 400000
>>> On a second look, this is a different issue than I had encountered.
>>>
>>> The 400000 translates to Xen thinking you had PAGE_GNTTAB set, but that
>>> is not right. Googling for this shows that I had fixed this with a
>>> Xorg server at some point, but I can't remember the details so that is not
>>> that useful :-(
>>>
>>> You said it works if you give the domain 1024MB, but I wonder if
>>> it also works if you disable the IOMMU? What happens then?
>> Can you also patch your Xen hypervisor with this patch? It will print out the
>> other 89 entries so we can see what type of values they have. You might need to
>> move it a bit as this is for xen-unstable.
>>
>> diff -r 003acf02d416 xen/arch/x86/mm.c
>> --- a/xen/arch/x86/mm.c      Thu Jan 20 17:04:06 2011 +0000
>> +++ b/xen/arch/x86/mm.c      Fri Jan 28 10:46:13 2011 -0500
>> @@ -1201,11 +1201,12 @@
>>      return 0;
>>  
>>   fail:
>> -    MEM_LOG("Failure in alloc_l1_table: entry %d", i);
>> +    MEM_LOG("Failure in alloc_l1_table: entry %d of L1 (mfn: %lx). Other L1 values:", i, pfn);
>>      while ( i-- > 0 )
>> -        if ( is_guest_l1_slot(i) )
>> +        if ( is_guest_l1_slot(i) ) {
>> +            MEM_LOG("L1[%d] = %lx", i, (unsigned long)l1e_get_intpte(pl1e[i]));
>>              put_page_from_l1e(pl1e[i], d);
>> -
>> +    }
>>      unmap_domain_page(pl1e);
>>      return -EINVAL;
>>  }
>>
>>>> (XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 90
>>>> (XEN) mm.c:2142:d0 Error while validating mfn 1d7e97 (pfn 3d69) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
>>>> (XEN) mm.c:2965:d0 Error while pinning mfn 1d7e97
>>>> (XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>>>> (XEN) domain_crash_sync called from entry.S
>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel

