
Re: [Xen-devel] Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0



On 27.02.2015 13:30, Juergen Gross wrote:
> On 02/27/2015 12:29 PM, Stefan Bader wrote:
>> On 05.02.2015 15:33, Stefan Bader wrote:
>>> While experimenting with and testing various kernel versions I discovered
>>> that a Haswell-based host always crashes when booting as Xen dom0
>>> (Xen-4.4.1). The crash has been present since v3.19-rc1 and still happens
>>> with v3.19-rc7. A bare-metal boot has no issues, and an Opteron-based host
>>> has no issues either (dom0 and bare metal).
>>> Could it be a table that the other host does not have? And since it only
>>> happens in dom0, maybe some CPU capability that needs to be passed on?
>>
>> I think I may have some more data here. I tried some patches which Juergen
>> sent me, but those did not change much. I found that the problem on that
>> host is related to the use of dom0_mem= and may show up as a crash like
>> below, a hang, or a "weird state" in general.
>> When not using dom0_mem, I can boot with a 3.19 kernel; otherwise (trying
>> 512M and 1G) there is trouble. What is special about this host is that it has
>> more "holes" than the other machine I usually use.
>>
>> (XEN) Xen-e820 RAM map:
>> (XEN)  0000000000000000 - 000000000009a400 (usable)
>> (XEN)  000000000009a400 - 00000000000a0000 (reserved)
>> (XEN)  00000000000e0000 - 0000000000100000 (reserved)
>>         The first hole is common
>> (XEN)  0000000000100000 - 0000000030a48000 (usable)
>> (XEN)  0000000030a48000 - 0000000030a49000 (reserved)
>> (XEN)  0000000030a49000 - 00000000a27f4000 (usable)
>>         But then normally there is only one usable area up to
>>         around ACPI_NVS
>> (XEN)  00000000a27f4000 - 00000000a2ab4000 (reserved)
>> (XEN)  00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS)
>> (XEN)  00000000a2fb4000 - 00000000a2feb000 (ACPI data)
>> (XEN)  00000000a2feb000 - 00000000a3000000 (usable)
>> (XEN)  00000000a3000000 - 00000000afa00000 (reserved)
>> (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
>> (XEN)  00000000fec00000 - 00000000fec01000 (reserved)
>> (XEN)  00000000fed00000 - 00000000fed04000 (reserved)
>> (XEN)  00000000fed10000 - 00000000fed1a000 (reserved)
>> (XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
>> (XEN)  00000000fed84000 - 00000000fed85000 (reserved)
>> (XEN)  00000000fee00000 - 00000000fee01000 (reserved)
>> (XEN)  00000000ffc00000 - 0000000100000000 (reserved)
>> (XEN)  0000000100000000 - 000000024e600000 (usable)
>>         Also, after ACPI data there is another usable area and then
>>         another hole (area), which is unusual.
>>
>> So I added a few more debug printks. Here is a boot with
>> dom0_mem=512M:max=512M:
>>
>> [    0.000000] SMB: remap 154(0x9A)-256(0x100) -> 131072(0x20000)
>>                 ==> 0x09A000-0x100000 -> 0x20000000 (@512M+)
>>                 ==> 0x09A000-0x09A3FF was usable but partial
>>
>> The first hole is supposed to be remapped, as it is below the 512M which are
>> in the initial MFN list. I suppose this works, but Juergen, I really would
>> love to understand how, and I am not sure I grasp things. To me it looks like
>> the remap info is stored in the memory area to be mapped... which is
>> reserved(?!)
> 
> :-)
> 
> We can remap only memory which is currently not in use, otherwise
> the information in that memory area couldn't be found again. So we
> are free to store the remap info in this memory, which spares us
> the pain of finding some other memory to store it in before enough
> of the memory management has been set up.

Argh, no, I just realized the fatal mistake in my whole imaginary model. For
some stupid reason I pictured the initial MFN table as a 1:1 mapping of the
real memory, which is complete nonsense and does not help in understanding what
is really going on. Of course the whole purpose is to convert this into
something that *does* look like the E820 setup provided. Bah, too many trees
here...

> 
>> I think the problem comes from these other holes (which are beyond 512M).
>> When not using dom0_mem those are remapped (like the first one), while with
>> the clamp they supposedly should be identity mapped...
> 
> Indeed.
> 
>>
>> [    0.000000] SMB: prange id 199240(0x30A48) - 199241(0x30A49)
>>                 ==> 0x30A48000(~778M)
>> [    0.000000] SMB: prange id 665588(0xA27F4) - 667627(0xA2FEB)
>>                 ==> 0xA27F4000(~2599M)
>> [    0.000000] SMB: prange id 667648(0xA3000) - 1048576(0x100000)
>>                 ==> 0xA3000000(~2608M)-0x100000000(=4G) id mapped
>> [    0.000000] Released 0 page(s)
>> [    0.000000] Remapped 102 page(s)
>>
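For anyone following along: the "==>" annotations under the debug lines are plain PFN arithmetic with 4 KiB pages. Below is a minimal user-space sketch of that conversion. The helper names (pfn_to_phys, pfn_to_mib, print_range) are made up for illustration and are not kernel functions; PAGE_SHIFT of 12 is an assumption that holds for x86.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/* Physical address of the first byte of a page frame. */
static inline uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* The same address in whole MiB, rounded down. */
static inline uint64_t pfn_to_mib(uint64_t pfn)
{
    return pfn_to_phys(pfn) >> 20;
}

/* Print a PFN range [start_pfn, end_pfn) as addresses plus a rough MiB offset. */
static inline void print_range(uint64_t start_pfn, uint64_t end_pfn)
{
    printf("pfn 0x%llx-0x%llx ==> 0x%llx-0x%llx (~%lluM)\n",
           (unsigned long long)start_pfn,
           (unsigned long long)end_pfn,
           (unsigned long long)pfn_to_phys(start_pfn),
           (unsigned long long)pfn_to_phys(end_pfn),
           (unsigned long long)pfn_to_mib(start_pfn));
}
```

With these helpers, PFN 131072 (0x20000) comes out as address 0x20000000, which is exactly the 512M boundary of the initial allocation, and PFN 199240 (0x30A48) lands at roughly 778M, so the second hole already sits beyond what dom0_mem=512M covers.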
>> So here is xen_set_identity_and_remap_chunk():
>>
>> ...
>> while (i < n) {
>>    ...
>>    /* Do not remap pages beyond the current allocation */
>>    if (cur_pfn >= nr_pages) {
>>      /* Identity map remaining pages */
>>      set_phys_range_identity(cur_pfn, cur_pfn + size);
>>      break;
>>    }
>>    ...
>>
>> Now, I think the call to set_phys_range_identity() is really doing nothing
>> because nr_pages is the same as xen_p2m_size (or nearly so, apart from an
>> alignment to 512), so it just returns 0.
> 
> Sure, the p2m map is too small at this moment. We have no place to
> store the information.
> 
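To make the "returns 0" case concrete, here is a toy user-space model of a range-limited identity setter. It only illustrates the behaviour discussed above (a start PFN at or beyond the table size stores nothing) and is not the kernel's implementation; struct toy_p2m, toy_set_identity and TOY_IDENTITY_BIT are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical marker bit for "this entry is an identity frame". */
#define TOY_IDENTITY_BIT (1UL << (sizeof(unsigned long) * 8 - 2))

struct toy_p2m {
    unsigned long *entries; /* one entry per PFN */
    unsigned long size;     /* number of entries the table can hold */
};

/*
 * Mark [pfn_s, pfn_e) as identity mapped, clamped to the table size.
 * Returns the number of entries actually written.
 */
static inline unsigned long toy_set_identity(struct toy_p2m *p2m,
                                             unsigned long pfn_s,
                                             unsigned long pfn_e)
{
    unsigned long pfn;

    if (pfn_s >= p2m->size) /* range starts past the table: nowhere to store */
        return 0;
    if (pfn_s > pfn_e)      /* empty or inverted range */
        return 0;
    if (pfn_e > p2m->size)  /* clamp the end to what the table covers */
        pfn_e = p2m->size;

    for (pfn = pfn_s; pfn < pfn_e; pfn++)
        p2m->entries[pfn] = pfn | TOY_IDENTITY_BIT;

    return pfn - pfn_s;
}
```

In this model, with a table sized for 512M (131072 entries), a request for the hole at PFN 199240 hits the first check and writes nothing, which matches the observation that the call is a no-op for the holes beyond the initial allocation.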
>>    ...
>>    /*
>>     * If the PFNs are currently mapped, the VA mapping also needs
>>     * to be updated to be 1:1.
>>     */
>>    for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
>>            (void)HYPERVISOR_update_va_mapping(
>>                    (unsigned long)__va(pfn << PAGE_SHIFT),
>>                    mfn_pte(pfn, PAGE_KERNEL_IO), 0);
>>
>> I cannot get my head around this one. Before this all changed, there was
>> code that resembled this loop but rather cleared the mapping (except for a
>> range below 1M). OK, that was done in a different order then, which set the
>> identity mapping afterwards...
>>
>> My feeling is that the problem comes from assuming identity mapping for holes
>> after the initial mapping. I might be missing something, but I cannot really
>> see where this could be recovered.
> 
> Your hints really helped. I think I've found an error.
> 
> What you've been missing is the fact that the new p2m list is
> initialized with identity frames after the area which was covered by
> the hypervisor supplied one.

Ah ok. Right, I missed that.
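As a way to picture the point I missed, here is a toy user-space sketch of a new p2m list whose entries beyond the hypervisor-supplied list start out as identity frames. This is only a model of the idea as described above, not the kernel code; toy_build_p2m, toy_is_identity and TOY_IDENTITY_BIT are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical marker bit for "this entry is an identity frame". */
#define TOY_IDENTITY_BIT (1UL << (sizeof(unsigned long) * 8 - 2))

/* Does this p2m entry carry the identity marker? */
static inline int toy_is_identity(unsigned long entry)
{
    return (entry & TOY_IDENTITY_BIT) != 0;
}

/*
 * Build a new list of new_size entries: the first old_size entries are
 * copied from the supplied list, everything after that is initialised
 * as an identity frame. The caller frees the result.
 */
static inline unsigned long *toy_build_p2m(const unsigned long *old_list,
                                           unsigned long old_size,
                                           unsigned long new_size)
{
    unsigned long *p2m;
    unsigned long pfn;

    if (new_size == 0 || new_size < old_size)
        return NULL;

    p2m = malloc(new_size * sizeof(*p2m));
    if (!p2m)
        return NULL;

    for (pfn = 0; pfn < old_size; pfn++) /* area covered by the old list */
        p2m[pfn] = old_list[pfn];
    for (; pfn < new_size; pfn++)        /* everything beyond: identity */
        p2m[pfn] = pfn | TOY_IDENTITY_BIT;

    return p2m;
}
```

In this picture the holes above the initial allocation do not need an explicit identity call at all, because their entries are identity from the moment the larger list exists.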

> 
> Could you please test the attached patch?

\o/ Yeah, that does seem to do the trick. The machine came up and did not lose
its mind or lapic base address!

-Stefan
> 
> 
> Juergen

