
Re: [Xen-devel] Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0



On 02/09/2015 02:33 PM, Stefan Bader wrote:
On 09.02.2015 14:07, Stefan Bader wrote:
On 05.02.2015 20:36, Konrad Rzeszutek Wilk wrote:
On Thu, Feb 05, 2015 at 03:33:02PM +0100, Stefan Bader wrote:
While experimenting/testing various kernel versions I discovered that trying to
boot a Haswell-based host always crashes when booting as Xen dom0
(Xen-4.4.1). The crash has been happening since v3.19-rc1 and still happens with
v3.19-rc7. A bare-metal boot has no issues, and an Opteron-based host
has no issues either (dom0 and bare metal).
Could be a table that the other host does not have, and since it's only happening
in dom0, maybe some CPU capability that needs to be passed on?

Usually it means that the ACPI AML code is trying to do something with
the IOAPIC or something else which is not accessible.

But this, on the other hand, looks to be trying to execute some AML code
that is unknown. Any chance you can disassemble it and perhaps also
run with ACPI debug options on to figure out where it blows up?

The weird thing here is that bare metal on the same machine does work. And
previous kernels did work as well. So I think we can assume the ACPI tables are
ok. The ACPI crash could even be a red herring. Well, it likely is, as booting
with acpi=off hangs instead of crashing.

Since I had no clue, I did what we always do when we are dumbfounded: I went ahead
and bisected 3.18..3.19-rc1. Unfortunately the very last kernel I built was
something in between good and bad. Good as it did not crash exactly, but bad as
it did not come up in a usable state. So I am not sure the commit claimed to be
the offender is the right one. It could be any one in the range of:

G  * xen: use common page allocation function in p2m.c
   * xen: Delay remapping memory of pv-domain
g  * xen: Delay m2p_override initialization
-> * xen: Delay invalidating extra memory
B  * x86: Introduce function to get pmd entry pointer

(G) really good, (g) somewhat not bad, (B) bad, (->) claimed first broken.

Oh, since that all sounds related to E820 in some way:

(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a400 (usable)
(XEN)  000000000009a400 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000030a48000 (usable)
(XEN)  0000000030a48000 - 0000000030a49000 (reserved)

Hmm, this memory hole is at a rather low address. Could it be some
vital data (one of kernel, page tables, initrd or p2m map) is located
at this address?

This would be a problem similar to the one I ran into when trying to
test on a machine with 1TB of memory, where the p2m map was too big
to fit into contiguous memory.

Could you check the addresses where the hypervisor puts this data for
Dom0?


Juergen

(XEN)  0000000030a49000 - 00000000a27f4000 (usable)
(XEN)  00000000a27f4000 - 00000000a2ab4000 (reserved)
(XEN)  00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS)
(XEN)  00000000a2fb4000 - 00000000a2feb000 (ACPI data)
(XEN)  00000000a2feb000 - 00000000a3000000 (usable)
(XEN)  00000000a3000000 - 00000000afa00000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed00000 - 00000000fed04000 (reserved)
(XEN)  00000000fed10000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000fed84000 - 00000000fed85000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ffc00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 000000024e600000 (usable)

and how it looks with a 3.18 boot:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
[    0.000000] Xen: [mem 0x000000000009a400-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x0000000030a47fff] usable
[    0.000000] Xen: [mem 0x0000000030a48000-0x0000000030a48fff] reserved
[    0.000000] Xen: [mem 0x0000000030a49000-0x00000000a27f3fff] usable
[    0.000000] Xen: [mem 0x00000000a27f4000-0x00000000a2ab3fff] reserved
[    0.000000] Xen: [mem 0x00000000a2ab4000-0x00000000a2fb3fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000a2fb4000-0x00000000a2feafff] ACPI data
[    0.000000] Xen: [mem 0x00000000a2feb000-0x00000000a2ffffff] usable
[    0.000000] Xen: [mem 0x00000000a3000000-0x00000000af9fffff] reserved
[    0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] Xen: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
[    0.000000] Xen: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[    0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] Xen: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[    0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[    0.000000] Xen: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] Xen: [mem 0x0000000100000000-0x00000001bdc59fff] usable
[    0.000000] Xen: [mem 0x00000001bdc5a000-0x000000024e5fffff] unusable

Not sure that helps much. I probably have to try comparing later output. But
that will need a bit of time.

-Stefan


So it seems one of the delaying changes has a very bad effect on that Shark Bay
system. A bit odd, since none of those changes sounds Intel- or AMD-specific. It
could only be a different usage of memory (my AMD box has considerably more
memory, and its CPU has no GPU functionality, unlike the Haswell).

Jürgen, maybe this description triggers an idea for you...?

-Stefan

---

git bisect start
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d
# bad: [97bf6af1f928216fd6c5a66e8a57bfa95a659672] Linux 3.19-rc1
git bisect bad 97bf6af1f928216fd6c5a66e8a57bfa95a659672
# good: [70e71ca0af244f48a5dcf56dc435243792e3a495] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 70e71ca0af244f48a5dcf56dc435243792e3a495
# good: [988adfdffdd43cfd841df734664727993076d7cb] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect good 988adfdffdd43cfd841df734664727993076d7cb
# good: [b024793188002b9eed452b5f6a04d45003ed5772] staging: rtl8723au: phy_SsPwrSwitch92CU() was never called with bRegSSPwrLvl != 1
git bisect good b024793188002b9eed452b5f6a04d45003ed5772
# bad: [66dcff86ba40eebb5133cccf450878f2bba102ef] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad 66dcff86ba40eebb5133cccf450878f2bba102ef
# bad: [d6666be6f0c43efb9475d1d35fbef9f8be61b7b1] Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd
git bisect bad d6666be6f0c43efb9475d1d35fbef9f8be61b7b1
# bad: [94bbdb63d7ed5ca56b788e43d0ca4a8f9494c9e7] Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad 94bbdb63d7ed5ca56b788e43d0ca4a8f9494c9e7
# good: [2dbfca5a181973558277b28b1f4c36362291f5e0] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
git bisect good 2dbfca5a181973558277b28b1f4c36362291f5e0
# bad: [0db2812a5240f2663b92d8d4b761122dd2e0c6c3] Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
git bisect bad 0db2812a5240f2663b92d8d4b761122dd2e0c6c3
# bad: [f1d04b23b2015b4c3c0a8419677179b133afea08] Merge branch 'devel/for-linus-3.19' into stable/for-linus-3.19
git bisect bad f1d04b23b2015b4c3c0a8419677179b133afea08
# bad: [792230c3a66b3d17d6dcca712866d24f2283d4a6] x86: Introduce function to get pmd entry pointer
git bisect bad 792230c3a66b3d17d6dcca712866d24f2283d4a6
# good: [7108c9ce8f6e59f775b0c8250dba52b569b6cba2] xen: use common page allocation function in p2m.c
# NOTE: This was the last really good
git bisect good 7108c9ce8f6e59f775b0c8250dba52b569b6cba2
# good: [97f4533a60ce5d0cb35ff44a190111f81a987620] xen: Delay m2p_override initialization
# NOTE: This revision did not crash the usual way but was not useable either
# NOTE: Use of wrong bits in page-tables.
git bisect good 97f4533a60ce5d0cb35ff44a190111f81a987620
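
For anyone who wants to pull the good/bad boundary out of a log like the one above, a minimal Python sketch follows. It is not part of the original report, and the helper names bisect_verdicts and boundary are hypothetical. It only reads the "git bisect good|bad|skip <sha>" command lines and ignores the "#" comment lines.

```python
import re

# Matches the command lines of a bisect log; comment lines are ignored.
VERDICT_RE = re.compile(r"^git bisect (good|bad|skip) ([0-9a-f]{40})$")


def bisect_verdicts(log_text):
    """Return (verdict, sha) pairs in the order they were recorded."""
    verdicts = []
    for line in log_text.splitlines():
        m = VERDICT_RE.match(line.strip())
        if m:
            verdicts.append((m.group(1), m.group(2)))
    return verdicts


def boundary(verdicts):
    """Return (most recently marked good, most recently marked bad)."""
    last_good = next((sha for v, sha in reversed(verdicts) if v == "good"), None)
    last_bad = next((sha for v, sha in reversed(verdicts) if v == "bad"), None)
    return last_good, last_bad
```

On the log above this yields 97f4533a... as the most recent "good" and 792230c3... as the most recent "bad", which is why the commit in between ends up as the claimed first broken one. Since 97f4533a... was only "somewhat not bad", that boundary is exactly as uncertain as described earlier.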


[    2.108038] ACPI: Core revision 20141107
[    2.108153] ACPI Warning: Unsupported module-level executable opcode 0x91 at table offset 0x002B (20141107/psloop-225)
[    2.108264] ACPI Warning: Unsupported module-level executable opcode 0x91 at table offset 0x0033 (20141107/psloop-225)
[    2.108375] ACPI Warning: Unsupported module-level executable opcode 0x95 at table offset 0x0038 (20141107/psloop-225)
[    2.108489] ACPI Warning: Unsupported module-level executable opcode 0x95 at table offset 0x0041 (20141107/psloop-225)
[    2.108613] ACPI Warning: Unsupported module-level executable opcode 0x7D at table offset 0x040D (20141107/psloop-225)
[    2.108751] BUG: unable to handle kernel paging request at ffffc90000ee74e0
[    2.108835] IP: [<ffffffff814573db>] acpi_ps_peek_opcode+0xd/0x1f
[    2.108902] PGD 1f4be067 PUD 1f4bd067 PMD 1488f067 PTE 0
[    2.109018] Oops: 0000 [#1] SMP
[    2.109094] Modules linked in:
[    2.109153] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-031900rc7-generic #201502020035
[    2.109220] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[    2.109295] task: ffffffff81c1c500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[    2.109360] RIP: e030:[<ffffffff814573db>]  [<ffffffff814573db>] acpi_ps_peek_opcode+0xd/0x1f
[    2.109445] RSP: e02b:ffffffff81c03ce8  EFLAGS: 00010283
[    2.109490] RAX: 000000000000000c RBX: ffff880014887000 RCX: ffffffff81c03d50
[    2.109539] RDX: ffffc90000ee74e0 RSI: ffff880014887030 RDI: ffff880014887030
[    2.109587] RBP: ffffffff81c03ce8 R08: ffffea0000522600 R09: ffffffff81432c4f
[    2.109635] R10: ffff880014899090 R11: 00000000000000ba R12: ffff880014887030
[    2.109684] R13: ffff880014887000 R14: ffffffff81c03d50 R15: 000000000000000d
[    2.109735] FS:  0000000000000000(0000) GS:ffff880018c00000(0000) knlGS:0000000000000000
[    2.109836] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.109881] CR2: ffffc90000ee74e0 CR3: 0000000001c15000 CR4: 0000000000042660
[    2.109930] Stack:
[    2.109968]  ffffffff81c03d38 ffffffff81456537 ffffffff81c03d28 ffffffff81457a40
[    2.110104]  ffff880014887000 ffff880014887000 ffff8800148990c0 ffffc90000ee74e0
[    2.110238]  ffff880014887030 0000000000000000 ffffffff81c03d78 ffffffff81456760
[    2.110373] Call Trace:
[    2.110413]  [<ffffffff81456537>] acpi_ps_get_next_arg+0x114/0x1f9
[    2.110461]  [<ffffffff81457a40>] ? acpi_ps_pop_scope+0x54/0x72
[    2.110508]  [<ffffffff81456760>] acpi_ps_get_arguments+0x91/0x228
[    2.110555]  [<ffffffff81456ad2>] acpi_ps_parse_loop+0x1db/0x311
[    2.110602]  [<ffffffff81457705>] acpi_ps_parse_aml+0x96/0x275
[    2.110649]  [<ffffffff8145322f>] acpi_ns_one_complete_parse+0xf7/0x114
[    2.110698]  [<ffffffff817d149a>] ? _raw_spin_lock_irqsave+0x1a/0x60
[    2.110746]  [<ffffffff8145326c>] acpi_ns_parse_table+0x20/0x38
[    2.110792]  [<ffffffff81452c20>] acpi_ns_load_table+0x4c/0x90
[    2.110840]  [<ffffffff817c50b5>] acpi_tb_load_namespace+0xa6/0x14a
[    2.110889]  [<ffffffff81d83269>] acpi_load_tables+0xc/0x35
[    2.110935]  [<ffffffff81454bf6>] ? acpi_ns_get_node+0xb7/0xc9
[    2.110982]  [<ffffffff81d825cf>] acpi_early_init+0x73/0x105
[    2.111029]  [<ffffffff81d3b083>] start_kernel+0x348/0x3f0
[    2.111075]  [<ffffffff81d3abcd>] ? set_init_arg+0x56/0x56
[    2.111121]  [<ffffffff81d3a5f8>] x86_64_start_reservations+0x2a/0x2c
[    2.111169]  [<ffffffff81d3e88c>] xen_start_kernel+0x4f5/0x4f7
[    2.111215] Code: 8a 87 60 05 87 81 5d c3 e8 73 cc 37 00 55 81 ff 00 01 00 00 19 c0 48 89 e5 83 c0 02 5d c3 e8 5d cc 3
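
As an aid for reading the oops above: the faulting address in CR2 (ffffc90000ee74e0, also in RDX) can be split into its page-table indices, and the "PGD ... PUD ... PMD ... PTE 0" line shows that the walk only fails at the last level. The following Python sketch is not from the thread. It assumes standard x86-64 4-level paging with 4 KiB pages, and the function names are made up for illustration.

```python
def split_vaddr(vaddr):
    """Split an x86-64 virtual address into 4-level page-table indices.

    Each level uses 9 bits of the address; the low 12 bits are the
    offset into a 4 KiB page.
    """
    return {
        "pgd": (vaddr >> 39) & 0x1FF,
        "pud": (vaddr >> 30) & 0x1FF,
        "pmd": (vaddr >> 21) & 0x1FF,
        "pte": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


def entry_present(entry):
    """Bit 0 of a page-table entry is the Present flag."""
    return bool(entry & 1)


if __name__ == "__main__":
    print(split_vaddr(0xFFFFC90000EE74E0))
    # Entries as printed in the oops: PGD, PUD, PMD, PTE.
    for name, entry in [("PGD", 0x1F4BE067), ("PUD", 0x1F4BD067),
                        ("PMD", 0x1488F067), ("PTE", 0x0)]:
        print(name, "present" if entry_present(entry) else "not present")
```

The upper three levels are present and only the PTE is empty, so the page simply is not mapped at that point. The address appears to fall into the kernel's vmalloc/ioremap range, which would fit a mapped ACPI table, but that is an interpretation and not something confirmed in the thread.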




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel






