xen-devel
[Xen-devel] L1[0x1fb] = 0000000000000000 which faults on one type of mac
I am troubleshooting an issue where the Linux kernel tries
to dereference a not-present pagetable entry. I have a fix for this
in for-2.6.32/bug-fixes, but please read on.
Specifically, it tries to dereference the fixmapped address of
APIC_BASE. The fixmap entry for APIC_BASE is never set up
because of git commit a1d8e2fa8325064338b2da1bcf0d7a0473883c284,
which adds this check in arch/x86/kernel/acpi/boot.c:
static void __init acpi_register_lapic_address(unsigned long address)
{
	/* Xen dom0 doesn't have usable lapics */
	if (xen_initial_domain())
		return;
	mp_lapic_addr = address;
	set_fixmap_nocache(FIX_APIC_BASE, address);
Later on we use 'native_apic_read', which uses APIC_BASE as the
address (it is presumed to be mapped at slot FIX_APIC_BASE of the
fixmap), and the read faults (on some machines).
Since we never call 'set_fixmap_nocache(FIX_APIC_BASE)', the entry
stays empty. Walking the pagetable, this is what we get:
[ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] mapped APIC to ffffffffff5fb000 (00000000)
(XEN) d0:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffffff5fb020:
(XEN) L4[0x1ff] = 0000000221003067 0000000000001003
(XEN) L3[0x1ff] = 0000000221004067 0000000000001004
(XEN) L2[0x1fa] = 0000000221771067 0000000000001771
(XEN) L1[0x1fb] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1-110309 x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff8102b5d1>]
(XEN) RFLAGS: 0000000000000292 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffff8164cf50 rbx: 000000026ec00000 rcx: 00000000ffffdd85
(XEN) rdx: 00000000ffffffff rsi: 0000000000000000 rdi: 0000000000000020
(XEN) rbp: ffffffff81643ea8 rsp: ffffffff81643e50 r8: 0000000000000002
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff880013671800 r13: 00000000bff66000 r14: ffffffffffffffff
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 0000000221001000 cr2: ffffffffff5fb020
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81643e50:
Which is to say that the L1 has this:
0000000115771fa0: 00000000 00000000 00000000 00000000
0000000115771fb0: 00000000 00000000 00000000 00000000
0000000115771fc0: 00000000 00000000 15770067 80100001
0000000115771fd0: 15770067 80100001 00000000 00000000
0000000115771fe0: 00000000 00000000 00000000 00000000
0000000115771ff0: 00000000 00000000 00000000 00000000
L1[0x1fb] is machine address 115771fd8, which has nothing in it.
OK, so I've come up with a fix that is a back-port of how 2.6.38 does it:
it removes the check I mentioned above, and in xen_set_fixmap
we set FIX_APIC_BASE to actually point to a dummy ioapic_mapping.
It is 7cb068cf1ba90425e12f3a7b3caed9d018fa9b8c in for-2.6.32/bug-fixes.
Gianni, you might want to check this out in case it fixes the problem you
are experiencing.
But one thing I can't understand is why I get this crash on one machine
(IBM x3850), while another one with the same pagetable contents
(L1 has nothing for 0x1fb) works just fine. I added a panic and used
the Xen hypervisor kdb to manually inspect the pagetable, and it has
the same contents as the IBM x3850, yet it boots fine with this invalid entry.
Any ideas?
FYI, it seems another user (Sven Sübert), on an IBM x3650, hits the same bug.
With this fix he is able to boot.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel