
Re: [Xen-devel] [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced



On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 17/02/2020 20:41, Jason Andryuk wrote:
> > On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 
> > wrote:
> >> On 17/02/2020 19:19, Jason Andryuk wrote:
> >>> On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@xxxxxxxxx> 
> >>> wrote:
> >>>> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
> >>>>> Is there any full boot log in the bad case?  Debugging via divination
> >>>>> isn't an effective way to get things done.
> >>>> Agreed. I included some more verbose logs towards the end of the email 
> >>>> (typed up by hand).
> >>>>
> >>>> Attached are pictures from a slow-motion video of my laptop booting. 
> >>>> Note that I also included a picture of a stack trace that happens 
> >>>> immediately before reboot. It doesn't look related, but I wanted to 
> >>>> include it anyway.
> >>>>
> >>>> I think the original email should have said "4.8.5" instead of "4.0.5." 
> >>>> Regardless, everyone on this mailing list can now see all the boot logs 
> >>>> that I've seen.
> >>>>
> >>>> Attaching a serial console seems like it would be difficult to do on 
> >>>> this laptop, otherwise I would have sent the logs as a txt file.
> >>> I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
> >>> Latitude 7200 2-in-1.  Fedora 31 Live USB image boots successfully.
> >>> No way to get serial output.  I manually recreated the output before
> >>> from the vga display.
> >> We have multiple bugs.
> >>
> >> First and foremost, Xen seems totally broken when running in ExtINT
> >> mode.  This needs addressing, and ought to be sufficient to let Xen
> >> boot, at which point we can try to figure out why it is trying to fall
> >> back into 486(ish) compatibility mode.

Xen's log has "enabled ExtINT on CPU#0", while Linux's has "masked ExtINT on
CPU#0", so Linux isn't using ExtINT?

I copied and pasted Linux's setup_local_APIC() into Xen and then
massaged it into compiling.  Now Xen reports masked ExtINT, but it still
fails to enable the timer.

> >>> I tested Linux with intel_iommu=on and that booted successfully.
> >>> Under Xen, this system sets iommu_x2apic_enabled = true, so
> >>> force_iommu is set and iommu=0 cannot disable the IOMMU.
> >>> Oh, I can disable x2apic and then disable the IOMMU:
> >>>
> >>> x2apic=1 -> failure above
> >>> x2apic=0 iommu=0 -> failure above
> >>> clocksource=acpi -> doesn't help
> >>> clocksource=pit -> hangs after "load tracking window length 1073741824 ns"
> >> None of these are surprising, given that Xen can't make any interrupts
> >> work at all.
> >>
> >>> noapic -> BUG in init_bsp_APIC
> >> This is a surprise.  It's clearly a bug in Xen.  (OTOH, I've been
> >> threatening to rip all of that logic out, because there is no such thing
> >> as a 64bit capable system without an integrated APIC.)
> > It's a GPF [error_code=0000] at init_bsp_APIC+0x53, which is:
> >     0xffff82d080428f86 <+64>:    je     0xffff82d080428fc9 <init_bsp_APIC+131>
> >     0xffff82d080428f88 <+66>:    or     $0xff,%al
> >     0xffff82d080428f8a <+68>:    test   %sil,%sil
> >     0xffff82d080428f8d <+71>:    je     0xffff82d080428fd8 <init_bsp_APIC+146>
> >     0xffff82d080428f8f <+73>:    mov    $0x80f,%ecx
> >     0xffff82d080428f94 <+78>:    mov    $0x0,%edx
> >     0xffff82d080428f99 <+83>:    wrmsr
> >
> > RAX is 0x3ff.
> >
> > This is immediately after Xen prints "Switched to APIC driver x2apic_cluster".
>
> Hmm, in which case it isn't a BUG specifically, but merely a crash.
> 0x3ff to SPIV is trying to set reserved bits, so it is no surprise that
> there is a #GP.

Yeah, I used the wrong word.  There was a backtrace and it rebooted
quickly, so I didn't have details when I wrote the first email.  I
re-ran afterward to capture the info.

> In which case this can safely be filed under "even more collateral
> damage from failing to set up any kind of interrupt handling".
>
> >>> One other thing that might be noteworthy.  Linux only reports ACPI IRQ0
> >>> and IRQ9 as used by override, whereas Xen lists IRQs 0, 2 & 9.
> >> Huh - this is supposed to come directly from the ACPI tables, so Linux
> >> and Xen should be using the same source of information.
> >>
> >>> Below is the re-constructed Xen console output.  The SMBIOS line is
> >>> the first thing displayed on the VGA output.
> >> Yes - it is the first thing printed after vesa_init(), which I think is a
> >> manifestation of a previous EFI bug I've reported.  Does booting with
> >> -basevideo help?  (No need to transcribe the output manually.  I just
> >> need to know if it lets you see the full log.)
> > I'm booting grub->xen.gz so -basevideo isn't directly applicable.  My
> > attempt at setting a boot entry failed, so I'll have to try that
> > again.
>
> Ah ok.  One thing which Xen(.gz) needs to do is to take video details
> from the bootloader rather than trying to figure them out itself.
>
> By default, Xen.gz will try to write into the legacy VGA range, which
> most likely isn't working on an EFI system.
>
> (As a slight tangent, it is possible to test xen.efi via grub with a
> suitable chainloader stanza, but xen.efi is deficient in enough
> important ways that I'd avoid it unless absolutely necessary.)

I think I tried chainloader at some point and received an "Unsupported
relocation type" error.

This Dell doesn't want to boot my xen.efi.  After selecting a boot
entry, there is a 3-4 second pause and then EFI prints "Press
F1/VolumeUp key to retry boot."

-Jason

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

