
Re: [Xen-devel] Regression, host crash with 4.5rc1



On 11/10/2014 0:51, Jan Beulich wrote:
On 10.11.14 at 09:03, <sflist@xxxxxxxxx> wrote:
Sorry for the delay, took some debugging on another computer to get
serial logging working. Due to its size, I've posted the entire log of a
crashed session here: http://pastebin.com/AiPHUZRH In this case I used a
3.0 gig memory size for the Windows domU.

As I mentioned before, sometimes it's the SATA that goes first, other
times the tg3 ethernet driver. Also note that between 4.4.1 and 4.5rc1,
the kernel I'm using (stock Debian Jessie) has not changed.

Please let me know if you need any other information. Thanks!
Raising the kernel log level to maximum too would have helped.

Okay, I've done that and the output is here. Let me know if you would prefer different logging flags:

http://pastebin.com/M3yvWNTT

Regardless of that, the first device showing anomalies here appears
to be the UHCI controller:

     [  147.415713] usb 7-1: reset low-speed USB device number 2 using uhci_hcd

while booting the guest.

I assume this is related to the USB device (a keyboard) I'm passing through to the domU.

And these

     [  199.775209] pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018
     [  199.775238] pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
     [  199.775251] pcieport 0000:00:03.0:   device [8086:340a] error status/mask=00001100/00002000
     [  199.775255] pcieport 0000:00:03.0:    [ 8] RELAY_NUM Rollover
     [  199.775258] pcieport 0000:00:03.0:    [12] Replay Timer Timeout

hint at a problem in the system's design. 00:03.0 is the parent bridge
of 02:00.0 (and from what I can tell that's the only device behind that
bridge), and hence the above messages can only reasonably have
their origin at the passed through VGA device.
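
For reference, the "error status/mask=00001100/00002000" field in the AER lines above can be decoded by hand. The following is a minimal sketch, assuming the bit layout of the AER Correctable Error Status register as given in the PCI Express Base Specification (which spells bit 8 "REPLAY_NUM Rollover"). The function names are illustrative only and not part of any existing tool.

```python
# Bit positions in the PCIe AER Correctable Error Status register.
# Assumption: layout taken from the PCI Express Base Specification;
# only the commonly reported bits are listed here.
CORRECTABLE_ERROR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def parse_status_mask(field):
    """Split a 'status/mask' hex pair as printed by the kernel."""
    status_hex, mask_hex = field.split("/")
    return int(status_hex, 16), int(mask_hex, 16)


def decode_correctable(status, mask=0):
    """Return (bit, name, masked) for every bit set in status."""
    decoded = []
    for bit in range(32):
        if status & (1 << bit):
            name = CORRECTABLE_ERROR_BITS.get(bit, "unknown/reserved")
            decoded.append((bit, name, bool(mask & (1 << bit))))
    return decoded


if __name__ == "__main__":
    status, mask = parse_status_mask("00001100/00002000")
    for bit, name, masked in decode_correctable(status, mask):
        suffix = " (masked)" if masked else ""
        print(f"[{bit:2d}] {name}{suffix}")
```

With the value from the log, the set bits are 8 and 12, matching the two lines the kernel printed. The mask 00002000 covers only bit 13 (Advisory Non-Fatal Error), so neither reported error is masked.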

You are correct that the VGA card is the only device on 03.0:

root@g2:~# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 5520 I/O Hub to ESI Port
           +-01.0-[01]----00.0  Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B
           +-03.0-[02]----00.0  NVIDIA Corporation GT200GL [Quadro FX 4800]
           +-07.0-[03]--
           +-14.0  Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers
           +-14.1  Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
           +-14.2  Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers
           +-16.0  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.1  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.2  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.3  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.4  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.5  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.6  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.7  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-1a.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
           +-1a.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
           +-1a.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
           +-1b.0  Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
           +-1c.0-[04]--
           +-1c.4-[05]----00.0  Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express
           +-1c.5-[06-09]----00.0-[07-09]--+-02.0-[08]--
           |                               \-03.0-[09]----00.0  Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express
           +-1d.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
           +-1d.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
           +-1d.2  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
           +-1d.3  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
           +-1d.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
           +-1e.0-[0a]----0e.0  Advanced Micro Devices, Inc. [AMD/ATI] RV100 [Radeon 7000 / Radeon VE]
           +-1f.0  Intel Corporation 82801JIB (ICH10) LPC Interface Controller
           +-1f.2  Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
           \-1f.3  Intel Corporation 82801JI (ICH10 Family) SMBus Controller

What problem in the system's design does this hint at?

IOW it may well be that
you were just lucky that things worked earlier on.

Certainly possible, but this is a very common machine in the corporate world -- a Lenovo ThinkStation D20 running the X58 chipset. If it's an inherent defect in the machine and somebody else hasn't already tripped over it, I would be very surprised.

And btw - the title saying "host crash" seems to not match the provided
log, as there's no sign of a crash anywhere (the host may be hung from
what is visible). Was that just badly worded, or have you actually seen
crashes too?


Only seen hanging. Sorry for the lack of technical rigor on the title, but from the other end of the ethernet cable, it might as well have crashed.

If the expanded logging doesn't tell you anything useful, I'll see if I can bisect the problem.

Thanks very much for your time.

Steve
