Re: [Xen-devel] MSI badness in xen-unstable
On Sun, Oct 17, 2010 at 1:19 PM, Bruce Edge <bruce.edge@xxxxxxxxx> wrote:
> On Sat, Oct 16, 2010 at 10:26 AM, Sander Eikelenboom
> <linux@xxxxxxxxxxxxxx> wrote:
>>
>> Probably there are more problems; you could also try a xen-unstable from
>> before the commit that changed this code (msi.c).
>> Another thing that could make it easier to debug would be to put some
>> printks around the WARN_ONs in msi.c at the line numbers that gave the
>> warnings, showing both parts of the equation in the WARN_ON.
>>
>
> Good idea.
>
> Here's the debug stuff I added (so the printk output will make sense):

Apologies, I jumped the gun on that post; I was trying to do too many
things at once. Ignore it and use this diff and output instead. I fixed
errors in the printk logic.

Here's the diff:

diff -r 3a5755249361 xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c	Thu Oct 14 12:46:29 2010 +0100
+++ b/xen/arch/x86/msi.c	Sun Oct 17 15:32:05 2010 -0700
@@ -549,14 +549,14 @@
         return 0;
     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
     {
-        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        addr &= PCI_BASE_ADDRESS_MEM_MASK;
         if ( ++bir >= limit )
             return 0;
         return addr |
                ((u64)pci_conf_read32(bus, slot, func,
                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
     }
-    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+    return addr & PCI_BASE_ADDRESS_MEM_MASK;
 }
 
 /**
@@ -634,6 +634,14 @@
 
         ASSERT(!dev->msix_used_entries);
         WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+        if(msi->table_base == read_pci_mem_bar(bus, slot, func, bir)) { // XXX
+            printk( "==================================================\n");
+            printk( "msi->table_base != read_pci_mem_bar(bus, slot, func, bir)\n");
+            printk( "msi->table_base = %0lx\n", msi->table_base );
+            printk( "read_pci_mem_bar = %0lx\n", read_pci_mem_bar(bus, slot, func, bir) );
+            printk( "bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+            printk( "==================================================\n\n");
+        }
 
         dev->msix_nr_entries = nr_entries;
         dev->msix_table.first = PFN_DOWN(table_paddr);
@@ -647,6 +655,11 @@
         bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
         pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
         WARN_ON(!pba_paddr);
+        if (!pba_paddr) { // XXX
+            printk( "==================================================\n");
+            printk( "No pba_addr: bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+            printk( "==================================================\n\n");
+        }
         pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
 
         dev->msix_pba.first = PFN_DOWN(pba_paddr);
@@ -654,6 +667,14 @@
                                       BITS_TO_LONGS(nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
                                         dev->msix_pba.last));
+        if ( rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                     dev->msix_pba.last)) { // XXX
+            printk( "==================================================\n");
+            printk( "rangeset_overlaps_range\n" );
+            printk( "mmio_ro_ranges = %p, dev->msix_pba.first = %0lx, dev->msix_pba.last = %0lx\n",
+                    mmio_ro_ranges, dev->msix_pba.first, dev->msix_pba.last);
+            printk( "==================================================\n\n");
+        }
 
         if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
                                 dev->msix_table.last) )

The updated boot log is attached.
-Bruce

> Also, here's the PCI config of dom0, although I think it's the NICs
> that are responsible for this:
>
> 00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 12)
> 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 12)
> 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 12)
> 00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 12)
> 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 12)
> 00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 12)
> 00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 12)
> 00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 12)
> 00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 12)
> 00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 12)
> 00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
> 00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
> 00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
> 00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
> 00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
> 00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
> 00:1c.1 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 2
> 00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
> 00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
> 00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
> 00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
> 00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller
> 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
> 01:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 01:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 04:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 04:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 05:00.0 SCSI storage controller: LSI Logic / Symbios Logic MegaRAID SAS 8208ELP/8208ELP (rev 08)
> 06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 08:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
> ff:00.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture Generic Non-Core Registers (rev 04)
> ff:00.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture System Address Decoder (rev 04)
> ff:02.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 0 (rev 04)
> ff:02.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 0 (rev 04)
> ff:03.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller (rev 04)
> ff:03.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Target Address Decoder (rev 04)
> ff:03.4 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Test Registers (rev 04)
> ff:04.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Control Registers (rev 04)
> ff:04.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Address Registers (rev 04)
> ff:04.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Rank Registers (rev 04)
> ff:04.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Thermal Control Registers (rev 04)
> ff:05.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Control Registers (rev 04)
> ff:05.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Address Registers (rev 04)
> ff:05.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Rank Registers (rev 04)
> ff:05.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Thermal Control Registers (rev 04)
> ff:06.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Control Registers (rev 04)
> ff:06.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Address Registers (rev 04)
> ff:06.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Rank Registers (rev 04)
> ff:06.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Thermal Control Registers (rev 04)
>
> Thanks
>
> -Bruce
>
>>
>> --
>>
>> Sander
>>
>> Saturday, October 16, 2010, 7:14:11 PM, you wrote:
>>
>> > On Sat, Oct 16, 2010 at 9:29 AM, Sander Eikelenboom
>> > <linux@xxxxxxxxxxxxxx> wrote:
>> >> Hi Bruce,
>> >>
>> >> I tripped over the same warning trying to solve my freezes.
>> >> Jan Beulich has posted a patch which is not in xen-unstable yet:
>> >> [Xen-devel] [PATCH] x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a
>> >>
>> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>
>> >>
>> >> --- a/xen/arch/x86/msi.c
>> >> +++ b/xen/arch/x86/msi.c
>> >> @@ -549,14 +549,14 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
>> >>          return 0;
>> >>      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>> >>      {
>> >> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>> >> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>> >>          if ( ++bir >= limit )
>> >>              return 0;
>> >>          return addr |
>> >>                 ((u64)pci_conf_read32(bus, slot, func,
>> >>                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>> >>      }
>> >> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>> >> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>> >>  }
>> >>
>> >>  /**
>> >>
>> >>
>> >> That fixes the warning, but my machine still keeps freezing nonetheless
>> >> (but it also does so with pci=nomsi, so it's not MSI-specific in my case).
>> >>
>> >> --
>> >>
>> >> Sander
>>
>> > Hi Sander,
>>
>> > Thank you. I tried it against 4.1.0-22240 with no effect.
>> > I confirmed I had the right patch:
>>
>> 0 %>> hg diff xen/arch/x86/msi.c
>>
>> > diff -r 38ad3633ecaf xen/arch/x86/msi.c
>> > --- a/xen/arch/x86/msi.c	Wed Oct 13 12:01:30 2010 +0100
>> > +++ b/xen/arch/x86/msi.c	Sat Oct 16 10:12:31 2010 -0700
>> > @@ -549,14 +549,14 @@
>> >          return 0;
>> >      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>> >      {
>> > -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>> > +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>> >          if ( ++bir >= limit )
>> >              return 0;
>> >          return addr |
>> >                 ((u64)pci_conf_read32(bus, slot, func,
>> >                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>> >      }
>> > -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>> > +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>> >  }
>>
>> >  /**
>>
>> > The boot-time MSI warning messages were unchanged.
>>
>> > -Bruce
>>
>> >>
>> >> Saturday, October 16, 2010, 6:14:17 PM, you wrote:
>> >>
>> >>> On Mon, Oct 11, 2010 at 2:05 PM, Bruce Edge <bruce.edge@xxxxxxxxx> wrote:
>> >>>> On Mon, Oct 11, 2010 at 10:12 AM, Gianni Tedesco
>> >>>> <gianni.tedesco@xxxxxxxxxx> wrote:
>> >>>>> On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I've been trying to boot Stefano's minimal dom0 kernel from
>> >>>>>> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
>> >>>>>> 2.6.36-rc1-initial-domain-v2+pat
>> >>>>>>
>> >>>>>> On xen-unstable, I get the following WARN_ON()s from Xen when bringing
>> >>>>>> up the NICs, then the machine hangs forever when trying to log in either
>> >>>>>> over serial or NIC.
>> >>>>>>
>> >>>>>> (XEN) Xen WARN at msi.c:649
>> >>>>
>> >>>> I get the same Xen WARN messages using the current pvops/xen-next with
>> >>>> xen-unstable; here's the complete list for one boot, grep'd for WARN:
>> >>>>
>> >>>> (XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) Xen WARN at msi.c:656
>> >>>> (XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) Xen WARN at msi.c:656
>> >>>> (XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) Xen WARN at msi.c:656
>> >>>> (XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) 0000000080287db8 0(XEN) Xen WARN at msi.c:636
>> >>>> (XEN) Xen WARN at msi.c:649
>> >>>> (XEN) Xen WARN at msi.c:656
>> >>>>
>> >>>> The complete boot sequence is attached.
>> >>>>
>> >>>> I do get a login at the end of the boot sequence, though.
>> >>>> My situation goes pear-shaped when I try to start a PV domU. The dom0
>> >>>> locks up after printing this on the console:
>> >>>>
>> >>>> (XEN) tmem: all pools frozen for all domains
>> >>>> (XEN) tmem: all pools thawed for all domains
>> >>>> (XEN) tmem: all pools frozen for all domains
>> >>>> (XEN) tmem: all pools thawed for all domains
>> >>>> mapping kernel into physical memory
>> >>>> about to get started...
>> >>>>
>> >>>> then prints this once a minute:
>> >>>> [ 589.490894] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>> >>>>
>> >>>> The Xen console is still active and I can generate a diag dump, also
>> >>>> attached.
>> >>>>
>> >>>> This dom0 lockup behavior started with pvops 2.6.32.21 and continues
>> >>>> through .24, rendering the later pvops kernels unusable for dom0.
>> >>>> The 2.6.32.18 kernel is the last one that functioned as a dom0.
>> >>>>
>> >>>> This behavior is consistent across platforms: HP ProLiant DL380 G6 and
>> >>>> G7, as well as i7 Supermicros.
>> >>>>
>> >>>> -Bruce
>> >>>>
>> >>>>>
>> >>>>> Hmm, so this appears not to be an issue with the XCP kernel; in that
>> >>>>> case I get the warnings but everything still works fine.
>> >>>>>
>> >>>>> I will investigate further when I have some time.
>> >>>>>
>> >>>>> Gianni
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Xen-devel mailing list
>> >>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> >>>>> http://lists.xensource.com/xen-devel
>> >>>>>
>> >>>>
>> >>
>> >>> The latest xen-unstable, 22240, has the same "(XEN) Xen WARN at
>> >>> msi.c:636" messages with associated stack traces.
>> >>
>> >>> I spent a little more time working with this version, and except for
>> >>> these disconcerting messages, which do look like they are triggered by
>> >>> Ethernet card discovery, the system appears functional.
>> >>> In all cases the first occurrence is immediately after NIC discovery:
>> >>
>> >>> e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
>> >>> | e1000e: Copyright (c) 1999-2008 Intel Corporation.
>> >>> | xen: registering gsi 16 triggering 0 polarity 1
>> >>> | xen_allocate_pirq: returning irq 16 for gsi 16
>> >>> xen: --> irq=16
>> >>> Already setup the GSI :16
>> >>> e1000e 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>> >>> e1000e 0000:06:00.0: setting latency timer to 64
>> >>> alloc irq_desc for 493 on node 0
>> >>> alloc kstat_irqs on node 0
>> >>> (XEN) Xen WARN at msi.c:636
>> >>> (XEN) ----[ Xen-4.1-unstable x86_64 debug=y Not tainted ]----
>> >>> ....
>> >>
>> >>> In case it's a NIC-specific issue, I'm seeing it with both the
>> >>> 06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>> >>> and the
>> >>> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>> >>> NICs.
>> >>
>> >>> -Bruce
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >>  Sander                            mailto:linux@xxxxxxxxxxxxxx
>> >>
>>
>>
>>
>> --
>> Best regards,
>>  Sander                            mailto:linux@xxxxxxxxxxxxxx
>>
>

Attachment: patched-xen-boot-warn.log

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel