
Re: [Xen-devel] MSI badness in xen-unstable



On Sat, Oct 16, 2010 at 10:26 AM, Sander Eikelenboom
<linux@xxxxxxxxxxxxxx> wrote:
>
> Probably there are more problems; you could also try a xen-unstable from 
> before the commit that changed this code (msi.c).
> Another thing that could make it easier to debug would be to put some 
> printk's around the WARN_ON's in msi.c at the line numbers that gave the 
> warnings, showing both parts of the equation in the WARN_ON.
>

Good idea.

Here's the debug stuff I added (so the printk output will make sense):

diff -r 3a5755249361 xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c        Thu Oct 14 12:46:29 2010 +0100
+++ b/xen/arch/x86/msi.c        Sun Oct 17 13:18:06 2010 -0700
@@ -549,14 +549,14 @@
         return 0;
     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
     {
-        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        addr &= PCI_BASE_ADDRESS_MEM_MASK;
         if ( ++bir >= limit )
             return 0;
         return addr |
                ((u64)pci_conf_read32(bus, slot, func,
                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
     }
-    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+    return addr & PCI_BASE_ADDRESS_MEM_MASK;
 }

 /**
@@ -633,7 +633,15 @@
         u32 pba_offset;

         ASSERT(!dev->msix_used_entries);
-        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir)); // XXX
+        if(msi->table_base != read_pci_mem_bar(bus, slot, func, bir)) {
+                       printk( "==================================================\n");
+                       printk( "msi->table_base != read_pci_mem_bar(bus, slot, func, bir)\n");
+                       printk( "msi->table_base = %0lx\n", msi->table_base );
+                       printk( "read_pci_mem_bar = %0lx\n", read_pci_mem_bar(bus, slot, func, bir) );
+                       printk( "bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+                       printk( "==================================================\n\n");
+               }

         dev->msix_nr_entries = nr_entries;
         dev->msix_table.first = PFN_DOWN(table_paddr);
@@ -646,14 +654,27 @@
                                      msix_pba_offset_reg(pos));
         bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
         pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
-        WARN_ON(!pba_paddr);
+        WARN_ON(!pba_paddr); // XXX
+        if (!pba_paddr) {
+                       printk( "==================================================\n");
+                       printk( "No pba_addr: bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+                       printk( "==================================================\n\n");
+               }
         pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;

         dev->msix_pba.first = PFN_DOWN(pba_paddr);
         dev->msix_pba.last = PFN_DOWN(pba_paddr +
                                       BITS_TO_LONGS(nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
-                                        dev->msix_pba.last));
+                                        dev->msix_pba.last)); // XXX
+        if ( rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                        dev->msix_pba.last)) {
+                       printk( "==================================================\n");
+                       printk( "rangeset_overlaps_range\n" );
+                       printk( "mmio_ro_ranges = %p, dev->msix_pba.first = %0lx, dev->msix_pba.last = %0lx\n",
+                                       mmio_ro_ranges, dev->msix_pba.first, dev->msix_pba.last);
+                       printk( "==================================================\n\n");
+               }

         if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
                                 dev->msix_table.last) )
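
For anyone reusing this approach, the idea behind the debug hunks (print both
operands whenever a WARN_ON would fire) can be sketched standalone. This is
only an illustration, not Xen code: report_mismatch() is a made-up helper and
printf stands in for printk. It takes each operand once and prints under
exactly the condition it returns, so the message and the warning cannot
disagree.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a Xen function): compare a cached value with a
 * freshly read one, print both only when they differ, and return non-zero
 * on mismatch so the caller can pass the result on to a warning macro.
 */
int report_mismatch(const char *what, uint64_t cached, uint64_t fresh)
{
    assert(what != NULL);

    if (cached == fresh)
        return 0;

    /* Both operands were evaluated once by the caller; print them as-is. */
    printf("%s: cached=%#" PRIx64 " fresh=%#" PRIx64 "\n",
           what, cached, fresh);
    return 1;
}
```

In msi.c the call site would look something like
WARN_ON(report_mismatch("table_base", msi->table_base,
read_pci_mem_bar(bus, slot, func, bir))), assuming the helper were ported to
printk.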



The boot log from this patched msi.c is attached. Let me know what
else I can add to help track down this issue.
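
As background on the first hunk (Jan's mask fix), here is a minimal sketch of
how a memory BAR decodes. It assumes PCI_BASE_ADDRESS_MEM_MASK is ~0x0f, as in
Linux's pci_regs.h, which Xen's header is assumed to match. decode_mem_bar()
and decode_inverted() are illustrative names, not functions in msi.c. The low
four bits of a memory BAR are flags, so ANDing with the inverted mask keeps
only the flags and throws the address away.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, taken from Linux's pci_regs.h. */
#define PCI_BASE_ADDRESS_SPACE          0x01u  /* bit 0: 0 = memory, 1 = I/O */
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK  0x06u
#define PCI_BASE_ADDRESS_MEM_TYPE_64    0x04u
#define PCI_BASE_ADDRESS_MEM_MASK       (~0x0fu)

/* Decode a memory BAR from its low dword and the following dword. */
uint64_t decode_mem_bar(uint32_t lo, uint32_t hi)
{
    uint64_t base;

    /* This sketch only handles memory BARs, not I/O BARs. */
    assert((lo & PCI_BASE_ADDRESS_SPACE) == 0);

    base = lo & PCI_BASE_ADDRESS_MEM_MASK;      /* strip the flag bits */
    if ((lo & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64)
        base |= (uint64_t)hi << 32;             /* upper half of a 64-bit BAR */
    return base;
}

/* What the pre-fix code computed: only the flag bits survive. */
uint32_t decode_inverted(uint32_t lo)
{
    return lo & ~PCI_BASE_ADDRESS_MEM_MASK;
}
```

With a 64-bit BAR whose low dword reads 0xfbe00004 and whose high dword is 0,
decode_mem_bar() yields 0xfbe00000 while decode_inverted() yields 0x4.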

Also, here's the PCI device list of dom0, although I think it's the NICs
that are responsible for this:

00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 12)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 12)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 12)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 12)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 12)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 12)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 12)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 12)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 12)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 12)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.1 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 2
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
01:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
04:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
04:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
05:00.0 SCSI storage controller: LSI Logic / Symbios Logic MegaRAID SAS 8208ELP/8208ELP (rev 08)
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
08:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
ff:00.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture Generic Non-Core Registers (rev 04)
ff:00.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture System Address Decoder (rev 04)
ff:02.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 0 (rev 04)
ff:02.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 0 (rev 04)
ff:03.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller (rev 04)
ff:03.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Target Address Decoder (rev 04)
ff:03.4 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Test Registers (rev 04)
ff:04.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Control Registers (rev 04)
ff:04.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Address Registers (rev 04)
ff:04.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Rank Registers (rev 04)
ff:04.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Thermal Control Registers (rev 04)
ff:05.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Control Registers (rev 04)
ff:05.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Address Registers (rev 04)
ff:05.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Rank Registers (rev 04)
ff:05.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Thermal Control Registers (rev 04)
ff:06.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Control Registers (rev 04)
ff:06.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Address Registers (rev 04)
ff:06.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Rank Registers (rev 04)
ff:06.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Thermal Control Registers (rev 04)

Thanks

-Bruce

>
> --
>
> Sander
>
> Saturday, October 16, 2010, 7:14:11 PM, you wrote:
>
> > On Sat, Oct 16, 2010 at 9:29 AM, Sander Eikelenboom
> > <linux@xxxxxxxxxxxxxx> wrote:
> >> Hi Bruce,
> >>
> >> I tripped over the same warning trying to solve my freezes.
> >> Jan Beulich has posted a patch which is not in xen-unstable yet: 
> >> [Xen-devel] [PATCH] x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>
> >>
> >> --- a/xen/arch/x86/msi.c
> >> +++ b/xen/arch/x86/msi.c
> >> @@ -549,14 +549,14 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
> >>         return 0;
> >>     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
> >>     {
> >> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
> >> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
> >>         if ( ++bir >= limit )
> >>             return 0;
> >>         return addr |
> >>                ((u64)pci_conf_read32(bus, slot, func,
> >>                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
> >>     }
> >> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
> >> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
> >>  }
> >>
> >>  /**
> >>
> >>
> >>
> >> That fixes the warn, but my machine still keeps freezing nonetheless.
> >> (but it also does so with pci=nomsi so it's not msi specific in my case)
> >>
> >> --
> >>
> >> Sander
>
> > Hi Sander,
>
> > Thank you.  I tried it against 4.1.0-22240 with no effect.
> > I confirmed I had the right patch:
>
> 0 %>> hg diff  xen/arch/x86/msi.c
>
> > diff -r 38ad3633ecaf xen/arch/x86/msi.c
> > --- a/xen/arch/x86/msi.c        Wed Oct 13 12:01:30 2010 +0100
> > +++ b/xen/arch/x86/msi.c        Sat Oct 16 10:12:31 2010 -0700
> > @@ -549,14 +549,14 @@
> >          return 0;
> >      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
> >      {
> > -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
> > +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
> >          if ( ++bir >= limit )
> >              return 0;
> >          return addr |
> >                 ((u64)pci_conf_read32(bus, slot, func,
> >                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
> >      }
> > -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
> > +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
> >  }
>
> >  /**
>
> > The boot time msi warn messages were unchanged.
>
> > -Bruce
>
> >>
> >> Saturday, October 16, 2010, 6:14:17 PM, you wrote:
> >>
> >>> On Mon, Oct 11, 2010 at 2:05 PM, Bruce Edge <bruce.edge@xxxxxxxxx> wrote:
> >>>> On Mon, Oct 11, 2010 at 10:12 AM, Gianni Tedesco
> >>>> <gianni.tedesco@xxxxxxxxxx> wrote:
> >>>>> On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I've been trying to boot stefano's minimal dom0 kernel from
> >>>>>> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
> >>>>>> 2.6.36-rc1-initial-domain-v2+pat
> >>>>>>
> >>>>>> On xen-unstable, I get the following WARN_ON()'s from Xen when bringing
> >>>>>> up the NICs, then the machine hangs forever when trying to log in
> >>>>>> either over serial or NIC.
> >>>>>>
> >>>>>> (XEN) Xen WARN at msi.c:649
> >>>>
> >>>> I get the same Xen WARN messages using the current pvops/xen-next with
> >>>> xen-unstable, here's the complete list for one boot, grep'd for WARN:
> >>>>
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN)    0000000080287db8 0(XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>>
> >>>> The complete boot seq is attached.
> >>>>
> >>>> I do get a login at the end of the boot seq though.
> >>>> My situation goes pear shaped when I try to start a pv domU. The dom0
> >>>> locks up after printing this on the console:
> >>>>
> >>>> (XEN) tmem: all pools frozen for all domains
> >>>> (XEN) tmem: all pools thawed for all domains
> >>>> (XEN) tmem: all pools frozen for all domains
> >>>> (XEN) tmem: all pools thawed for all domains
> >>>> mapping kernel into physical memory
> >>>> about to get started...
> >>>>
> >>>> then prints these once a minute:
> >>>> [  589.490894] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> >>>>
> >>>> The xen console is still active and I can generate a diag dump, also 
> >>>> attached.
> >>>>
> >>>> This dom0 lockup behavior started with pv-ops 2.6.32.21, all the way
> >>>> to .24, rendering the later pvops kernels unusable for dom0.
> >>>> The 2.6.32.18 kernel is the last one that functioned as a dom0.
> >>>>
> >>>> This behavior is consistent across platforms: HP ProLiant 380DL G6 and
> >>>> G7, as well as i7 Supermicros.
> >>>>
> >>>> -Bruce
> >>>>
> >>>>>
> >>>>> Hmm so this appears not to be an issue with XCP kernel, in that case I
> >>>>> get the warnings but everything still works fine.
> >>>>>
> >>>>> I will investigate further when I have some time.
> >>>>>
> >>>>> Gianni
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Xen-devel mailing list
> >>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> http://lists.xensource.com/xen-devel
> >>>>>
> >>>>
> >>
> >>> The latest xen-unstable, 22240 has the same "  (XEN) Xen WARN at
> >>> msi.c:636 " messages with associated stack traces.
> >>
> >>> I spent a little more time working with this version, and except for
> >>> these disconcerting messages, which do look like they are initiated by
> >>> the ethernet card discovery, the system appears functional.
> >>> In all cases the first occurrence is immediately after the NIC discovery:
> >>
> >>>  e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
> >>> | e1000e: Copyright (c) 1999-2008 Intel Corporation.
> >>> | xen: registering gsi 16 triggering 0 polarity 1
> >>> | xen_allocate_pirq: returning irq 16 for gsi 16
> >>>   xen: --> irq=16
> >>>   Already setup the GSI :16
> >>>   e1000e 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> >>>   e1000e 0000:06:00.0: setting latency timer to 64
> >>>     alloc irq_desc for 493 on node 0
> >>>     alloc kstat_irqs on node 0
> >>>   (XEN) Xen WARN at msi.c:636
> >>>   (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
> >>> ....
> >>
> >>> In case it's a NIC specific issue, I'm seeing it with both
> >>>     06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
> >>> Network Connection
> >>> and
> >>>     02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II
> >>> BCM5709 Gigabit Ethernet (rev 20)
> >>> NICs
> >>
> >>> -Bruce
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>  Sander                            mailto:linux@xxxxxxxxxxxxxx
> >>
> >>
>
>
>
> --
> Best regards,
>  Sander                            mailto:linux@xxxxxxxxxxxxxx
>

Attachment: patched-xen-boot-warn.log
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

