[Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLi

To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
From: "Cinco, Dante" <Dante.Cinco@xxxxxxx>
Date: Wed, 7 Oct 2009 18:08:04 -0600
I need help tracking down an IRQ SMP affinity problem.
 
Xen version: 3.4 unstable
dom0: Linux 2.6.30.3 (Debian)
domU: Linux 2.6.30.1 (Debian)
Hardware platform: HP ProLiant G6, dual-socket Xeon 5540, hyperthreading enabled in BIOS and kernel (total of 16 CPUs: 2 sockets * 4 cores per socket * 2 threads per core)
 
With vcpus < 5, I can change /proc/irq/<irq#>/smp_affinity and see the interrupts get routed to the proper CPU(s) by checking /proc/interrupts. With vcpus > 4, any change to /proc/irq/<irq#>/smp_affinity results in a complete loss of interrupts for <irq#>.
 
I noticed in the domU /var/log/kern.log that APIC routing changes from "flat" for vcpus=4 to "physical flat" for vcpus=5. Looking at the source code for linux-2.6.30.1/arch/x86/kernel/apic/probe_64.c, this switch occurs when "max_physical_apicid >= 8". In the domU /var/log/kern.log and /proc/cpuinfo, only even-numbered APIC IDs (starting from 0) are used, so the 5th CPU already has APIC ID 8, which triggers physical flat APIC routing.
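The arithmetic behind that switch can be sketched as follows. This is only an illustration, assuming the APIC ID of each vCPU is a fixed multiple of its CPU index (2 for the even-only numbering seen in domU) and using the threshold quoted from probe_64.c. The function names and the `stride` parameter are hypothetical, not kernel or Xen identifiers.

```python
# Threshold quoted in the message from probe_64.c: max_physical_apicid >= 8.
PHYSFLAT_THRESHOLD = 8


def domu_apic_ids(vcpus, stride=2):
    """APIC IDs for CPU0..CPU(vcpus-1), assuming APIC ID = stride * CPU index.

    stride=2 models the even-only numbering observed in domU;
    stride=1 models consecutive numbering as seen in dom0.
    """
    return [stride * cpu for cpu in range(vcpus)]


def routing_mode(apic_ids):
    """Routing mode implied by the largest APIC ID in use."""
    if max(apic_ids) >= PHYSFLAT_THRESHOLD:
        return "physical flat"
    return "flat"


for vcpus in (4, 5):
    ids = domu_apic_ids(vcpus)
    print(vcpus, ids, routing_mode(ids))
```

Under these assumptions, vcpus=4 gives APIC IDs 0, 2, 4, 6 (flat) and vcpus=5 adds APIC ID 8 (physical flat), which matches what kern.log reports.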
 
dom0 has all 16 CPUs available to it. The mapping between CPU numbers and APIC IDs is 1-to-1 (CPU0:APIC ID0 ... CPU15:APIC ID15). domU is configured with either vcpus=4 or vcpus=5. In both cases, the mapping uses only even numbers for the APIC IDs (CPU0:APIC ID0 ... CPU4:APIC ID8 for vcpus=5).
 
I'm using an ATTO/PMC Tachyon-based Fibre Channel PCIe card on this platform. It uses PCI-MSI-edge for its interrupt. I use pciback.hide in my dom0 Xen 3.5 kernel stanza to pass the device directly to domU. I'm also using "iommu=1,no-intremap,passthrough" in the stanza. In dom0, "lspci -vv" shows the device on IRQ 32, along with the MSI message address and data that have been programmed into the Tachyon registers. Regardless of changes to the device's IRQ SMP affinity in domU, the MSI message address and data as seen from dom0 do not change. I can only conclude that domU is running some sort of IRQ emulation.
 
# lspci -vv in dom0
07:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 32
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
                Address: 00000000fee00000  Data: 40ba (dest ID=0, RH=DM=0, fixed interrupt, vector=0xba)
        Kernel driver in use: pciback
 
In domU, the device has been remapped (intentionally in the dom0 config file) to bus 0, device 8 and can also be seen via "lspci -vv" with the same MSI message address but different data and using IRQ 48.
 
# lspci -vv in domU with vcpus=5
00:08.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 48
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee00000  Data: 4059 (dest ID=0, RH=DM=0, fixed interrupt, vector=0x59)
        Kernel driver in use: hwdrv
        Kernel modules: hbas-hw
 
At this point, the kernel driver for the device has been loaded and the number of interrupts can be seen in /proc/interrupts. The default IRQ SMP affinity has not been changed, and yet the interrupts are all being routed to CPU0. This is for vcpus=5 (physical flat APIC routing). Changing IRQ 48's SMP affinity to any value results in a complete loss of all interrupts. domU and dom0 need to be rebooted to restore normal operation.
# cat /proc/irq/48/smp_affinity
1f
# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4
  48:      60920          0          0          0          0   PCI-MSI-edge      HW_TACHYON
 
With vcpus=4 (flat APIC routing), IRQ 48's SMP affinity behaves as expected: each of the 4 bits in /proc/irq/48/smp_affinity corresponds to a CPU, and the bits that are set select the CPU or CPUs where the interrupts will be routed. The MSI message address and data have different attributes compared to vcpus=5. The address has dest ID=f (matches the default /proc/irq/48/smp_affinity) and RH=DM=1, and the data selects lowest priority delivery instead of fixed.
 
# lspci -vv in domU with vcpus=4
00:08.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 48
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0f00c  Data: 4159 (dest ID=f, RH=DM=1, lowest priority interrupt, vector=0x59)
        Kernel driver in use: hwdrv
        Kernel modules: hbas-hw
 
# cat /proc/irq/48/smp_affinity
f
# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
  48:      14082      19052      15337      14645   PCI-MSI-edge      HW_TACHYON
 
Changing IRQ 48's SMP affinity to 8 shows that all the interrupts are being routed to CPU3 as expected and the MSI message address has changed to reflect the new dest ID while the vector stays the same.
 
# echo 8 > /proc/irq/48/smp_affinity
# cat /proc/interrupts
  48:      14082      19052      15338     351361   PCI-MSI-edge      HW_TACHYON
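For reference, the smp_affinity values used here can be expanded into CPU numbers with a small helper. This is a minimal sketch: `cpus_in_mask` is a hypothetical name, and the only assumption is the usual Linux format of a hexadecimal bitmask (optionally split into comma-separated groups) in which bit N stands for CPU N.

```python
def cpus_in_mask(mask_hex):
    """Return the CPU numbers whose bits are set in an smp_affinity hex string."""
    # Commas only separate 32-bit groups on larger systems; drop them first.
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


for value in ("1f", "f", "8"):
    print(value, cpus_in_mask(value))
```

With the values from this message, "1f" covers CPU0 through CPU4, "f" covers CPU0 through CPU3, and "8" selects CPU3 only.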
 
# lspci -vv in domU with vcpus=4
00:08.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 48
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0800c  Data: 4159 (dest ID=8, RH=DM=1, lowest priority interrupt, vector=0x59)
        Kernel driver in use: hwdrv
        Kernel modules: hbas-hw
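 
The annotations that lspci prints next to each Address/Data pair can be reproduced by hand. The sketch below assumes the standard x86 MSI layout (destination ID in address bits 19:12, RH in bit 3, DM in bit 2, delivery mode in data bits 10:8, vector in data bits 7:0). `MsiFields` and `decode_msi` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class MsiFields(NamedTuple):
    dest_id: int
    rh: int
    dm: int
    delivery: str
    vector: int


# Only the two delivery modes that appear in this message are named.
DELIVERY_MODES = {0: "fixed", 1: "lowest priority"}


def decode_msi(address, data):
    """Split an x86 MSI address/data pair into the fields lspci reports."""
    delivery_code = (data >> 8) & 0x7
    return MsiFields(
        dest_id=(address >> 12) & 0xFF,
        rh=(address >> 3) & 1,
        dm=(address >> 2) & 1,
        delivery=DELIVERY_MODES.get(delivery_code, f"mode {delivery_code}"),
        vector=data & 0xFF,
    )


# Address/Data pairs copied from the lspci output in this message.
for address, data in ((0xFEE00000, 0x4059), (0xFEE0F00C, 0x4159), (0xFEE0800C, 0x4159)):
    print(hex(address), hex(data), decode_msi(address, data))
```

Decoding the three domU pairs this way gives dest ID 0 with fixed delivery for vcpus=5, and dest IDs f and 8 with lowest priority delivery for vcpus=4, all with vector 0x59.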
 
My hunch is that there is something wrong with physical flat APIC routing in domU. If I boot this same platform to straight Linux 2.6.30.1 (no Xen), /var/log/kern.log shows that it too is using physical flat APIC routing, which is expected since it has a total of 16 CPUs. Unlike domU though, changing the IRQ SMP affinity to any one-hot value (only one bit out of 16 is set to 1) behaves as expected. A non-one-hot value results in all interrupts being routed to CPU0, but at least the interrupts are not lost.
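 
One way to picture the difference between the two routing modes is a small model of how an affinity mask becomes an MSI destination ID. This is an assumption about Linux 2.6.30 behaviour, not something stated in this message: in flat (logical) mode the mask itself is used as the destination, while in physical flat mode only one APIC ID can be targeted and the first CPU in the mask is chosen. `msi_dest_id` and its parameters are hypothetical names.

```python
def msi_dest_id(mask, mode, apic_ids):
    """Model the MSI destination ID for an affinity mask (assumed behaviour).

    mask     -- affinity bitmask, bit N = CPU N (must be non-zero)
    mode     -- "flat" or "physical flat"
    apic_ids -- APIC ID of each CPU, indexed by CPU number
    """
    if mask == 0:
        raise ValueError("affinity mask must select at least one CPU")
    if mode == "flat":
        # Logical flat: one bit per CPU, at most 8 CPUs.
        return mask & 0xFF
    # Physical flat: a single APIC ID, assumed to be that of the first CPU set.
    first_cpu = (mask & -mask).bit_length() - 1
    return apic_ids[first_cpu]


even_ids = [0, 2, 4, 6, 8]  # even-only APIC IDs as observed in domU
print(hex(msi_dest_id(0xF, "flat", even_ids)))             # vcpus=4 default mask
print(hex(msi_dest_id(0x1F, "physical flat", even_ids)))   # vcpus=5 default mask
```

Under this model, the vcpus=4 default mask yields dest ID f and the vcpus=5 default mask yields dest ID 0, which agrees with the lspci output in domU. It does not explain why changing the mask in domU loses the interrupts entirely.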
 
One of my questions is "Why does domU use only even-numbered APIC IDs?" If it also used the odd numbers (consecutive APIC IDs starting from 0), then physical flat APIC routing would only trigger when vcpus > 8.
 
I welcome any suggestions on how to pursue this problem or hopefully, someone will say that a patch for this already exists.
 
Thanks.
 
Dante Cinco
 