# HG changeset patch
# User tristan.gingold@xxxxxxxx
# Node ID 3d7272deebe6040752fc7715c795c0911ae4e645
# Parent  5fcc346d6fe086436977a9b171f2bdb3a177d828
IOSAPIC virtualisation.
See details in xen/arch/ia64/linux-xen/iosapic.c

TODO:
* serial IRQ in Xen
* revisit when driver domains and SMP domain0 are supported.

Signed-off-by: Tristan Gingold

diff -r 5fcc346d6fe0 -r 3d7272deebe6 linux-2.6-xen-sparse/include/asm-xen/asm-ia64/hypercall.h
--- a/linux-2.6-xen-sparse/include/asm-xen/asm-ia64/hypercall.h	Thu Jan 26 10:31:28 2006
+++ b/linux-2.6-xen-sparse/include/asm-xen/asm-ia64/hypercall.h	Wed Feb  1 12:17:54 2006
@@ -422,26 +422,17 @@
     return ret;
 }
 
-#if 0
 static inline int
 HYPERVISOR_physdev_op(
     void *physdev_op)
 {
-#if 0
-    int ret;
-    unsigned long ign;
-
-    __asm__ __volatile__ (
-        TRAP_INSTR
-        : "=a" (ret), "=b" (ign)
-        : "0" (__HYPERVISOR_physdev_op), "1" (physdev_op)
-        : "memory" );
-
-    return ret;
-#endif
-    return 1;
-}
-#endif
+    int ret;
+    __asm__ __volatile__ ( ";; mov r14=%2 ; mov r2=%1 ; break 0x1000 ;; mov %0=r8 ;;"
+        : "=r" (ret)
+        : "i" (__HYPERVISOR_physdev_op), "r" (physdev_op)
+        : "r14", "r2", "r8", "memory" );
+    return ret;
+}
 
 static inline int
 HYPERVISOR_grant_table_op(

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/arch/ia64/Makefile
--- a/xen/arch/ia64/Makefile	Thu Jan 26 10:31:28 2006
+++ b/xen/arch/ia64/Makefile	Wed Feb  1 12:17:54 2006
@@ -10,7 +10,7 @@
 	extable.o linuxextable.o sort.o xenirq.o xentime.o \
 	regionreg.o entry.o unaligned.o privop.o vcpu.o \
 	irq_ia64.o irq_lsapic.o vhpt.o xenasm.o hyperprivop.o dom_fw.o \
-	sn_console.o # ia64_ksyms.o
+	sn_console.o iosapic.o # ia64_ksyms.o
 
 OBJS += vmx_init.o vmx_virt.o vmx_vcpu.o vmx_process.o vmx_vsa.o vmx_ivt.o\
 	vmx_phy_mode.o vmx_utility.o vmx_interrupt.o vmx_entry.o vmmu.o \

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/arch/ia64/xen/acpi.c
--- a/xen/arch/ia64/xen/acpi.c	Thu Jan 26 10:31:28 2006
+++ b/xen/arch/ia64/xen/acpi.c	Wed Feb  1 12:17:54 2006
@@ -44,7 +44,7 @@
 #include
 #include
 #include
-//#include
+#include
 #include
 #include
 #include
@@ -121,14 +121,13 @@
 #ifdef CONFIG_ACPI_BOOT
 
 #define ACPI_MAX_PLATFORM_INTERRUPTS	256
-#define NR_IOSAPICS 4
-
-#if 0
+
 /* Array to record platform interrupt vectors for generic interrupt routing. */
 int platform_intr_list[ACPI_MAX_PLATFORM_INTERRUPTS] = {
 	[0 ... ACPI_MAX_PLATFORM_INTERRUPTS - 1] = -1
 };
 
+#if 0
 enum acpi_irq_model_id acpi_irq_model = ACPI_IRQ_MODEL_IOSAPIC;
 
 /*
@@ -253,9 +252,7 @@
 
 	acpi_table_print_madt_entry(header);
 
-#if 0
 	iosapic_init(iosapic->address, iosapic->global_irq_base);
-#endif
 
 	return 0;
 }
@@ -274,7 +271,6 @@
 
 	acpi_table_print_madt_entry(header);
 
-#if 0
 	/*
 	 * Get vector assignment for this interrupt, set attributes,
 	 * and program the IOSAPIC routing table.
@@ -288,7 +284,6 @@
 			(plintsrc->flags.trigger == 1) ? IOSAPIC_EDGE : IOSAPIC_LEVEL);
 
 	platform_intr_list[plintsrc->type] = vector;
-#endif
 
 	return 0;
 }
@@ -306,11 +301,9 @@
 
 	acpi_table_print_madt_entry(header);
 
-#if 0
 	iosapic_override_isa_irq(p->bus_irq, p->global_irq,
 				 (p->flags.polarity == 1) ? IOSAPIC_POL_HIGH : IOSAPIC_POL_LOW,
 				 (p->flags.trigger == 1) ? IOSAPIC_EDGE : IOSAPIC_LEVEL);
-#endif
 
 	return 0;
 }
@@ -362,9 +355,7 @@
 #else
 	has_8259 = acpi_madt->flags.pcat_compat;
 #endif
-#if 0
 	iosapic_system_init(has_8259);
-#endif
 
 	/* Get base address of IPI Message Block */

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/arch/ia64/xen/hypercall.c
--- a/xen/arch/ia64/xen/hypercall.c	Thu Jan 26 10:31:28 2006
+++ b/xen/arch/ia64/xen/hypercall.c	Wed Feb  1 12:17:54 2006
@@ -8,6 +8,7 @@
 
 #include
 #include
+#include
 
 #include	/* FOR EFI_UNIMPLEMENTED */
 #include	/* FOR struct ia64_sal_retval */
@@ -18,6 +19,7 @@
 #include
 
 extern unsigned long translate_domain_mpaddr(unsigned long);
+extern long do_physdev_op(physdev_op_t *uop);
 
 unsigned long idle_when_pending = 0;
 unsigned long pal_halt_light_count = 0;
@@ -180,6 +182,10 @@
 		regs->r8 = do_xen_version(regs->r14, regs->r15);
 		break;
 
+	    case __HYPERVISOR_physdev_op:
+		regs->r8 = do_physdev_op(regs->r14);
+		break;
+
 	    default:
 		printf("unknown hypercall %x\n", regs->r2);
 		regs->r8 = (unsigned long)-1;

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/arch/ia64/xen/irq.c
--- a/xen/arch/ia64/xen/irq.c	Thu Jan 26 10:31:28 2006
+++ b/xen/arch/ia64/xen/irq.c	Wed Feb  1 12:17:54 2006
@@ -75,7 +75,7 @@
 #ifdef XEN
 #include
 #define _irq_desc irq_desc
-#define irq_descp(irq) &irq_desc[irq]
+#define irq_descp(irq) (&irq_desc[irq])
 #define apicid_to_phys_cpu_present(x)	1
 #endif

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/arch/ia64/xen/vcpu.c
--- a/xen/arch/ia64/xen/vcpu.c	Thu Jan 26 10:31:28 2006
+++ b/xen/arch/ia64/xen/vcpu.c	Wed Feb  1 12:17:54 2006
@@ -747,12 +747,7 @@
 
 IA64FAULT vcpu_get_lid(VCPU *vcpu, UINT64 *pval)
 {
-extern unsigned long privop_trace;
-//privop_trace=1;
-	//TODO: Implement this
-	printf("vcpu_get_lid: WARNING: Getting cr.lid always returns zero\n");
-	//*pval = 0;
-	*pval = ia64_getreg(_IA64_REG_CR_LID);
+	*pval = vcpu->vcpu_id * 0x01010000;
 	return IA64_NO_FAULT;
 }

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/arch/ia64/xen/xenirq.c
--- a/xen/arch/ia64/xen/xenirq.c	Thu Jan 26 10:31:28 2006
+++ b/xen/arch/ia64/xen/xenirq.c	Wed Feb  1 12:17:54 2006
@@ -32,29 +32,34 @@
 }
 
+extern int xen_reflect_interrupt_to_domains(ia64_vector vector);
+
 int
 xen_do_IRQ(ia64_vector vector)
 {
-	if (vector != IA64_TIMER_VECTOR && vector != IA64_IPI_VECTOR) {
-		extern void vcpu_pend_interrupt(void *, int);
+	struct vcpu *vcpu;
+	ia64_vector v;
+
+	/* Do not reflect special interrupts.  */
+	if (vector == IA64_TIMER_VECTOR || vector == IA64_IPI_VECTOR)
+		return 0;
+
 #if 0
-		if (firsttime[vector]) {
-			printf("**** (iterate) First received int on vector=%d,itc=%lx\n",
-			       (unsigned long) vector, ia64_get_itc());
-			firsttime[vector] = 0;
-		}
-		if (firstpend[vector]) {
-			printf("**** First pended int on vector=%d,itc=%lx\n",
-			       (unsigned long) vector, ia64_get_itc());
-			firstpend[vector] = 0;
-		}
+	if (firsttime[vector]) {
+		printf("**** (iterate) First received int on vector=%d,itc=%lx\n",
+		       (unsigned long) vector, ia64_get_itc());
+		firsttime[vector] = 0;
+	}
+	if (firstpend[vector]) {
+		printf("**** First pended int on vector=%d,itc=%lx\n",
+		       (unsigned long) vector, ia64_get_itc());
+		firstpend[vector] = 0;
+	}
 #endif
-	//FIXME: TEMPORARY HACK!!!!
-	vcpu_pend_interrupt(dom0->vcpu[0],vector);
-	vcpu_wake(dom0->vcpu[0]);
-	return(1);
-	}
-	return(0);
+
+	/* Send the interrupt to the domains.
+	   Return 0 if it must be handled by Xen too.  */
+	return xen_reflect_interrupt_to_domains(vector);
 }
 
 /*

diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/include/asm-ia64/config.h
--- a/xen/include/asm-ia64/config.h	Thu Jan 26 10:31:28 2006
+++ b/xen/include/asm-ia64/config.h	Wed Feb  1 12:17:54 2006
@@ -299,6 +299,7 @@
 #define __smp_processor_id() 0
 #endif
 
+#define CONFIG_IOSAPIC
 
 #ifndef __ASSEMBLY__
 #include

diff -r 5fcc346d6fe0 -r 3d7272deebe6 linux-2.6-xen-sparse/arch/ia64/kernel/iosapic.c
--- /dev/null	Thu Jan 26 10:31:28 2006
+++ b/linux-2.6-xen-sparse/arch/ia64/kernel/iosapic.c	Wed Feb  1 12:17:54 2006
@@ -0,0 +1,1122 @@
+/*
+ * I/O SAPIC support.
+ *
+ * Copyright (C) 1999 Intel Corp.
+ * Copyright (C) 1999 Asit Mallick
+ * Copyright (C) 2000-2002 J.I. Lee
+ * Copyright (C) 1999-2000, 2002-2003 Hewlett-Packard Co.
+ *	David Mosberger-Tang
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999,2000 Walt Drummond
+ *
+ * 00/04/19	D. Mosberger	Rewritten to mirror more closely the x86 I/O APIC code.
+ *			In particular, we now have separate handlers for edge
+ *			and level triggered interrupts.
+ * 00/10/27	Asit Mallick, Goutham Rao	IRQ vector allocation
+ *			PCI to vector mapping, shared PCI interrupts.
+ * 00/10/27	D. Mosberger	Document things a bit more to make them more understandable.
+ *			Clean up much of the old IOSAPIC cruft.
+ * 01/07/27	J.I. Lee	PCI irq routing, Platform/Legacy interrupts and fixes for
+ *			ACPI S5(SoftOff) support.
+ * 02/01/23	J.I. Lee	iosapic pgm fixes for PCI irq routing from _PRT
+ * 02/01/07	E. Focht	Redirectable interrupt vectors in
+ *			iosapic_set_affinity(), initializations for
+ *			/proc/irq/#/smp_affinity
+ * 02/04/02	P. Diefenbaugh	Cleaned up ACPI PCI IRQ routing.
+ * 02/04/18	J.I. Lee	bug fix in iosapic_init_pci_irq
+ * 02/04/30	J.I. Lee	bug fix in find_iosapic to fix ACPI PCI IRQ to IOSAPIC mapping
+ *			error
+ * 02/07/29	T. Kochi	Allocate interrupt vectors dynamically
+ * 02/08/04	T. Kochi	Cleaned up terminology (irq, global system interrupt, vector, etc.)
+ * 02/09/20	D. Mosberger	Simplified by taking advantage of ACPI's pci_irq code.
+ * 03/02/19	B. Helgaas	Make pcat_compat system-wide, not per-IOSAPIC.
+ *			Remove iosapic_address & gsi_base from external interfaces.
+ *			Rationalize __init/__devinit attributes.
+ * 04/12/04	Ashok Raj	Intel Corporation 2004
+ *			Updated to work with irq migration necessary for CPU Hotplug
+ */
+/*
+ * Here is what the interrupt logic between a PCI device and the kernel looks like:
+ *
+ * (1) A PCI device raises one of the four interrupt pins (INTA, INTB, INTC, INTD).  The
+ *     device is uniquely identified by its bus- and slot-number (the function
+ *     number does not matter here because all functions share the same interrupt
+ *     lines).
+ *
+ * (2) The motherboard routes the interrupt line to a pin on an IOSAPIC controller.
+ *     Multiple interrupt lines may have to share the same IOSAPIC pin (if they're level
+ *     triggered and use the same polarity).  Each interrupt line has a unique Global
+ *     System Interrupt (GSI) number which can be calculated as the sum of the controller's
+ *     base GSI number and the IOSAPIC pin number to which the line connects.
+ *
+ * (3) The IOSAPIC uses internal routing table entries (RTEs) to map the IOSAPIC pin
+ *     into the IA-64 interrupt vector.  This interrupt vector is then sent to the CPU.
+ *
+ * (4) The kernel recognizes an interrupt as an IRQ.  The IRQ interface is used as the
+ *     architecture-independent interrupt handling mechanism in Linux.  As an
+ *     IRQ is a number, we have to have an IA-64 interrupt vector number <-> IRQ number
+ *     mapping.  On smaller systems, we use a one-to-one mapping between IA-64 vector and
+ *     IRQ.  A platform can implement platform_irq_to_vector(irq) and
+ *     platform_local_vector_to_irq(vector) APIs to differentiate the mapping.
+ *     Please see also include/asm-ia64/hw_irq.h for those APIs.
+ *
+ * To sum up, there are three levels of mappings involved:
+ *
+ *	PCI pin -> global system interrupt (GSI) -> IA-64 vector <-> IRQ
+ *
+ * Note: The term "IRQ" is loosely used everywhere in the Linux kernel to describe
+ * interrupts.  Now we use "IRQ" only for Linux IRQ's.  ISA IRQ (isa_irq) is the only
+ * exception in this source code.
+ */
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#ifdef CONFIG_XEN
+#include
+#include
+#endif
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+
+#undef DEBUG_INTERRUPT_ROUTING
+
+#ifdef DEBUG_INTERRUPT_ROUTING
+#define DBG(fmt...)	printk(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+#define NR_PREALLOCATE_RTE_ENTRIES	(PAGE_SIZE / sizeof(struct iosapic_rte_info))
+#define RTE_PREALLOCATED	(1)
+
+static DEFINE_SPINLOCK(iosapic_lock);
+
+/* These tables map IA-64 vectors to the IOSAPIC pin that generates this vector. */
+
+struct iosapic_rte_info {
+	struct list_head rte_list;	/* node in list of RTEs sharing the same vector */
+	char __iomem	*addr;		/* base address of IOSAPIC */
+	unsigned int	gsi_base;	/* first GSI assigned to this IOSAPIC */
+	char		rte_index;	/* IOSAPIC RTE index */
+	int		refcnt;		/* reference counter */
+	unsigned int	flags;		/* flags */
+} ____cacheline_aligned;
+
+static struct iosapic_intr_info {
+	struct list_head rtes;		/* RTEs using this vector (empty => not an IOSAPIC interrupt) */
+	int		count;		/* # of RTEs that share this vector */
+	u32		low32;		/* current value of low word of Redirection table entry */
+	unsigned int	dest;		/* destination CPU physical ID */
+	unsigned char	dmode	: 3;	/* delivery mode (see iosapic.h) */
+	unsigned char	polarity: 1;	/* interrupt polarity (see iosapic.h) */
+	unsigned char	trigger	: 1;	/* trigger mode (see iosapic.h) */
+} iosapic_intr_info[IA64_NUM_VECTORS];
+
+static struct iosapic {
+	char __iomem	*addr;		/* base address of IOSAPIC */
+	unsigned int	gsi_base;	/* first GSI assigned to this IOSAPIC */
+	unsigned short	num_rte;	/* number of RTE in this IOSAPIC */
+#ifdef CONFIG_NUMA
+	unsigned short	node;		/* numa node association via pxm */
+#endif
+} iosapic_lists[NR_IOSAPICS];
+
+static int num_iosapic;
+
+static unsigned char pcat_compat __initdata;	/* 8259 compatibility flag */
+
+static int iosapic_kmalloc_ok;
+static LIST_HEAD(free_rte_list);
+
+#ifdef CONFIG_XEN
+static void
+xen_iosapic_write (struct iosapic_rte_info *rte, unsigned char off, u32 val)
+{
+	if (running_on_xen) {
+		physdev_op_t op;
+
+		op.cmd = PHYSDEVOP_APIC_WRITE;
+		op.u.apic_op.apic = rte->gsi_base + rte->rte_index;
+		op.u.apic_op.offset = off;
+		op.u.apic_op.value = val;
+
+		HYPERVISOR_physdev_op(&op);
+	}
+	else {
+		iosapic_write(rte->addr, off, val);
+	}
+}
+
+static unsigned int
+xen_iosapic_read (struct iosapic *iosapic, unsigned char off)
+{
+	if (running_on_xen) {
+		physdev_op_t op;
+
+		op.cmd = PHYSDEVOP_APIC_READ;
+		op.u.apic_op.apic = iosapic->gsi_base;
+		op.u.apic_op.offset = off;
+
+		HYPERVISOR_physdev_op(&op);
+
+		return op.u.apic_op.value;
+	}
+	else {
+		return iosapic_read(iosapic->addr, off);
+	}
+}
+
+#define XEN_IOSAPIC_EOI 2
+static void
+xen_iosapic_eoi (struct iosapic_rte_info *rte, unsigned int vec)
+{
+	if (running_on_xen) {
+		physdev_op_t op;
+
+		op.cmd = PHYSDEVOP_APIC_WRITE;
+		op.u.apic_op.apic = rte->gsi_base + rte->rte_index;
+		op.u.apic_op.offset = XEN_IOSAPIC_EOI;
+		op.u.apic_op.value = vec;
+
+		HYPERVISOR_physdev_op(&op);
+	}
+	else {
+		iosapic_eoi(rte->addr, vec);
+	}
+}
+#endif /* CONFIG_XEN */
+
+/*
+ * Find an IOSAPIC associated with a GSI
+ */
+static inline int
+find_iosapic (unsigned int gsi)
+{
+	int i;
+
+	for (i = 0; i < num_iosapic; i++) {
+		if ((unsigned) (gsi - iosapic_lists[i].gsi_base) < iosapic_lists[i].num_rte)
+			return i;
+	}
+
+	return -1;
+}
+
+static inline int
+_gsi_to_vector (unsigned int gsi)
+{
+	struct iosapic_intr_info *info;
+	struct iosapic_rte_info *rte;
+
+	for (info = iosapic_intr_info; info < iosapic_intr_info + IA64_NUM_VECTORS; ++info)
+		list_for_each_entry(rte, &info->rtes, rte_list)
+			if (rte->gsi_base + rte->rte_index == gsi)
+				return info - iosapic_intr_info;
+	return -1;
+}
+
+/*
+ * Translate GSI number to the corresponding IA-64 interrupt vector.  If no
+ * entry exists, return -1.
+ */
+inline int
+gsi_to_vector (unsigned int gsi)
+{
+	return _gsi_to_vector(gsi);
+}
+
+int
+gsi_to_irq (unsigned int gsi)
+{
+	unsigned long flags;
+	int irq;
+	/*
+	 * XXX fix me: this assumes an identity mapping between IA-64 vector and Linux irq
+	 * numbers...
+	 */
+	spin_lock_irqsave(&iosapic_lock, flags);
+	{
+		irq = _gsi_to_vector(gsi);
+	}
+	spin_unlock_irqrestore(&iosapic_lock, flags);
+
+	return irq;
+}
+
+static struct iosapic_rte_info *gsi_vector_to_rte(unsigned int gsi, unsigned int vec)
+{
+	struct iosapic_rte_info *rte;
+
+	list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list)
+		if (rte->gsi_base + rte->rte_index == gsi)
+			return rte;
+	return NULL;
+}
+
+static void
+set_rte (unsigned int gsi, unsigned int vector, unsigned int dest, int mask)
+{
+	unsigned long pol, trigger, dmode;
+	u32 low32, high32;
+	char __iomem *addr;
+	int rte_index;
+	char redir;
+	struct iosapic_rte_info *rte;
+
+	DBG(KERN_DEBUG "IOSAPIC: routing vector %d to 0x%x\n", vector, dest);
+
+	rte = gsi_vector_to_rte(gsi, vector);
+	if (!rte)
+		return;		/* not an IOSAPIC interrupt */
+
+	rte_index = rte->rte_index;
+	addr = rte->addr;
+	pol = iosapic_intr_info[vector].polarity;
+	trigger = iosapic_intr_info[vector].trigger;
+	dmode = iosapic_intr_info[vector].dmode;
+
+	redir = (dmode == IOSAPIC_LOWEST_PRIORITY) ? 1 : 0;
+
+#ifdef CONFIG_SMP
+	{
+		unsigned int irq;
+
+		for (irq = 0; irq < NR_IRQS; ++irq)
+			if (irq_to_vector(irq) == vector) {
+				set_irq_affinity_info(irq, (int)(dest & 0xffff), redir);
+				break;
+			}
+	}
+#endif
+
+	low32 = ((pol << IOSAPIC_POLARITY_SHIFT) |
+		 (trigger << IOSAPIC_TRIGGER_SHIFT) |
+		 (dmode << IOSAPIC_DELIVERY_SHIFT) |
+		 ((mask ? 1 : 0) << IOSAPIC_MASK_SHIFT) |
+		 vector);
+
+	/* dest contains both id and eid */
+	high32 = (dest << IOSAPIC_DEST_SHIFT);
+
+#ifdef CONFIG_XEN
+	xen_iosapic_write(rte, IOSAPIC_RTE_HIGH(rte_index), high32);
+	xen_iosapic_write(rte, IOSAPIC_RTE_LOW(rte_index), low32);
+#else
+	iosapic_write(addr, IOSAPIC_RTE_HIGH(rte_index), high32);
+	iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32);
+#endif
+	iosapic_intr_info[vector].low32 = low32;
+	iosapic_intr_info[vector].dest = dest;
+}
+
+static void
+nop (unsigned int vector)
+{
+	/* do nothing... */
+}
+
+static void
+mask_irq (unsigned int irq)
+{
+	unsigned long flags;
+	char __iomem *addr;
+	u32 low32;
+	int rte_index;
+	ia64_vector vec = irq_to_vector(irq);
+	struct iosapic_rte_info *rte;
+
+	if (list_empty(&iosapic_intr_info[vec].rtes))
+		return;		/* not an IOSAPIC interrupt! */
+
+	spin_lock_irqsave(&iosapic_lock, flags);
+	{
+		/* set only the mask bit */
+		low32 = iosapic_intr_info[vec].low32 |= IOSAPIC_MASK;
+		list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) {
+			addr = rte->addr;
+			rte_index = rte->rte_index;
+#ifdef CONFIG_XEN
+			xen_iosapic_write(rte, IOSAPIC_RTE_LOW(rte_index), low32);
+#else
+			iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32);
+#endif
+		}
+	}
+	spin_unlock_irqrestore(&iosapic_lock, flags);
+}
+
+static void
+unmask_irq (unsigned int irq)
+{
+	unsigned long flags;
+	char __iomem *addr;
+	u32 low32;
+	int rte_index;
+	ia64_vector vec = irq_to_vector(irq);
+	struct iosapic_rte_info *rte;
+
+	if (list_empty(&iosapic_intr_info[vec].rtes))
+		return;		/* not an IOSAPIC interrupt! */
+
+	spin_lock_irqsave(&iosapic_lock, flags);
+	{
+		low32 = iosapic_intr_info[vec].low32 &= ~IOSAPIC_MASK;
+		list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) {
+			addr = rte->addr;
+			rte_index = rte->rte_index;
+#ifdef CONFIG_XEN
+			xen_iosapic_write(rte, IOSAPIC_RTE_LOW(rte_index), low32);
+#else
+			iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32);
+#endif
+		}
+	}
+	spin_unlock_irqrestore(&iosapic_lock, flags);
+}
+
+
+static void
+iosapic_set_affinity (unsigned int irq, cpumask_t mask)
+{
+#ifdef CONFIG_SMP
+	unsigned long flags;
+	u32 high32, low32;
+	int dest, rte_index;
+	char __iomem *addr;
+	int redir = (irq & IA64_IRQ_REDIRECTED) ? 1 : 0;
+	ia64_vector vec;
+	struct iosapic_rte_info *rte;
+
+	irq &= (~IA64_IRQ_REDIRECTED);
+	vec = irq_to_vector(irq);
+
+	if (cpus_empty(mask))
+		return;
+
+	dest = cpu_physical_id(first_cpu(mask));
+
+	if (list_empty(&iosapic_intr_info[vec].rtes))
+		return;		/* not an IOSAPIC interrupt */
+
+	set_irq_affinity_info(irq, dest, redir);
+
+	/* dest contains both id and eid */
+	high32 = dest << IOSAPIC_DEST_SHIFT;
+
+	spin_lock_irqsave(&iosapic_lock, flags);
+	{
+		low32 = iosapic_intr_info[vec].low32 & ~(7 << IOSAPIC_DELIVERY_SHIFT);
+
+		if (redir)
+			/* change delivery mode to lowest priority */
+			low32 |= (IOSAPIC_LOWEST_PRIORITY << IOSAPIC_DELIVERY_SHIFT);
+		else
+			/* change delivery mode to fixed */
+			low32 |= (IOSAPIC_FIXED << IOSAPIC_DELIVERY_SHIFT);
+
+		iosapic_intr_info[vec].low32 = low32;
+		iosapic_intr_info[vec].dest = dest;
+		list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) {
+			addr = rte->addr;
+			rte_index = rte->rte_index;
+#ifdef CONFIG_XEN
+			xen_iosapic_write(rte, IOSAPIC_RTE_HIGH(rte_index), high32);
+			xen_iosapic_write(rte, IOSAPIC_RTE_LOW(rte_index), low32);
+#else
+			iosapic_write(addr, IOSAPIC_RTE_HIGH(rte_index), high32);
+			iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32);
+#endif
+		}
+	}
+	spin_unlock_irqrestore(&iosapic_lock, flags);
+#endif
+}
+
+/*
+ * Handlers for level-triggered interrupts.
+ */
+
+static unsigned int
+iosapic_startup_level_irq (unsigned int irq)
+{
+	unmask_irq(irq);
+	return 0;
+}
+
+static void
+iosapic_end_level_irq (unsigned int irq)
+{
+	ia64_vector vec = irq_to_vector(irq);
+	struct iosapic_rte_info *rte;
+
+	move_irq(irq);
+	list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list)
+#ifdef CONFIG_XEN
+		xen_iosapic_eoi(rte, vec);
+#else
+		iosapic_eoi(rte->addr, vec);
+#endif
+}
+
+#define iosapic_shutdown_level_irq	mask_irq
+#define iosapic_enable_level_irq	unmask_irq
+#define iosapic_disable_level_irq	mask_irq
+#define iosapic_ack_level_irq		nop
+
+struct hw_interrupt_type irq_type_iosapic_level = {
+	.typename =	"IO-SAPIC-level",
+	.startup =	iosapic_startup_level_irq,
+	.shutdown =	iosapic_shutdown_level_irq,
+	.enable =	iosapic_enable_level_irq,
+	.disable =	iosapic_disable_level_irq,
+	.ack =		iosapic_ack_level_irq,
+	.end =		iosapic_end_level_irq,
+	.set_affinity =	iosapic_set_affinity
+};
+
+/*
+ * Handlers for edge-triggered interrupts.
+ */
+
+static unsigned int
+iosapic_startup_edge_irq (unsigned int irq)
+{
+	unmask_irq(irq);
+	/*
+	 * IOSAPIC simply drops interrupts pended while the
+	 * corresponding pin was masked, so we can't know if an
+	 * interrupt is pending already.  Let's hope not...
+	 */
+	return 0;
+}
+
+static void
+iosapic_ack_edge_irq (unsigned int irq)
+{
+	irq_desc_t *idesc = irq_descp(irq);
+
+	move_irq(irq);
+	/*
+	 * Once we have recorded IRQ_PENDING already, we can mask the
+	 * interrupt for real.  This prevents IRQ storms from unhandled
+	 * devices.
+	 */
+	if ((idesc->status & (IRQ_PENDING|IRQ_DISABLED)) == (IRQ_PENDING|IRQ_DISABLED))
+		mask_irq(irq);
+}
+
+#define iosapic_enable_edge_irq		unmask_irq
+#define iosapic_disable_edge_irq	nop
+#define iosapic_end_edge_irq		nop
+
+struct hw_interrupt_type irq_type_iosapic_edge = {
+	.typename =	"IO-SAPIC-edge",
+	.startup =	iosapic_startup_edge_irq,
+	.shutdown =	iosapic_disable_edge_irq,
+	.enable =	iosapic_enable_edge_irq,
+	.disable =	iosapic_disable_edge_irq,
+	.ack =		iosapic_ack_edge_irq,
+	.end =		iosapic_end_edge_irq,
+	.set_affinity =	iosapic_set_affinity
+};
+
+static inline unsigned int
+iosapic_version (struct iosapic *iosapic)
+{
+	/*
+	 * IOSAPIC Version Register return 32 bit structure like:
+	 * {
+	 *	unsigned int version   : 8;
+	 *	unsigned int reserved1 : 8;
+	 *	unsigned int max_redir : 8;
+	 *	unsigned int reserved2 : 8;
+	 * }
+	 */
+#ifdef CONFIG_XEN
+	return xen_iosapic_read(iosapic, IOSAPIC_VERSION);
+#else
+	return iosapic_read(iosapic->addr, IOSAPIC_VERSION);
+#endif
+}
+
+static int iosapic_find_sharable_vector (unsigned long trigger, unsigned long pol)
+{
+	int i, vector = -1, min_count = -1;
+	struct iosapic_intr_info *info;
+
+	/*
+	 * shared vectors for edge-triggered interrupts are not
+	 * supported yet
+	 */
+	if (trigger == IOSAPIC_EDGE)
+		return -1;
+
+	for (i = IA64_FIRST_DEVICE_VECTOR; i <= IA64_LAST_DEVICE_VECTOR; i++) {
+		info = &iosapic_intr_info[i];
+		if (info->trigger == trigger && info->polarity == pol &&
+		    (info->dmode == IOSAPIC_FIXED || info->dmode == IOSAPIC_LOWEST_PRIORITY)) {
+			if (min_count == -1 || info->count < min_count) {
+				vector = i;
+				min_count = info->count;
+			}
+		}
+	}
+	if (vector < 0)
+		panic("%s: out of interrupt vectors!\n", __FUNCTION__);
+
+	return vector;
+}
+
+/*
+ * if the given vector is already owned by other,
+ * assign a new vector for the other and make the vector available
+ */
+static void __init
+iosapic_reassign_vector (int vector)
+{
+	int new_vector;
+
+	if (!list_empty(&iosapic_intr_info[vector].rtes)) {
+		new_vector = assign_irq_vector(AUTO_ASSIGN);
+		printk(KERN_INFO "Reassigning vector %d to %d\n", vector, new_vector);
+		memcpy(&iosapic_intr_info[new_vector], &iosapic_intr_info[vector],
+		       sizeof(struct iosapic_intr_info));
+		INIT_LIST_HEAD(&iosapic_intr_info[new_vector].rtes);
+		list_move(iosapic_intr_info[vector].rtes.next, &iosapic_intr_info[new_vector].rtes);
+		memset(&iosapic_intr_info[vector], 0, sizeof(struct iosapic_intr_info));
+		iosapic_intr_info[vector].low32 = IOSAPIC_MASK;
+		INIT_LIST_HEAD(&iosapic_intr_info[vector].rtes);
+	}
+}
+
+static struct iosapic_rte_info *iosapic_alloc_rte (void)
+{
+	int i;
+	struct iosapic_rte_info *rte;
+	int preallocated = 0;
+
+	if (!iosapic_kmalloc_ok && list_empty(&free_rte_list)) {
+		rte = alloc_bootmem(sizeof(struct iosapic_rte_info) * NR_PREALLOCATE_RTE_ENTRIES);
+		if (!rte)
+			return NULL;
+		for (i = 0; i < NR_PREALLOCATE_RTE_ENTRIES; i++, rte++)
+			list_add(&rte->rte_list, &free_rte_list);
+	}
+
+	if (!list_empty(&free_rte_list)) {
+		rte = list_entry(free_rte_list.next, struct iosapic_rte_info, rte_list);
+		list_del(&rte->rte_list);
+		preallocated++;
+	} else {
+		rte = kmalloc(sizeof(struct iosapic_rte_info), GFP_ATOMIC);
+		if (!rte)
+			return NULL;
+	}
+
+	memset(rte, 0, sizeof(struct iosapic_rte_info));
+	if (preallocated)
+		rte->flags |= RTE_PREALLOCATED;
+
+	return rte;
+}
+
+static void iosapic_free_rte (struct iosapic_rte_info *rte)
+{
+	if (rte->flags & RTE_PREALLOCATED)
+		list_add_tail(&rte->rte_list, &free_rte_list);
+	else
+		kfree(rte);
+}
+
+static inline int vector_is_shared (int vector)
+{
+	return (iosapic_intr_info[vector].count > 1);
+}
+
+static void
+register_intr (unsigned int gsi, int vector, unsigned char delivery,
+	       unsigned long polarity, unsigned long trigger)
+{
+	irq_desc_t *idesc;
+	struct hw_interrupt_type *irq_type;
+	int rte_index;
+	int index;
+	unsigned long gsi_base;
+	void __iomem *iosapic_address;
+	struct iosapic_rte_info *rte;
+
+	index = find_iosapic(gsi);
+	if (index < 0) {
+		printk(KERN_WARNING "%s: No IOSAPIC for GSI %u\n", __FUNCTION__, gsi);
+		return;
+	}
+
+	iosapic_address = iosapic_lists[index].addr;
+	gsi_base = iosapic_lists[index].gsi_base;
+
+	rte = gsi_vector_to_rte(gsi, vector);
+	if (!rte) {
+		rte = iosapic_alloc_rte();
+		if (!rte) {
+			printk(KERN_WARNING "%s: cannot allocate memory\n", __FUNCTION__);
+			return;
+		}
+
+		rte_index = gsi - gsi_base;
+		rte->rte_index = rte_index;
+		rte->addr = iosapic_address;
+		rte->gsi_base = gsi_base;
+		rte->refcnt++;
+		list_add_tail(&rte->rte_list, &iosapic_intr_info[vector].rtes);
+		iosapic_intr_info[vector].count++;
+	}
+	else if (vector_is_shared(vector)) {
+		struct iosapic_intr_info *info = &iosapic_intr_info[vector];
+		if (info->trigger != trigger || info->polarity != polarity) {
+			printk(KERN_WARNING "%s: cannot override the interrupt\n", __FUNCTION__);
+			return;
+		}
+	}
+
+	iosapic_intr_info[vector].polarity = polarity;
+	iosapic_intr_info[vector].dmode    = delivery;
+	iosapic_intr_info[vector].trigger  = trigger;
+
+	if (trigger == IOSAPIC_EDGE)
+		irq_type = &irq_type_iosapic_edge;
+	else
+		irq_type = &irq_type_iosapic_level;
+
+	idesc = irq_descp(vector);
+	if (idesc->handler != irq_type) {
+		if (idesc->handler != &no_irq_type)
+			printk(KERN_WARNING "%s: changing vector %d from %s to %s\n",
+			       __FUNCTION__, vector, idesc->handler->typename, irq_type->typename);
+		idesc->handler = irq_type;
+	}
+}
+
+static unsigned int
+get_target_cpu (unsigned int gsi, int vector)
+{
+#ifdef CONFIG_SMP
+	static int cpu = -1;
+
+	/*
+	 * In case of vector shared by multiple RTEs, all RTEs that
+	 * share the vector need to use the same destination CPU.
+	 */
+	if (!list_empty(&iosapic_intr_info[vector].rtes))
+		return iosapic_intr_info[vector].dest;
+
+	/*
+	 * If the platform supports redirection via XTP, let it
+	 * distribute interrupts.
+	 */
+	if (smp_int_redirect & SMP_IRQ_REDIRECTION)
+		return cpu_physical_id(smp_processor_id());
+
+	/*
+	 * Some interrupts (ACPI SCI, for instance) are registered
+	 * before the BSP is marked as online.
+	 */
+	if (!cpu_online(smp_processor_id()))
+		return cpu_physical_id(smp_processor_id());
+
+#ifdef CONFIG_NUMA
+	{
+		int num_cpus, cpu_index, iosapic_index, numa_cpu, i = 0;
+		cpumask_t cpu_mask;
+
+		iosapic_index = find_iosapic(gsi);
+		if (iosapic_index < 0 ||
+		    iosapic_lists[iosapic_index].node == MAX_NUMNODES)
+			goto skip_numa_setup;
+
+		cpu_mask = node_to_cpumask(iosapic_lists[iosapic_index].node);
+
+		for_each_cpu_mask(numa_cpu, cpu_mask) {
+			if (!cpu_online(numa_cpu))
+				cpu_clear(numa_cpu, cpu_mask);
+		}
+
+		num_cpus = cpus_weight(cpu_mask);
+
+		if (!num_cpus)
+			goto skip_numa_setup;
+
+		/* Use vector assignment to distribute across cpus in node */
+		cpu_index = vector % num_cpus;
+
+		for (numa_cpu = first_cpu(cpu_mask) ; i < cpu_index ; i++)
+			numa_cpu = next_cpu(numa_cpu, cpu_mask);
+
+		if (numa_cpu != NR_CPUS)
+			return cpu_physical_id(numa_cpu);
+	}
+skip_numa_setup:
+#endif
+	/*
+	 * Otherwise, round-robin interrupt vectors across all the
+	 * processors.  (It'd be nice if we could be smarter in the
+	 * case of NUMA.)
+	 */
+	do {
+		if (++cpu >= NR_CPUS)
+			cpu = 0;
+	} while (!cpu_online(cpu));
+
+	return cpu_physical_id(cpu);
+#else
+	return cpu_physical_id(smp_processor_id());
+#endif
+}
+
+/*
+ * ACPI can describe IOSAPIC interrupts via static tables and namespace
+ * methods.  This provides an interface to register those interrupts and
+ * program the IOSAPIC RTE.
+ */
+int
+iosapic_register_intr (unsigned int gsi,
+		       unsigned long polarity, unsigned long trigger)
+{
+	int vector, mask = 1;
+	unsigned int dest;
+	unsigned long flags;
+	struct iosapic_rte_info *rte;
+	u32 low32;
+again:
+	/*
+	 * If this GSI has already been registered (i.e., it's a
+	 * shared interrupt, or we lost a race to register it),
+	 * don't touch the RTE.
+	 */
+	spin_lock_irqsave(&iosapic_lock, flags);
+	{
+		vector = gsi_to_vector(gsi);
+		if (vector > 0) {
+			rte = gsi_vector_to_rte(gsi, vector);
+			rte->refcnt++;
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+			return vector;
+		}
+	}
+	spin_unlock_irqrestore(&iosapic_lock, flags);
+
+	/* If vector is running out, we try to find a sharable vector */
+	vector = assign_irq_vector_nopanic(AUTO_ASSIGN);
+	if (vector < 0)
+		vector = iosapic_find_sharable_vector(trigger, polarity);
+
+	spin_lock_irqsave(&irq_descp(vector)->lock, flags);
+	spin_lock(&iosapic_lock);
+	{
+		if (gsi_to_vector(gsi) > 0) {
+			if (list_empty(&iosapic_intr_info[vector].rtes))
+				free_irq_vector(vector);
+			spin_unlock(&iosapic_lock);
+			spin_unlock_irqrestore(&irq_descp(vector)->lock, flags);
+			goto again;
+		}
+
+		dest = get_target_cpu(gsi, vector);
+		register_intr(gsi, vector, IOSAPIC_LOWEST_PRIORITY,
+			      polarity, trigger);
+
+		/*
+		 * If the vector is shared and already unmasked for
+		 * other interrupt sources, don't mask it.
+		 */
+		low32 = iosapic_intr_info[vector].low32;
+		if (vector_is_shared(vector) && !(low32 & IOSAPIC_MASK))
+			mask = 0;
+		set_rte(gsi, vector, dest, mask);
+	}
+	spin_unlock(&iosapic_lock);
+	spin_unlock_irqrestore(&irq_descp(vector)->lock, flags);
+
+	printk(KERN_INFO "GSI %u (%s, %s) -> CPU %d (0x%04x) vector %d\n",
+	       gsi, (trigger == IOSAPIC_EDGE ? "edge" : "level"),
+	       (polarity == IOSAPIC_POL_HIGH ? "high" : "low"),
+	       cpu_logical_id(dest), dest, vector);
+
+	return vector;
+}
+
+#ifdef CONFIG_ACPI_DEALLOCATE_IRQ
+void
+iosapic_unregister_intr (unsigned int gsi)
+{
+	unsigned long flags;
+	int irq, vector;
+	irq_desc_t *idesc;
+	u32 low32;
+	unsigned long trigger, polarity;
+	unsigned int dest;
+	struct iosapic_rte_info *rte;
+
+	/*
+	 * If the irq associated with the gsi is not found,
+	 * iosapic_unregister_intr() is unbalanced.  We need to check
+	 * this again after getting locks.
+	 */
+	irq = gsi_to_irq(gsi);
+	if (irq < 0) {
+		printk(KERN_ERR "iosapic_unregister_intr(%u) unbalanced\n", gsi);
+		WARN_ON(1);
+		return;
+	}
+	vector = irq_to_vector(irq);
+
+	idesc = irq_descp(irq);
+	spin_lock_irqsave(&idesc->lock, flags);
+	spin_lock(&iosapic_lock);
+	{
+		if ((rte = gsi_vector_to_rte(gsi, vector)) == NULL) {
+			printk(KERN_ERR "iosapic_unregister_intr(%u) unbalanced\n", gsi);
+			WARN_ON(1);
+			goto out;
+		}
+
+		if (--rte->refcnt > 0)
+			goto out;
+
+		/* Mask the interrupt */
+		low32 = iosapic_intr_info[vector].low32 | IOSAPIC_MASK;
+#ifdef CONFIG_XEN
+		xen_iosapic_write(rte, IOSAPIC_RTE_LOW(rte->rte_index), low32);
+#else
+		iosapic_write(rte->addr, IOSAPIC_RTE_LOW(rte->rte_index), low32);
+#endif
+
+		/* Remove the rte entry from the list */
+		list_del(&rte->rte_list);
+		iosapic_intr_info[vector].count--;
+		iosapic_free_rte(rte);
+
+		trigger  = iosapic_intr_info[vector].trigger;
+		polarity = iosapic_intr_info[vector].polarity;
+		dest     = iosapic_intr_info[vector].dest;
+		printk(KERN_INFO "GSI %u (%s, %s) -> CPU %d (0x%04x) vector %d unregistered\n",
+		       gsi, (trigger == IOSAPIC_EDGE ? "edge" : "level"),
+		       (polarity == IOSAPIC_POL_HIGH ? "high" : "low"),
+		       cpu_logical_id(dest), dest, vector);
+
+		if (list_empty(&iosapic_intr_info[vector].rtes)) {
+			/* Sanity check */
+			BUG_ON(iosapic_intr_info[vector].count);
+
+			/* Clear the interrupt controller descriptor */
+			idesc->handler = &no_irq_type;
+
+			/* Clear the interrupt information */
+			memset(&iosapic_intr_info[vector], 0, sizeof(struct iosapic_intr_info));
+			iosapic_intr_info[vector].low32 |= IOSAPIC_MASK;
+			INIT_LIST_HEAD(&iosapic_intr_info[vector].rtes);
+
+			if (idesc->action) {
+				printk(KERN_ERR "interrupt handlers still exist on IRQ %u\n", irq);
+				WARN_ON(1);
+			}
+
+			/* Free the interrupt vector */
+			free_irq_vector(vector);
+		}
+	}
+ out:
+	spin_unlock(&iosapic_lock);
+	spin_unlock_irqrestore(&idesc->lock, flags);
+}
+#endif /* CONFIG_ACPI_DEALLOCATE_IRQ */
+
+/*
+ * ACPI calls this when it finds an entry for a platform interrupt.
+ * Note that the irq_base and IOSAPIC address must be set in iosapic_init().
+ */
+int __init
+iosapic_register_platform_intr (u32 int_type, unsigned int gsi,
+				int iosapic_vector, u16 eid, u16 id,
+				unsigned long polarity, unsigned long trigger)
+{
+	static const char * const name[] = {"unknown", "PMI", "INIT", "CPEI"};
+	unsigned char delivery;
+	int vector, mask = 0;
+	unsigned int dest = ((id << 8) | eid) & 0xffff;
+
+	switch (int_type) {
+	      case ACPI_INTERRUPT_PMI:
+		vector = iosapic_vector;
+		/*
+		 * since PMI vector is alloc'd by FW(ACPI) not by kernel,
+		 * we need to make sure the vector is available
+		 */
+		iosapic_reassign_vector(vector);
+		delivery = IOSAPIC_PMI;
+		break;
+	      case ACPI_INTERRUPT_INIT:
+		vector = assign_irq_vector(AUTO_ASSIGN);
+		delivery = IOSAPIC_INIT;
+		break;
+	      case ACPI_INTERRUPT_CPEI:
+		vector = IA64_CPE_VECTOR;
+		delivery = IOSAPIC_LOWEST_PRIORITY;
+		mask = 1;
+		break;
+	      default:
+		printk(KERN_ERR "iosapic_register_platform_irq(): invalid int type 0x%x\n", int_type);
+		return -1;
+	}
+
+	register_intr(gsi, vector, delivery, polarity, trigger);
+
+	printk(KERN_INFO "PLATFORM int %s (0x%x): GSI %u (%s, %s) -> CPU %d (0x%04x) vector %d\n",
+	       int_type < ARRAY_SIZE(name) ? name[int_type] : "unknown",
+	       int_type, gsi, (trigger == IOSAPIC_EDGE ? "edge" : "level"),
+	       (polarity == IOSAPIC_POL_HIGH ? "high" : "low"),
+	       cpu_logical_id(dest), dest, vector);
+
+	set_rte(gsi, vector, dest, mask);
+	return vector;
+}
+
+
+/*
+ * ACPI calls this when it finds an entry for a legacy ISA IRQ override.
+ * Note that the gsi_base and IOSAPIC address must be set in iosapic_init().
+ */
+void __init
+iosapic_override_isa_irq (unsigned int isa_irq, unsigned int gsi,
+			  unsigned long polarity,
+			  unsigned long trigger)
+{
+	int vector;
+	unsigned int dest = cpu_physical_id(smp_processor_id());
+
+	vector = isa_irq_to_vector(isa_irq);
+
+	register_intr(gsi, vector, IOSAPIC_LOWEST_PRIORITY, polarity, trigger);
+
+	DBG("ISA: IRQ %u -> GSI %u (%s,%s) -> CPU %d (0x%04x) vector %d\n",
+	    isa_irq, gsi, trigger == IOSAPIC_EDGE ? "edge" : "level",
+	    polarity == IOSAPIC_POL_HIGH ? "high" : "low",
+	    cpu_logical_id(dest), dest, vector);
+
+	set_rte(gsi, vector, dest, 1);
+}
+
+void __init
+iosapic_system_init (int system_pcat_compat)
+{
+	int vector;
+
+	for (vector = 0; vector < IA64_NUM_VECTORS; ++vector) {
+		iosapic_intr_info[vector].low32 = IOSAPIC_MASK;
+		INIT_LIST_HEAD(&iosapic_intr_info[vector].rtes);	/* mark as unused */
+	}
+
+	pcat_compat = system_pcat_compat;
+	if (
+#ifdef CONFIG_XEN
+	    !running_on_xen &&
+#endif
+	    pcat_compat) {
+		/*
+		 * Disable the compatibility mode interrupts (8259 style), needs IN/OUT support
+		 * enabled.
+ */ + printk(KERN_INFO "%s: Disabling PC-AT compatible 8259 interrupts\n", __FUNCTION__); + outb(0xff, 0xA1); + outb(0xff, 0x21); + } +} + +void __init +iosapic_init (unsigned long phys_addr, unsigned int gsi_base) +{ + int num_rte; + unsigned int isa_irq, ver; + char __iomem *addr; + struct iosapic *iosapic = &iosapic_lists[num_iosapic]; + + addr = ioremap(phys_addr, 0); + + iosapic->addr = addr; + iosapic->gsi_base = gsi_base; + + ver = iosapic_version(iosapic); + + /* + * The MAX_REDIR register holds the highest input pin + * number (starting from 0). + * We add 1 so that we can use it for number of pins (= RTEs) + */ + num_rte = ((ver >> 16) & 0xff) + 1; + + iosapic->num_rte = num_rte; +#ifdef CONFIG_NUMA + iosapic->node = MAX_NUMNODES; +#endif + num_iosapic++; + + if ((gsi_base == 0) && pcat_compat) { + /* + * Map the legacy ISA devices into the IOSAPIC data. Some of these may + * get reprogrammed later on with data from the ACPI Interrupt Source + * Override table. + */ + for (isa_irq = 0; isa_irq < 16; ++isa_irq) + iosapic_override_isa_irq(isa_irq, isa_irq, IOSAPIC_POL_HIGH, IOSAPIC_EDGE); + } +} + +#ifdef CONFIG_NUMA +void __init +map_iosapic_to_node(unsigned int gsi_base, int node) +{ + int index; + + index = find_iosapic(gsi_base); + if (index < 0) { + printk(KERN_WARNING "%s: No IOSAPIC for GSI %u\n", + __FUNCTION__, gsi_base); + return; + } + iosapic_lists[index].node = node; + return; +} +#endif + +static int __init iosapic_enable_kmalloc (void) +{ + iosapic_kmalloc_ok = 1; + return 0; +} +core_initcall (iosapic_enable_kmalloc); diff -r 5fcc346d6fe0 -r 3d7272deebe6 linux-2.6-xen-sparse/include/asm-ia64/iosapic.h --- /dev/null Thu Jan 26 10:31:28 2006 +++ b/linux-2.6-xen-sparse/include/asm-ia64/iosapic.h Wed Feb 1 12:17:54 2006 @@ -0,0 +1,109 @@ +#ifndef __ASM_IA64_IOSAPIC_H +#define __ASM_IA64_IOSAPIC_H + +#define IOSAPIC_REG_SELECT 0x0 +#define IOSAPIC_WINDOW 0x10 +#define IOSAPIC_EOI 0x40 + +#define IOSAPIC_VERSION 0x1 + +/* + * Redirection table 
entry + */ +#define IOSAPIC_RTE_LOW(i) (0x10+i*2) +#define IOSAPIC_RTE_HIGH(i) (0x11+i*2) + +#define IOSAPIC_DEST_SHIFT 16 + +/* + * Delivery mode + */ +#define IOSAPIC_DELIVERY_SHIFT 8 +#define IOSAPIC_FIXED 0x0 +#define IOSAPIC_LOWEST_PRIORITY 0x1 +#define IOSAPIC_PMI 0x2 +#define IOSAPIC_NMI 0x4 +#define IOSAPIC_INIT 0x5 +#define IOSAPIC_EXTINT 0x7 + +/* + * Interrupt polarity + */ +#define IOSAPIC_POLARITY_SHIFT 13 +#define IOSAPIC_POL_HIGH 0 +#define IOSAPIC_POL_LOW 1 + +/* + * Trigger mode + */ +#define IOSAPIC_TRIGGER_SHIFT 15 +#define IOSAPIC_EDGE 0 +#define IOSAPIC_LEVEL 1 + +/* + * Mask bit + */ + +#define IOSAPIC_MASK_SHIFT 16 +#define IOSAPIC_MASK (1<<IOSAPIC_MASK_SHIFT) + * Copyright (C) 2000-2002 J.I. Lee + * Copyright (C) 1999-2000, 2002-2003 Hewlett-Packard Co. + * David Mosberger-Tang + * Copyright (C) 1999 VA Linux Systems + * Copyright (C) 1999,2000 Walt Drummond + * + * 00/04/19 D. Mosberger Rewritten to mirror more closely the x86 I/O APIC code. + * In particular, we now have separate handlers for edge + * and level triggered interrupts. + * 00/10/27 Asit Mallick, Goutham Rao IRQ vector allocation + * PCI to vector mapping, shared PCI interrupts. + * 00/10/27 D. Mosberger Document things a bit more to make them more understandable. + * Clean up much of the old IOSAPIC cruft. + * 01/07/27 J.I. Lee PCI irq routing, Platform/Legacy interrupts and fixes for + * ACPI S5(SoftOff) support. + * 02/01/23 J.I. Lee iosapic pgm fixes for PCI irq routing from _PRT + * 02/01/07 E. Focht Redirectable interrupt vectors in + * iosapic_set_affinity(), initializations for + * /proc/irq/#/smp_affinity + * 02/04/02 P. Diefenbaugh Cleaned up ACPI PCI IRQ routing. + * 02/04/18 J.I. Lee bug fix in iosapic_init_pci_irq + * 02/04/30 J.I. Lee bug fix in find_iosapic to fix ACPI PCI IRQ to IOSAPIC mapping + * error + * 02/07/29 T. Kochi Allocate interrupt vectors dynamically + * 02/08/04 T. Kochi Cleaned up terminology (irq, global system interrupt, vector, etc.) + * 02/09/20 D.
Mosberger Simplified by taking advantage of ACPI's pci_irq code. + * 03/02/19 B. Helgaas Make pcat_compat system-wide, not per-IOSAPIC. + * Remove iosapic_address & gsi_base from external interfaces. + * Rationalize __init/__devinit attributes. + * 04/12/04 Ashok Raj Intel Corporation 2004 + * Updated to work with irq migration necessary for CPU Hotplug + */ +/* + * Here is what the interrupt logic between a PCI device and the kernel looks like: + * + * (1) A PCI device raises one of the four interrupt pins (INTA, INTB, INTC, INTD). The + * device is uniquely identified by its bus- and slot-number (the function + * number does not matter here because all functions share the same interrupt + * lines). + * + * (2) The motherboard routes the interrupt line to a pin on an IOSAPIC controller. + * Multiple interrupt lines may have to share the same IOSAPIC pin (if they're level + * triggered and use the same polarity). Each interrupt line has a unique Global + * System Interrupt (GSI) number which can be calculated as the sum of the controller's + * base GSI number and the IOSAPIC pin number to which the line connects. + * + * (3) The IOSAPIC uses internal routing table entries (RTEs) to map the IOSAPIC pin + * into the IA-64 interrupt vector. This interrupt vector is then sent to the CPU. + * + * (4) The kernel recognizes an interrupt as an IRQ. The IRQ interface is used as the + * architecture-independent interrupt handling mechanism in Linux. As an + * IRQ is a number, we have to have IA-64 interrupt vector number <-> IRQ number + * mapping. On smaller systems, we use a one-to-one mapping between IA-64 vector and + * IRQ. A platform can implement platform_irq_to_vector(irq) and + * platform_local_vector_to_irq(vector) APIs to differentiate the mapping. + * Please see also include/asm-ia64/hw_irq.h for those APIs.
+ * + * To sum up, there are three levels of mappings involved: + * + * PCI pin -> global system interrupt (GSI) -> IA-64 vector <-> IRQ + * + * Note: The term "IRQ" is loosely used everywhere in the Linux kernel to describe interrupts. + * Now we use "IRQ" only for Linux IRQ's. ISA IRQ (isa_irq) is the only exception in this + * source code. + */ +#include + +#include +#include +#include +#include +#include +#ifndef XEN +#include +#endif +#ifdef XEN +#include +#endif +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#define DEBUG_INTERRUPT_ROUTING + +#ifdef DEBUG_INTERRUPT_ROUTING +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) +#endif + +#define NR_PREALLOCATE_RTE_ENTRIES (PAGE_SIZE / sizeof(struct iosapic_rte_info)) +#define RTE_PREALLOCATED (1) + +static DEFINE_SPINLOCK(iosapic_lock); + +/* These tables map IA-64 vectors to the IOSAPIC pin that generates this vector. */ + +struct iosapic_rte_info { + struct list_head rte_list; /* node in list of RTEs sharing the same vector */ + char __iomem *addr; /* base address of IOSAPIC */ + unsigned int gsi_base; /* first GSI assigned to this IOSAPIC */ + char rte_index; /* IOSAPIC RTE index */ + int refcnt; /* reference counter */ + unsigned int flags; /* flags */ +#ifdef XEN + struct list_head iosapic_list; /* node in list of RTEs sharing the same iosapic. */ + /* Dummy vcpu. The interrupt must be sent to Xen. */ +#define VCPU_XEN ((struct vcpu *)1) + struct vcpu *vcpu; /* VCPU for this interrupt. */ + ia64_vector vcpu_vec; /* VCPU Vector. */ + ia64_vector xen_vec; /* Vector used by Xen.
*/ +#endif +} ____cacheline_aligned; + +static struct iosapic_intr_info { + struct list_head rtes; /* RTEs using this vector (empty => not an IOSAPIC interrupt) */ + int count; /* # of RTEs that share this vector */ + u32 low32; /* current value of low word of Redirection table entry */ + unsigned int dest; /* destination CPU physical ID */ + unsigned char dmode : 3; /* delivery mode (see iosapic.h) */ + unsigned char polarity: 1; /* interrupt polarity (see iosapic.h) */ + unsigned char trigger : 1; /* trigger mode (see iosapic.h) */ +} iosapic_intr_info[IA64_NUM_VECTORS]; + +static struct iosapic { + char __iomem *addr; /* base address of IOSAPIC */ + unsigned int gsi_base; /* first GSI assigned to this IOSAPIC */ + unsigned short num_rte; /* number of RTEs in this IOSAPIC */ +#ifdef CONFIG_NUMA + unsigned short node; /* numa node association via pxm */ +#endif +#ifdef XEN + struct list_head rtes; /* RTEs on this iosapic. */ +#endif +} iosapic_lists[NR_IOSAPICS]; + +static int num_iosapic; + +static unsigned char pcat_compat __initdata; /* 8259 compatibility flag */ + +static int iosapic_kmalloc_ok; +static LIST_HEAD(free_rte_list); + +/* + * Find an IOSAPIC associated with a GSI + */ +static inline int +find_iosapic (unsigned int gsi) +{ + int i; + + for (i = 0; i < num_iosapic; i++) { + if ((unsigned) (gsi - iosapic_lists[i].gsi_base) < iosapic_lists[i].num_rte) + return i; + } + + return -1; +} + +static inline int +_gsi_to_vector (unsigned int gsi) +{ + struct iosapic_intr_info *info; + struct iosapic_rte_info *rte; + + for (info = iosapic_intr_info; info < iosapic_intr_info + IA64_NUM_VECTORS; ++info) + list_for_each_entry(rte, &info->rtes, rte_list) + if (rte->gsi_base + rte->rte_index == gsi) + return info - iosapic_intr_info; + return -1; +} + +/* + * Translate GSI number to the corresponding IA-64 interrupt vector. If no + * entry exists, return -1.
+ */ +inline int +gsi_to_vector (unsigned int gsi) +{ + return _gsi_to_vector(gsi); +} + +int +gsi_to_irq (unsigned int gsi) +{ + unsigned long flags; + int irq; + /* + * XXX fix me: this assumes an identity mapping between IA-64 vector and Linux irq + * numbers... + */ + spin_lock_irqsave(&iosapic_lock, flags); + { + irq = _gsi_to_vector(gsi); + } + spin_unlock_irqrestore(&iosapic_lock, flags); + + return irq; +} + +static struct iosapic_rte_info *gsi_vector_to_rte(unsigned int gsi, unsigned int vec) +{ + struct iosapic_rte_info *rte; + + list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) + if (rte->gsi_base + rte->rte_index == gsi) + return rte; + return NULL; +} + +static void +set_rte (unsigned int gsi, unsigned int vector, unsigned int dest, int mask) +{ + unsigned long pol, trigger, dmode; + u32 low32, high32; + char __iomem *addr; + int rte_index; + char redir; + struct iosapic_rte_info *rte; + + DBG(KERN_DEBUG"IOSAPIC: routing vector %d to 0x%x\n", vector, dest); + + rte = gsi_vector_to_rte(gsi, vector); + if (!rte) + return; /* not an IOSAPIC interrupt */ + + rte_index = rte->rte_index; + addr = rte->addr; + pol = iosapic_intr_info[vector].polarity; + trigger = iosapic_intr_info[vector].trigger; + dmode = iosapic_intr_info[vector].dmode; + + redir = (dmode == IOSAPIC_LOWEST_PRIORITY) ? 1 : 0; + +#ifdef CONFIG_SMP + { + unsigned int irq; + + for (irq = 0; irq < NR_IRQS; ++irq) + if (irq_to_vector(irq) == vector) { + set_irq_affinity_info(irq, (int)(dest & 0xffff), redir); + break; + } + } +#endif + + low32 = ((pol << IOSAPIC_POLARITY_SHIFT) | + (trigger << IOSAPIC_TRIGGER_SHIFT) | + (dmode << IOSAPIC_DELIVERY_SHIFT) | + ((mask ?
1 : 0) << IOSAPIC_MASK_SHIFT) | + vector); + + /* dest contains both id and eid */ + high32 = (dest << IOSAPIC_DEST_SHIFT); + + iosapic_write(addr, IOSAPIC_RTE_HIGH(rte_index), high32); + iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32); + iosapic_intr_info[vector].low32 = low32; + iosapic_intr_info[vector].dest = dest; +} + +#ifdef XEN +#undef nop +#endif + +static void +nop (unsigned int vector) +{ + /* do nothing... */ +} + +#ifdef XEN +static void +mask_vec (ia64_vector vec) +#else +static void +mask_irq (unsigned int irq) +#endif +{ + unsigned long flags; + char __iomem *addr; + u32 low32; + int rte_index; +#ifndef XEN + ia64_vector vec = irq_to_vector(irq); +#endif + struct iosapic_rte_info *rte; + + if (list_empty(&iosapic_intr_info[vec].rtes)) + return; /* not an IOSAPIC interrupt! */ + + spin_lock_irqsave(&iosapic_lock, flags); + { + /* set only the mask bit */ + low32 = iosapic_intr_info[vec].low32 |= IOSAPIC_MASK; + list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) { + addr = rte->addr; + rte_index = rte->rte_index; + iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32); + } + } + spin_unlock_irqrestore(&iosapic_lock, flags); +} + +#ifdef XEN +static void +unmask_vec (ia64_vector vec) +#else +static void +unmask_irq (unsigned int irq) +#endif +{ + unsigned long flags; + char __iomem *addr; + u32 low32; + int rte_index; +#ifndef XEN + ia64_vector vec = irq_to_vector(irq); +#endif + struct iosapic_rte_info *rte; + + if (list_empty(&iosapic_intr_info[vec].rtes)) + return; /* not an IOSAPIC interrupt! 
*/ + + spin_lock_irqsave(&iosapic_lock, flags); + { + low32 = iosapic_intr_info[vec].low32 &= ~IOSAPIC_MASK; + list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) { + addr = rte->addr; + rte_index = rte->rte_index; + iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32); + } + } + spin_unlock_irqrestore(&iosapic_lock, flags); +} + +#ifdef XEN +static void +mask_irq (unsigned int irq) +{ + mask_vec (irq_to_vector(irq)); +} + +static void +unmask_irq (unsigned int irq) +{ + unmask_vec (irq_to_vector(irq)); +} +#endif + +static void +iosapic_set_affinity (unsigned int irq, cpumask_t mask) +{ +#ifdef CONFIG_SMP + unsigned long flags; + u32 high32, low32; + int dest, rte_index; + char __iomem *addr; + int redir = (irq & IA64_IRQ_REDIRECTED) ? 1 : 0; + ia64_vector vec; + struct iosapic_rte_info *rte; + + irq &= (~IA64_IRQ_REDIRECTED); + vec = irq_to_vector(irq); + + if (cpus_empty(mask)) + return; + + dest = cpu_physical_id(first_cpu(mask)); + + if (list_empty(&iosapic_intr_info[vec].rtes)) + return; /* not an IOSAPIC interrupt */ + + set_irq_affinity_info(irq, dest, redir); + + /* dest contains both id and eid */ + high32 = dest << IOSAPIC_DEST_SHIFT; + + spin_lock_irqsave(&iosapic_lock, flags); + { + low32 = iosapic_intr_info[vec].low32 & ~(7 << IOSAPIC_DELIVERY_SHIFT); + + if (redir) + /* change delivery mode to lowest priority */ + low32 |= (IOSAPIC_LOWEST_PRIORITY << IOSAPIC_DELIVERY_SHIFT); + else + /* change delivery mode to fixed */ + low32 |= (IOSAPIC_FIXED << IOSAPIC_DELIVERY_SHIFT); + + iosapic_intr_info[vec].low32 = low32; + iosapic_intr_info[vec].dest = dest; + list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) { + addr = rte->addr; + rte_index = rte->rte_index; + iosapic_write(addr, IOSAPIC_RTE_HIGH(rte_index), high32); + iosapic_write(addr, IOSAPIC_RTE_LOW(rte_index), low32); + } + } + spin_unlock_irqrestore(&iosapic_lock, flags); +#endif +} + +/* + * Handlers for level-triggered interrupts. 
+ */ + +static unsigned int +iosapic_startup_level_irq (unsigned int irq) +{ + unmask_irq(irq); + return 0; +} + +static void +iosapic_end_level_irq (unsigned int irq) +{ + ia64_vector vec = irq_to_vector(irq); + struct iosapic_rte_info *rte; + + move_irq(irq); + list_for_each_entry(rte, &iosapic_intr_info[vec].rtes, rte_list) + iosapic_eoi(rte->addr, vec); +} + +#define iosapic_shutdown_level_irq mask_irq +#define iosapic_enable_level_irq unmask_irq +#define iosapic_disable_level_irq mask_irq +#define iosapic_ack_level_irq nop + +struct hw_interrupt_type irq_type_iosapic_level = { + .typename = "IO-SAPIC-level", + .startup = iosapic_startup_level_irq, + .shutdown = iosapic_shutdown_level_irq, + .enable = iosapic_enable_level_irq, + .disable = iosapic_disable_level_irq, + .ack = iosapic_ack_level_irq, + .end = iosapic_end_level_irq, + .set_affinity = iosapic_set_affinity +}; + +/* + * Handlers for edge-triggered interrupts. + */ + +static unsigned int +iosapic_startup_edge_irq (unsigned int irq) +{ + unmask_irq(irq); + /* + * IOSAPIC simply drops interrupts pended while the + * corresponding pin was masked, so we can't know if an + * interrupt is pending already. Let's hope not... + */ + return 0; +} + +static void +iosapic_ack_edge_irq (unsigned int irq) +{ + irq_desc_t *idesc = irq_descp(irq); + + move_irq(irq); + /* + * Once we have recorded IRQ_PENDING already, we can mask the + * interrupt for real. This prevents IRQ storms from unhandled + * devices. 
+ */ + if ((idesc->status & (IRQ_PENDING|IRQ_DISABLED)) == (IRQ_PENDING|IRQ_DISABLED)) + mask_irq(irq); +} + +#define iosapic_enable_edge_irq unmask_irq +#define iosapic_disable_edge_irq nop +#define iosapic_end_edge_irq nop + +struct hw_interrupt_type irq_type_iosapic_edge = { + .typename = "IO-SAPIC-edge", + .startup = iosapic_startup_edge_irq, + .shutdown = iosapic_disable_edge_irq, + .enable = iosapic_enable_edge_irq, + .disable = iosapic_disable_edge_irq, + .ack = iosapic_ack_edge_irq, + .end = iosapic_end_edge_irq, + .set_affinity = iosapic_set_affinity +}; + +unsigned int +iosapic_version (char __iomem *addr) +{ + /* + * The IOSAPIC Version Register returns a 32-bit structure like: + * { + * unsigned int version : 8; + * unsigned int reserved1 : 8; + * unsigned int max_redir : 8; + * unsigned int reserved2 : 8; + * } + */ + return iosapic_read(addr, IOSAPIC_VERSION); +} + +static int iosapic_find_sharable_vector (unsigned long trigger, unsigned long pol) +{ + int i, vector = -1, min_count = -1; + struct iosapic_intr_info *info; + + /* + * shared vectors for edge-triggered interrupts are not + * supported yet + */ + if (trigger == IOSAPIC_EDGE) + return -1; + + for (i = IA64_FIRST_DEVICE_VECTOR; i <= IA64_LAST_DEVICE_VECTOR; i++) { + info = &iosapic_intr_info[i]; + if (info->trigger == trigger && info->polarity == pol && + (info->dmode == IOSAPIC_FIXED || info->dmode == IOSAPIC_LOWEST_PRIORITY)) { + if (min_count == -1 || info->count < min_count) { + vector = i; + min_count = info->count; + } + } + } + if (vector < 0) + panic("%s: out of interrupt vectors!\n", __FUNCTION__); + + return vector; +} + +#ifdef XEN +/** + * list_move - delete from one list and add as another's head + * @list: the entry to move + * @head: the head that will precede our entry + */ +static inline void list_move(struct list_head *list, struct list_head *head) +{ + __list_del(list->prev, list->next); + list_add(list, head); +} +#endif + +/* + * if the given vector is already owned by another, +
* assign a new vector for the other and make the vector available + */ +static void __init +iosapic_reassign_vector (int vector) +{ + int new_vector; +#ifdef XEN + struct iosapic_rte_info *rte; +#endif + + if (!list_empty(&iosapic_intr_info[vector].rtes)) { + new_vector = assign_irq_vector(AUTO_ASSIGN); + printk(KERN_INFO "Reassigning vector %d to %d\n", vector, new_vector); + memcpy(&iosapic_intr_info[new_vector], &iosapic_intr_info[vector], + sizeof(struct iosapic_intr_info)); + INIT_LIST_HEAD(&iosapic_intr_info[new_vector].rtes); + list_move(iosapic_intr_info[vector].rtes.next, &iosapic_intr_info[new_vector].rtes); + memset(&iosapic_intr_info[vector], 0, sizeof(struct iosapic_intr_info)); +#ifdef XEN + list_for_each_entry(rte, &iosapic_intr_info[new_vector].rtes, rte_list) + rte->xen_vec = new_vector; +#endif + iosapic_intr_info[vector].low32 = IOSAPIC_MASK; + INIT_LIST_HEAD(&iosapic_intr_info[vector].rtes); + } +} + +static struct iosapic_rte_info *iosapic_alloc_rte (void) +{ + int i; + struct iosapic_rte_info *rte; + int preallocated = 0; + + if (!iosapic_kmalloc_ok && list_empty(&free_rte_list)) { +#ifdef XEN + rte = alloc_xenheap_pages(get_order(sizeof(struct iosapic_rte_info) * NR_PREALLOCATE_RTE_ENTRIES)); +#else + rte = alloc_bootmem(sizeof(struct iosapic_rte_info) * NR_PREALLOCATE_RTE_ENTRIES); +#endif + if (!rte) + return NULL; + for (i = 0; i < NR_PREALLOCATE_RTE_ENTRIES; i++, rte++) + list_add(&rte->rte_list, &free_rte_list); + } + + if (!list_empty(&free_rte_list)) { + rte = list_entry(free_rte_list.next, struct iosapic_rte_info, rte_list); + list_del(&rte->rte_list); + preallocated++; + } else { +#ifdef XEN + rte = NULL; +#else + rte = kmalloc(sizeof(struct iosapic_rte_info), GFP_ATOMIC); +#endif + if (!rte) + return NULL; + } + + memset(rte, 0, sizeof(struct iosapic_rte_info)); + if (preallocated) + rte->flags |= RTE_PREALLOCATED; + + return rte; +} + +static void iosapic_free_rte (struct iosapic_rte_info *rte) +{ + if (rte->flags & 
RTE_PREALLOCATED) + list_add_tail(&rte->rte_list, &free_rte_list); +#ifndef XEN + else + kfree(rte); +#endif +} + +static inline int vector_is_shared (int vector) +{ + return (iosapic_intr_info[vector].count > 1); +} + +static void +register_intr (unsigned int gsi, int vector, unsigned char delivery, + unsigned long polarity, unsigned long trigger) +{ + irq_desc_t *idesc; + struct hw_interrupt_type *irq_type; + int rte_index; + int index; + unsigned long gsi_base; + void __iomem *iosapic_address; + struct iosapic_rte_info *rte; + + index = find_iosapic(gsi); + if (index < 0) { + printk(KERN_WARNING "%s: No IOSAPIC for GSI %u\n", __FUNCTION__, gsi); + return; + } + + iosapic_address = iosapic_lists[index].addr; + gsi_base = iosapic_lists[index].gsi_base; + + rte = gsi_vector_to_rte(gsi, vector); + if (!rte) { + rte = iosapic_alloc_rte(); + if (!rte) { + printk(KERN_WARNING "%s: cannot allocate memory\n", __FUNCTION__); + return; + } + + rte_index = gsi - gsi_base; + rte->rte_index = rte_index; + rte->addr = iosapic_address; + rte->gsi_base = gsi_base; +#ifdef XEN + rte->vcpu = NULL; + rte->vcpu_vec = 0; + rte->xen_vec = vector; + list_add_tail (&rte->iosapic_list, &iosapic_lists[index].rtes); +#endif + rte->refcnt++; + list_add_tail(&rte->rte_list, &iosapic_intr_info[vector].rtes); + iosapic_intr_info[vector].count++; + } + else if (vector_is_shared(vector)) { + struct iosapic_intr_info *info = &iosapic_intr_info[vector]; + if (info->trigger != trigger || info->polarity != polarity) { + printk (KERN_WARNING "%s: cannot override the interrupt\n", __FUNCTION__); + return; + } + } + + iosapic_intr_info[vector].polarity = polarity; + iosapic_intr_info[vector].dmode = delivery; + iosapic_intr_info[vector].trigger = trigger; + + if (trigger == IOSAPIC_EDGE) + irq_type = &irq_type_iosapic_edge; + else + irq_type = &irq_type_iosapic_level; + + idesc = irq_descp(vector); + if (idesc->handler != irq_type) { + if (idesc->handler != &no_irq_type) + printk(KERN_WARNING "%s: 
changing vector %d from %s to %s\n", + __FUNCTION__, vector, idesc->handler->typename, irq_type->typename); + idesc->handler = irq_type; + } +} + +static unsigned int +get_target_cpu (unsigned int gsi, int vector) +{ +#ifdef CONFIG_SMP + static int cpu = -1; + + /* + * In case of vector shared by multiple RTEs, all RTEs that + * share the vector need to use the same destination CPU. + */ + if (!list_empty(&iosapic_intr_info[vector].rtes)) + return iosapic_intr_info[vector].dest; + + /* + * If the platform supports redirection via XTP, let it + * distribute interrupts. + */ + if (smp_int_redirect & SMP_IRQ_REDIRECTION) + return cpu_physical_id(smp_processor_id()); + + /* + * Some interrupts (ACPI SCI, for instance) are registered + * before the BSP is marked as online. + */ + if (!cpu_online(smp_processor_id())) + return cpu_physical_id(smp_processor_id()); + +#ifdef CONFIG_NUMA + { + int num_cpus, cpu_index, iosapic_index, numa_cpu, i = 0; + cpumask_t cpu_mask; + + iosapic_index = find_iosapic(gsi); + if (iosapic_index < 0 || + iosapic_lists[iosapic_index].node == MAX_NUMNODES) + goto skip_numa_setup; + + cpu_mask = node_to_cpumask(iosapic_lists[iosapic_index].node); + + for_each_cpu_mask(numa_cpu, cpu_mask) { + if (!cpu_online(numa_cpu)) + cpu_clear(numa_cpu, cpu_mask); + } + + num_cpus = cpus_weight(cpu_mask); + + if (!num_cpus) + goto skip_numa_setup; + + /* Use vector assignment to distribute across cpus in node */ + cpu_index = vector % num_cpus; + + for (numa_cpu = first_cpu(cpu_mask) ; i < cpu_index ; i++) + numa_cpu = next_cpu(numa_cpu, cpu_mask); + + if (numa_cpu != NR_CPUS) + return cpu_physical_id(numa_cpu); + } +skip_numa_setup: +#endif + /* + * Otherwise, round-robin interrupt vectors across all the + * processors. (It'd be nice if we could be smarter in the + * case of NUMA.)
+ */ + do { + if (++cpu >= NR_CPUS) + cpu = 0; + } while (!cpu_online(cpu)); + + return cpu_physical_id(cpu); +#else + return cpu_physical_id(smp_processor_id()); +#endif +} + +/* + * ACPI can describe IOSAPIC interrupts via static tables and namespace + * methods. This provides an interface to register those interrupts and + * program the IOSAPIC RTE. + */ +int +iosapic_register_intr (unsigned int gsi, + unsigned long polarity, unsigned long trigger) +{ + int vector, mask = 1; + unsigned int dest; + unsigned long flags; + struct iosapic_rte_info *rte; + u32 low32; +again: + /* + * If this GSI has already been registered (i.e., it's a + * shared interrupt, or we lost a race to register it), + * don't touch the RTE. + */ + spin_lock_irqsave(&iosapic_lock, flags); + { + vector = gsi_to_vector(gsi); + if (vector > 0) { + rte = gsi_vector_to_rte(gsi, vector); + rte->refcnt++; + spin_unlock_irqrestore(&iosapic_lock, flags); + return vector; + } + } + spin_unlock_irqrestore(&iosapic_lock, flags); + + /* If vector is running out, we try to find a sharable vector */ +#ifdef XEN + vector = assign_irq_vector(AUTO_ASSIGN); +#else + vector = assign_irq_vector_nopanic(AUTO_ASSIGN); +#endif + if (vector < 0) + vector = iosapic_find_sharable_vector(trigger, polarity); + + spin_lock_irqsave(&irq_descp(vector)->lock, flags); + spin_lock(&iosapic_lock); + { + if (gsi_to_vector(gsi) > 0) { + if (list_empty(&iosapic_intr_info[vector].rtes)) + free_irq_vector(vector); + spin_unlock(&iosapic_lock); + spin_unlock_irqrestore(&irq_descp(vector)->lock, flags); + goto again; + } + + dest = get_target_cpu(gsi, vector); + register_intr(gsi, vector, IOSAPIC_LOWEST_PRIORITY, + polarity, trigger); + + /* + * If the vector is shared and already unmasked for + * other interrupt sources, don't mask it. 
+ */ + low32 = iosapic_intr_info[vector].low32; + if (vector_is_shared(vector) && !(low32 & IOSAPIC_MASK)) + mask = 0; + set_rte(gsi, vector, dest, mask); + } + spin_unlock(&iosapic_lock); + spin_unlock_irqrestore(&irq_descp(vector)->lock, flags); + + printk(KERN_INFO "GSI %u (%s, %s) -> CPU %d (0x%04x) vector %d\n", + gsi, (trigger == IOSAPIC_EDGE ? "edge" : "level"), + (polarity == IOSAPIC_POL_HIGH ? "high" : "low"), + cpu_logical_id(dest), dest, vector); + + return vector; +} + +#ifdef CONFIG_ACPI_DEALLOCATE_IRQ +void +iosapic_unregister_intr (unsigned int gsi) +{ + unsigned long flags; + int irq, vector; + irq_desc_t *idesc; + u32 low32; + unsigned long trigger, polarity; + unsigned int dest; + struct iosapic_rte_info *rte; + + /* + * If the irq associated with the gsi is not found, + * iosapic_unregister_intr() is unbalanced. We need to check + * this again after getting locks. + */ + irq = gsi_to_irq(gsi); + if (irq < 0) { + printk(KERN_ERR "iosapic_unregister_intr(%u) unbalanced\n", gsi); + WARN_ON(1); + return; + } + vector = irq_to_vector(irq); + + idesc = irq_descp(irq); + spin_lock_irqsave(&idesc->lock, flags); + spin_lock(&iosapic_lock); + { + if ((rte = gsi_vector_to_rte(gsi, vector)) == NULL) { + printk(KERN_ERR "iosapic_unregister_intr(%u) unbalanced\n", gsi); + WARN_ON(1); + goto out; + } + + if (--rte->refcnt > 0) + goto out; + + /* Mask the interrupt */ + low32 = iosapic_intr_info[vector].low32 | IOSAPIC_MASK; + iosapic_write(rte->addr, IOSAPIC_RTE_LOW(rte->rte_index), low32); + + /* Remove the rte entry from the list */ + list_del(&rte->rte_list); +#ifdef XEN + list_del(&rte->iosapic_list); +#endif + iosapic_intr_info[vector].count--; + iosapic_free_rte(rte); + + trigger = iosapic_intr_info[vector].trigger; + polarity = iosapic_intr_info[vector].polarity; + dest = iosapic_intr_info[vector].dest; + printk(KERN_INFO "GSI %u (%s, %s) -> CPU %d (0x%04x) vector %d unregistered\n", + gsi, (trigger == IOSAPIC_EDGE ? 
"edge" : "level"), + (polarity == IOSAPIC_POL_HIGH ? "high" : "low"), + cpu_logical_id(dest), dest, vector); + + if (list_empty(&iosapic_intr_info[vector].rtes)) { + /* Sanity check */ + BUG_ON(iosapic_intr_info[vector].count); + + /* Clear the interrupt controller descriptor */ + idesc->handler = &no_irq_type; + + /* Clear the interrupt information */ + memset(&iosapic_intr_info[vector], 0, sizeof(struct iosapic_intr_info)); + iosapic_intr_info[vector].low32 |= IOSAPIC_MASK; + INIT_LIST_HEAD(&iosapic_intr_info[vector].rtes); + + if (idesc->action) { + printk(KERN_ERR "interrupt handlers still exist on IRQ %u\n", irq); + WARN_ON(1); + } + + /* Free the interrupt vector */ + free_irq_vector(vector); + } + } + out: + spin_unlock(&iosapic_lock); + spin_unlock_irqrestore(&idesc->lock, flags); +} +#endif /* CONFIG_ACPI_DEALLOCATE_IRQ */ + +/* + * ACPI calls this when it finds an entry for a platform interrupt. + * Note that the irq_base and IOSAPIC address must be set in iosapic_init(). + */ +int __init +iosapic_register_platform_intr (u32 int_type, unsigned int gsi, + int iosapic_vector, u16 eid, u16 id, + unsigned long polarity, unsigned long trigger) +{ + static const char * const name[] = {"unknown", "PMI", "INIT", "CPEI"}; + unsigned char delivery; + int vector, mask = 0; + unsigned int dest = ((id << 8) | eid) & 0xffff; + + switch (int_type) { + case ACPI_INTERRUPT_PMI: + vector = iosapic_vector; + /* + * since PMI vector is alloc'd by FW(ACPI) not by kernel, + * we need to make sure the vector is available + */ + iosapic_reassign_vector(vector); + delivery = IOSAPIC_PMI; + break; + case ACPI_INTERRUPT_INIT: + vector = assign_irq_vector(AUTO_ASSIGN); + delivery = IOSAPIC_INIT; + break; + case ACPI_INTERRUPT_CPEI: + vector = IA64_CPE_VECTOR; + delivery = IOSAPIC_LOWEST_PRIORITY; + mask = 1; + break; + default: + printk(KERN_ERR "iosapic_register_platform_irq(): invalid int type 0x%x\n", int_type); + return -1; + } + + register_intr(gsi, vector, delivery, polarity, 
+		      trigger);
+
+	printk(KERN_INFO "PLATFORM int %s (0x%x): GSI %u (%s, %s) -> CPU %d (0x%04x) vector %d\n",
+	       int_type < ARRAY_SIZE(name) ? name[int_type] : "unknown",
+	       int_type, gsi, (trigger == IOSAPIC_EDGE ? "edge" : "level"),
+	       (polarity == IOSAPIC_POL_HIGH ? "high" : "low"),
+	       cpu_logical_id(dest), dest, vector);
+
+	set_rte(gsi, vector, dest, mask);
+	return vector;
+}
+
+
+/*
+ * ACPI calls this when it finds an entry for a legacy ISA IRQ override.
+ * Note that the gsi_base and IOSAPIC address must be set in iosapic_init().
+ */
+void __init
+iosapic_override_isa_irq (unsigned int isa_irq, unsigned int gsi,
+			  unsigned long polarity,
+			  unsigned long trigger)
+{
+	int vector;
+	unsigned int dest = cpu_physical_id(smp_processor_id());
+
+	vector = isa_irq_to_vector(isa_irq);
+
+	register_intr(gsi, vector, IOSAPIC_LOWEST_PRIORITY, polarity, trigger);
+
+	DBG("ISA: IRQ %u -> GSI %u (%s,%s) -> CPU %d (0x%04x) vector %d\n",
+	    isa_irq, gsi, trigger == IOSAPIC_EDGE ? "edge" : "level",
+	    polarity == IOSAPIC_POL_HIGH ? "high" : "low",
+	    cpu_logical_id(dest), dest, vector);
+
+	set_rte(gsi, vector, dest, 1);
+}
+
+void __init
+iosapic_system_init (int system_pcat_compat)
+{
+	int vector;
+
+	for (vector = 0; vector < IA64_NUM_VECTORS; ++vector) {
+		iosapic_intr_info[vector].low32 = IOSAPIC_MASK;
+		INIT_LIST_HEAD(&iosapic_intr_info[vector].rtes); /* mark as unused */
+	}
+
+	pcat_compat = system_pcat_compat;
+	if (pcat_compat) {
+		/*
+		 * Disable the compatibility mode interrupts (8259 style), needs IN/OUT support
+		 * enabled.
+		 */
+		printk(KERN_INFO "%s: Disabling PC-AT compatible 8259 interrupts\n", __FUNCTION__);
+		outb(0xff, 0xA1);
+		outb(0xff, 0x21);
+	}
+}
+
+void __init
+iosapic_init (unsigned long phys_addr, unsigned int gsi_base)
+{
+	int num_rte;
+	unsigned int isa_irq, ver;
+	char __iomem *addr;
+
+	addr = ioremap(phys_addr, 0);
+	ver = iosapic_version(addr);
+
+	/*
+	 * The MAX_REDIR register holds the highest input pin
+	 * number (starting from 0).
+	 * We add 1 so that we can use it for number of pins (= RTEs)
+	 */
+	num_rte = ((ver >> 16) & 0xff) + 1;
+
+	iosapic_lists[num_iosapic].addr = addr;
+	iosapic_lists[num_iosapic].gsi_base = gsi_base;
+	iosapic_lists[num_iosapic].num_rte = num_rte;
+#ifdef CONFIG_NUMA
+	iosapic_lists[num_iosapic].node = MAX_NUMNODES;
+#endif
+#ifdef XEN
+	INIT_LIST_HEAD(&iosapic_lists[num_iosapic].rtes);
+#endif
+	num_iosapic++;
+
+	if ((gsi_base == 0) && pcat_compat) {
+		/*
+		 * Map the legacy ISA devices into the IOSAPIC data. Some of these may
+		 * get reprogrammed later on with data from the ACPI Interrupt Source
+		 * Override table.
+		 */
+		for (isa_irq = 0; isa_irq < 16; ++isa_irq)
+			iosapic_override_isa_irq(isa_irq, isa_irq, IOSAPIC_POL_HIGH, IOSAPIC_EDGE);
+	}
+}
+
+#ifdef CONFIG_NUMA
+void __init
+map_iosapic_to_node(unsigned int gsi_base, int node)
+{
+	int index;
+
+	index = find_iosapic(gsi_base);
+	if (index < 0) {
+		printk(KERN_WARNING "%s: No IOSAPIC for GSI %u\n",
+		       __FUNCTION__, gsi_base);
+		return;
+	}
+	iosapic_lists[index].node = node;
+	return;
+}
+#endif
+
+#ifndef XEN
+static int __init iosapic_enable_kmalloc (void)
+{
+	iosapic_kmalloc_ok = 1;
+	return 0;
+}
+core_initcall (iosapic_enable_kmalloc);
+#endif
+
+#ifdef XEN
+
+/* Utility function to dump IOSAPIC registers. */
+static void
+dump_iosapic (char *addr)
+{
+	u32 version;
+	int j;
+	u32 val;
+	int num_rte;
+
+	version = iosapic_version (addr);
+	/* MAX_REDIR holds the highest pin number, so add 1 for the count.  */
+	num_rte = ((version >> 16) & 0xff) + 1;
+	printf ("IOSAPIC %p num_rte=%d version=%02x\n",
+		addr, num_rte, version & 0xff);
+	for (j = 0; j < num_rte; j++) {
+		val = iosapic_read(addr, IOSAPIC_RTE_LOW(j));
+		printf (" %d: vec=%02x dm=%x S=%d P=%d T=%d M=%d",
+			j, val & 0xff, (val >> 8) & 7,
+			(val >> 12) & 1, (val >> 13) & 1,
+			(val >> 15) & 1, (val >> 16) & 1);
+		val = iosapic_read(addr, IOSAPIC_RTE_HIGH(j));
+		printf (" ID=%02x EID=%02x\n",
+			(val >> 24) & 0xff, (val >> 16) & 0xff);
+	}
+}
+
+/* IOSAPIC virtualization.
+
+   The current implementation follows the transparent virtualisation approach.
+   The changes in the Linux kernel are very small:
+   * the kernel still uses the ACPI tables to find the IOSAPICs.
+   * the kernel can read the IOSAPIC (it only does so for the version and
+     the number of entries).
+   * the kernel can write the IOSAPIC only through hypercalls.
+
+   Xen is then responsible for mapping interrupts to vcpus.
+   However, there are potentially many problems:
+   * can an IRQ be shared between domains?
+   * how to allocate Xen interrupts (mapping IRQ to vector)?
+   * which logical processor receives the interrupt?
+   * what should we do when vectors are exhausted?
+   * how to know which vcpu has allocated an IRQ?
+   * how to avoid conflicts between Xen and a VCPU (eg the serial interrupt)?
+
+   Fortunately most of these issues are moot since only domain0 (UP) can use
+   IRQs.
+
+   EOI:
+   Who is responsible for writing the EOI register?
+   * Currently Xen does it after delivering the interrupt.  As a consequence
+     the same interrupt can be received before the previous one was handled,
+     which could result in an interrupt storm; however this has not been
+     observed to be caused by the IOSAPIC EOI.
+   * If the domain enables the write to the EOI register, then other
+     interrupts can be masked.
+*/
+
+int
+xen_reflect_interrupt_to_domains (ia64_vector vector)
+{
+	struct iosapic_intr_info *info = &iosapic_intr_info[vector];
+	struct iosapic_rte_info *rte;
+	int res = 1;
+
+	list_for_each_entry(rte, &info->rtes, rte_list) {
+		if (rte->vcpu != NULL) {
+			if (rte->vcpu == VCPU_XEN)
+				res = 0;
+			else {
+				/* printf ("Send %d to vcpu as %d\n",
+				   vector, rte->vec); */
+				/* FIXME: vcpus should really be interrupted.
+				   This currently works because only domain0
+				   receives interrupts, and domain0 runs on
+				   CPU#0, which receives all the
+				   interrupts...
+				 */
+				vcpu_pend_interrupt(rte->vcpu, rte->vcpu_vec);
+				vcpu_wake(rte->vcpu);
+			}
+		}
+	}
+	return res;
+}
+
+static struct iosapic *
+guest_gsi_to_iosapic (u32 gsi)
+{
+	int i;
+
+	i = find_iosapic (gsi);
+	if (i < 0) {
+		printf ("guest_gsi_to_iosapic: unknown iosapic (gsi=%u)\n",
+			gsi);
+		return NULL;
+	}
+	else {
+		return &iosapic_lists[i];
+	}
+}
+
+static struct iosapic_rte_info *
+iosapic_gsi_to_rte (struct iosapic *iosapic, u32 gsi)
+{
+	struct iosapic_rte_info *rte;
+
+	list_for_each_entry (rte, &iosapic->rtes, iosapic_list)
+		if (gsi == (iosapic->gsi_base + rte->rte_index))
+			return rte;
+	return NULL;
+}
+
+#define XEN_IOSAPIC_EOI 2
+#define IOSAPIC_RTE_BASE 0x10
+
+static int
+iosapic_guest_write (u32 gsi, int offset, u32 val)
+{
+	unsigned long flags;
+	struct iosapic *iosapic;
+	struct iosapic_rte_info *rte;
+	struct iosapic_intr_info *intr;
+
+#if 0
+	printf ("iosapic_guest_write: gsi=%d offset=%04x val=%08x\n",
+		gsi, offset, val);
+#endif
+
+	/* Find the corresponding IOSAPIC.
+	   No need to lock.  */
+	iosapic = guest_gsi_to_iosapic (gsi);
+	if (iosapic == NULL)
+		return -1;
+
+	if (offset == XEN_IOSAPIC_EOI) {
+		/* After a complete list_for_each_entry traversal the cursor
+		   is never NULL, so track the match separately.  */
+		struct iosapic_rte_info *eoi_rte = NULL;
+
+		spin_lock_irqsave(&iosapic_lock, flags);
+		list_for_each_entry (rte, &iosapic->rtes, iosapic_list)
+			if (rte->vcpu == current && rte->vcpu_vec == val) {
+				eoi_rte = rte;
+				break;
+			}
+		if (eoi_rte != NULL)
+			iosapic_eoi (eoi_rte->addr, eoi_rte->xen_vec);
+		spin_unlock_irqrestore(&iosapic_lock, flags);
+
+		if (eoi_rte == NULL) {
+			printf ("iosapic guest eoi: bad interrupt %d\n", val);
+			return -1;
+		}
+
+		return 0;
+	}
+	else if (offset < IOSAPIC_RTE_BASE) {
+		/* Bad offset.  */
+		printf ("Bad offset for iosapic guest write\n");
+		return -1;
+	}
+
+	/* Check consistency.  */
+	offset -= IOSAPIC_RTE_BASE;
+	if ((offset >> 1) != (gsi - iosapic->gsi_base)) {
+		printf ("Bad offset for iosapic guest write\n");
+		return -1;
+	}
+
+	/* The high word of an RTE holds the destination only.
+	   This is currently discarded, since domains are not yet SMP.
+	   FIXME: for later.
+	 */
+	if ((offset & 1))
+		return 0;
+
+	/* Find RTE.  */
+	spin_lock_irqsave(&iosapic_lock, flags);
+	rte = iosapic_gsi_to_rte (iosapic, gsi);
+
+	if (rte == NULL) {
+		int vec;
+
+		spin_unlock_irqrestore(&iosapic_lock, flags);
+
+		/* No RTE.  */
+		if (val & IOSAPIC_MASK) {
+			/* Interrupt is masked.  Do not add interrupt.  */
+			return 0;
+		}
+		/* Interrupt is not masked.  Add interrupt.  */
+		printf ("Request for GSI %d\n", gsi);
+		vec = iosapic_register_intr
+			(gsi,
+			 (val >> IOSAPIC_POLARITY_SHIFT) & 1,
+			 (val >> IOSAPIC_TRIGGER_SHIFT) & 1);
+
+		spin_lock_irqsave(&iosapic_lock, flags);
+
+		rte = gsi_vector_to_rte (gsi, vec);
+	}
+
+	intr = &iosapic_intr_info[rte->xen_vec];
+
+	if ((val & IOSAPIC_MASK)) {
+		/* Interrupt is to be masked.  */
+		if (intr->low32 & IOSAPIC_MASK) {
+			/* Already masked.  */
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+		}
+		else if (rte->vcpu->domain != current->domain) {
+			/* Not from the same domain.  */
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+			printf ("Oops: GSI %d is disabled by another domain\n",
+				gsi);
+		}
+		else {
+			/* Was unmasked: mask it in the IOSAPIC.  */
+			rte->vcpu = NULL;
+			rte->vcpu_vec = 0;
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+			mask_vec (rte->xen_vec);
+		}
+	}
+	else {
+		/* Interrupt is to be unmasked.  */
+		if (intr->low32 & IOSAPIC_MASK) {
+			/* And was masked.  */
+			rte->vcpu = current;
+			rte->vcpu_vec = val & 0xff;
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+			unmask_vec (rte->xen_vec);
+		}
+		else if (rte->vcpu->domain != current->domain) {
+			/* Was already unmasked, by another domain.  */
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+			printf ("Oops: GSI %d is reenabled by another"
+				" domain\n", gsi);
+		}
+		else {
+			/* Was already unmasked: nothing to do, but the lock
+			   must still be released.  */
+			spin_unlock_irqrestore(&iosapic_lock, flags);
+		}
+	}
+
+	return 0;
+}
+
+static int
+iosapic_guest_read (u32 gsi, int offset, u32 *val)
+{
+	unsigned long flags;
+	struct iosapic *iosapic;
+
+#if 0
+	printf ("iosapic_guest_read: gsi=%d offset=%04x\n", gsi, offset);
+#endif
+
+	/* Find the corresponding IOSAPIC.
+	   No need to lock.
+	 */
+	iosapic = guest_gsi_to_iosapic (gsi);
+	if (iosapic == NULL)
+		return -1;
+
+	if (offset == IOSAPIC_VERSION) {
+		spin_lock_irqsave(&iosapic_lock, flags);
+		*val = iosapic_read(iosapic->addr, offset);
+		spin_unlock_irqrestore(&iosapic_lock, flags);
+		return 0;
+	}
+
+	printf ("iosapic guest read: bad offset %d\n", offset);
+	return -1;
+}
+
+/*
+ * Demuxing hypercall.
+ */
+long do_physdev_op(physdev_op_t *uop)
+{
+	physdev_op_t op;
+	long ret;
+
+	if ( unlikely(copy_from_user(&op, uop, sizeof(op)) != 0) )
+		return -EFAULT;
+
+	switch ( op.cmd )
+	{
+	case PHYSDEVOP_APIC_WRITE:
+		ret = -EPERM;
+		if ( !IS_PRIV(current->domain) )
+			break;
+		ret = iosapic_guest_write(op.u.apic_op.apic,
+					  op.u.apic_op.offset,
+					  op.u.apic_op.value);
+		/* Do not copy back.  */
+		return ret;
+
+	case PHYSDEVOP_APIC_READ:
+		ret = -EPERM;
+		if ( !IS_PRIV(current->domain) )
+			break;
+		ret = iosapic_guest_read(op.u.apic_op.apic,
+					 op.u.apic_op.offset,
+					 &op.u.apic_op.value);
+		break;
+
+	default:
+		printf ("Unhandled physdev op %d\n", op.cmd);
+		/* Do not copy back.
+		 */
+		return -EINVAL;
+	}
+
+	if ( copy_to_user(uop, &op, sizeof(op)) )
+		ret = -EFAULT;
+
+	return ret;
+}
+#endif
diff -r 5fcc346d6fe0 -r 3d7272deebe6 xen/include/asm-ia64/linux/asm/iosapic.h
--- /dev/null	Thu Jan 26 10:31:28 2006
+++ b/xen/include/asm-ia64/linux/asm/iosapic.h	Wed Feb  1 12:17:54 2006
@@ -0,0 +1,110 @@
+#ifndef __ASM_IA64_IOSAPIC_H
+#define __ASM_IA64_IOSAPIC_H
+
+#define	IOSAPIC_REG_SELECT	0x0
+#define	IOSAPIC_WINDOW		0x10
+#define	IOSAPIC_EOI		0x40
+
+#define	IOSAPIC_VERSION		0x1
+
+/*
+ * Redirection table entry
+ */
+#define	IOSAPIC_RTE_LOW(i)	(0x10+i*2)
+#define	IOSAPIC_RTE_HIGH(i)	(0x11+i*2)
+
+#define	IOSAPIC_DEST_SHIFT	16
+
+/*
+ * Delivery mode
+ */
+#define	IOSAPIC_DELIVERY_SHIFT		8
+#define	IOSAPIC_FIXED			0x0
+#define	IOSAPIC_LOWEST_PRIORITY	0x1
+#define	IOSAPIC_PMI			0x2
+#define	IOSAPIC_NMI			0x4
+#define	IOSAPIC_INIT			0x5
+#define	IOSAPIC_EXTINT			0x7
+
+/*
+ * Interrupt polarity
+ */
+#define	IOSAPIC_POLARITY_SHIFT		13
+#define	IOSAPIC_POL_HIGH		0
+#define	IOSAPIC_POL_LOW			1
+
+/*
+ * Trigger mode
+ */
+#define	IOSAPIC_TRIGGER_SHIFT		15
+#define	IOSAPIC_EDGE			0
+#define	IOSAPIC_LEVEL			1
+
+/*
+ * Mask bit
+ */
+
+#define	IOSAPIC_MASK_SHIFT		16
+#define	IOSAPIC_MASK		(1<