WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH]Add MSI support to PV_dom0

To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH]Add MSI support to PV_dom0
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 12 May 2009 13:05:51 -0700
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Matthew Wilcox <matthew@xxxxxx>
Delivery-date: Tue, 12 May 2009 13:06:24 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <E2263E4A5B2284449EEBD0AAB751098401D1F443F2@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <E2263E4A5B2284449EEBD0AAB751098401D1F443F2@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.21 (X11/20090320)
Jiang, Yunhong wrote:
Jeremy, attached is basic MSI support to PV_dom0.  Please have a look on it.

Thanks
Yunhong Jiang

The pci_wrapper.patch add some hook to pci_bus_type, so that Xen will be 
notified when a PCI device is added to system.
 Makefile   |    2 -
 pci_wrap.c |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

The enable_msi.patch add the MSI support to dom0, basically it just allocate 
irq from Xen irq name space.

 arch/x86/include/asm/xen/pci.h  |   17 +++++++
 arch/x86/kernel/apic/io_apic.c  |   22 +++++++--
 arch/x86/xen/apic.c             |    1
 drivers/xen/events.c            |   91 +++++++++++++++++++++++++++++++++++++++-
 include/xen/interface/physdev.h |   59 +++++++++++++++++++++++++
 5 files changed, 182 insertions(+), 8 deletions(-)

The reenable_msi.patch just remove original function that disable MSI in xen 
environment.

 arch/x86/xen/apic.c |    2 --
 drivers/pci/pci.h   |    2 ++
 include/linux/pci.h |    6 ------
 3 files changed, 2 insertions(+), 8 deletions(-)


Notice:
a) Currently the PCI save/restore is broken. The reason is because pci_restore_msi_state() 
in "drivers/pci/msi.c" will try to restore the MSI config depends on device's msi 
msg information (i.e. content of msi_desc->msg). However, that information is wrong in 
Xen environment because Xen HV owns MSI. I'm still trying to find a method to achieve the 
save/restore without touch the common msi.c code, any suggestion is welcome.

You mean because it tries to directly write to pci config space, or that it writes the wrong thing there?

What will this affect?  Dom0 S3 suspend/resume?  More?

b) I notice pci frontend is in a branch. But to support pci frontend without 
touch common PCI function is difficult, I'm still considering it.

OK.

c) I'm not sure if xen's irq space should be same as pirq. Basically I think 
irq is dom0 internal structure, while PIRQ is interface between Xen HV/dom0. 
But seems current implementation think these two items are same. I didn't try 
to change it, but that may need improvement.

What do you mean by "Xen's irq space"? Does it have an irq space that's distinct from pirqs? Or do you mean "Linux's irq space"? At the moment, a Linux irq == the gsi, which is a convention I kept for Xen interrupts, though there's no very strong tie (identity_mapped_irq() determines where we bother with the identity mapping; only the legacy ISA interrupts are essential). Obviously this has no meaning with MSI interrupts.

Or have I misunderstood you?

(It would be easier to review the patches if they were inline, one per mail)

commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> Date: Fri May 8 00:50:34 2009 +0800 Add MSI support to Xen Dom0 It add some hook to arch specific MSI function, so that it will a) allocate irq from Xen's irq space b) allocate vector through Xen's map_irq hypercall. Currently Xen's irq space function has no chip_data function, I assume this should be ok since we Xen's IRQ is different chip with native IOAPIC. Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h index 0563fc6..a5d9027 100644 --- a/arch/x86/include/asm/xen/pci.h +++ b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned int want, + struct msi_desc *msidesc, + int type); +int xen_destroy_irq(int irq); #else static inline int xen_register_gsi(u32 gsi, int triggering, int polarity) { return -1; } + +int xen_create_msi_irq(struct pci_dev *dev,
static inline
+ unsigned int want, + struct msi_desc *msidesc, + int type) +{ + return -1; +} +int xen_destroy_irq(int irq)
static inline
+{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index b562550..d1e3408 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include <asm/xen/hypervisor.h> #include <asm/apic.h> +#include <asm/xen/pci.h> #define __apicdebuginit(type) static type __init @@ -3472,6 +3473,10 @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq) return ret; set_irq_msi(irq, msidesc); + + if (xen_domain()) + return 0;

I think it would be cleaner to have a xen_setup_msi_irq() which just does the ret = msi_compose_msg(dev, irq, &msg);
        if (ret < 0)
                return ret;

        set_irq_msi(irq, msidesc);

parts then returns, and put the if (xen) logic at the callsite (either with an 
explicit
test, or add an ops vec of some kind.

+ write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9 +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) irq_want = nr_irqs_gsi; sub_handle = 0; list_for_each_entry(msidesc, &dev->msi_list, list) { - irq = create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq = xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { + dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else { + irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + }
Hoist this out into a create_msi_irq() which does the native vs Xen thing
appropriately.
irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@ -3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if (xen_destroy_irq(irq)) + destroy_irq(irq);
Hoist this out too.  (destroy_msi_irq()?)
return ret; }

Do we (or will we ever) support the interrupt remapping stuff in the dom0 
kernel,
or is that something that might happen within Xen?  (I don't know anything about
it.)

If not, then it looks like there isn't very much common code between the native
and Xen versions of arch_setup_msi_irqs().  I think it might be overall cleaner
just to explicitly have two versions, with arch_setup_msi_irqs() being a 
wrapper to
choose which one to use.  Even if we do support interrupt remapping, it wouldn't
result in much duplicated code at all, as far as I can see.
void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); + if (xen_destroy_irq(irq)) + destroy_irq(irq);

Yes, this little pattern should be put into a single place.

} #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d 100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7 +1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include <linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h> #include <asm/acpi.h> diff --git a/drivers/xen/events.c b/drivers/xen/events.c index af2aad4..65e4c7a 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@ #include <linux/string.h> #include <linux/bootmem.h> #include <linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h> +#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h> @@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name) if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) { dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + + irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: + spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; + int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq; + unmap_irq.domid = DOMID_SELF; + if ((rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { + printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } - irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq] = mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: - spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int xen_create_msi_irq(struct pci_dev *dev, + unsigned int want,

Do we need this param?  Looks unused.

+ struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; + int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; + + memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; + map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus = dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type == PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base) (base + 0x04)
Isn't this defined somewhere else?  If not, is there a better place to define 
it?
+ pci_read_config_dword(dev, msix_table_offset_reg(pos), &table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK); + + map_irq.table_base = pci_resource_start(dev, bir); + map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + + spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); + + if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { + printk(KERN_WARNING "xen map irq failed %x\n", rc); + dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + + irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); + set_irq_chip_and_handler_name(irq, &xen_pirq_chip, + handle_level_irq, + (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); + +out: + spin_unlock(&irq_mapping_update_lock); return irq; } diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h index cd69391..47ab7f7 100644 --- a/include/xen/interface/physdev.h +++ b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; + /* IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t domid; + /* IN */ + int pirq; +}; + +#define PHYSDEVOP_manage_pci_add 15 +#define PHYSDEVOP_manage_pci_remove 16 +struct physdev_manage_pci { + /* IN */ + uint8_t bus; + uint8_t devfn; +}; + +#define PHYSDEVOP_restore_msi 19 +struct physdev_restore_msi { + /* IN */ + uint8_t bus; + uint8_t devfn; +}; + +#define PHYSDEVOP_manage_pci_add_ext 20 +struct physdev_manage_pci_ext { + /* IN */ + uint8_t bus; + uint8_t devfn; + unsigned is_extfn; + unsigned is_virtfn; + struct { + uint8_t bus; + uint8_t devfn; + } physfn; +}; + /* * Notify that some PIRQ-bound event channels have been unmasked. * ** This command is obsolete since interface version 0x00030202 and is **


commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> Date: Thu May 7 21:22:26 2009 +0800 Add the pci wrap function, to notify Xen of all PCI information. Currently Xen depends on Dom0 to notify all PCI information. When Xen try to setup MSI for a pci device, it will depends on this information. This method need more discussion, since it add some ugly hook to pci_bus_type.

Yes, this is a bit unpleasant.  I can't see any immediately satisfying 
solution, partly because
I don't fully understand what needs to be done in these hooks.  Why does it 
need to do this at probe
time rather than when setting up the interrupts?

Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f0d1a89..372ad9e 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o spinlock.o obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o apic.o obj-$(CONFIG_PCI_XEN) += pci-swiotlb.o -obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at end of file +obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff --git a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file mode 100644 index 0000000..4ab6966 --- /dev/null +++ b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + * vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/pci.h> +#include <xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device *dev); +static int (*pci_bus_remove)(struct device *dev);

I'd give these more distinctive names: orig_pci_bus_probe, or something,
and call them with (*orig_pci_bus_probe)(...) to make it obvious they're
function callers.  I overlooked the calls the first couple of times because
they didn't stand out.

+static int pci_ari_enabled(struct pci_bus *bus) +{ + return bus->self && bus->self->ari_enabled; +}

Is this a generally useful predicate?

+ +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; + struct pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci manage_pci; + struct physdev_manage_pci_ext manage_pci_ext; + +#ifdef CONFIG_PCI_IOV + if (pci_dev->is_virtfn) { + memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); + manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn = pci_dev->devfn; + manage_pci_ext.is_virtfn = 1; + manage_pci_ext.physfn.bus = pci_dev->physfn->bus->number; + manage_pci_ext.physfn.devfn = pci_dev->physfn->devfn; + r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + &manage_pci_ext); + } else +#endif + if (pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)) { + memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); + manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn = pci_dev->devfn; + manage_pci_ext.is_extfn = 1; + r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + &manage_pci_ext); + } else { + manage_pci.bus = pci_dev->bus->number; + manage_pci.devfn = pci_dev->devfn; + r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + } + if (r && r != -ENOSYS) + return r;

Why is ENOSYS OK?  Doesn't it mean that Xen is missing some aspect of MSI 
support?

+ + r = pci_bus_probe(dev); + return r; +} + +static int pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci manage_pci; + manage_pci.bus = pci_dev->bus->number; + manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /* dev and pci_dev are no longer valid!! */ + + WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, + &manage_pci));
I prefer:
        if (... != 0)
                WARN_ON(1);
to make it obvious that we're doing an error check, rather than some
internal consistency assertion.

+ return r; +} + +static int __init hook_pci_bus(void) +{ + + if (!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; + pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove = pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper; + + return 0; +} + +core_initcall(hook_pci_bus);



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>