WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

RE: [Xen-devel] [PATCH]Add MSI support to PV_dom0

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] [PATCH]Add MSI support to PV_dom0
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Wed, 13 May 2009 11:16:26 +0800
Accept-language: en-US
Acceptlanguage: en-US
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Matthew Wilcox <matthew@xxxxxx>
Delivery-date: Tue, 12 May 2009 20:19:03 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4A09D69F.4010109@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <E2263E4A5B2284449EEBD0AAB751098401D1F443F2@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4A09D69F.4010109@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcnTPRQIqrf98XJwTjuqRf35dvC6ZwAMC5Mw
Thread-topic: [Xen-devel] [PATCH]Add MSI support to PV_dom0
Jeremy, thanks for your detailed comments, I will update according to your 
feedback. See comments embeded please.

--jyh

Jeremy Fitzhardinge wrote:
> Jiang, Yunhong wrote:
>> Jeremy, attached is basic MSI support to PV_dom0.  Please have a
>> look on it.
>>
>> Thanks
>> Yunhong Jiang
>>
>> The pci_wrapper.patch add some hook to pci_bus_type, so that
> Xen will be notified when a PCI device is added to system.
>>  Makefile   |    2 -
>>  pci_wrap.c |   90
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 91 insertions(+), 1 deletion(-)
>>
>> The enable_msi.patch add the MSI support to dom0, basically
> it just allocate irq from Xen irq name space.
>>
>>  arch/x86/include/asm/xen/pci.h  |   17 +++++++
>>  arch/x86/kernel/apic/io_apic.c  |   22 +++++++--
>>  arch/x86/xen/apic.c             |    1
>>  drivers/xen/events.c            |   91
> +++++++++++++++++++++++++++++++++++++++-
>>  include/xen/interface/physdev.h |   59 +++++++++++++++++++++++++
>>  5 files changed, 182 insertions(+), 8 deletions(-)
>>
>> The reenable_msi.patch just remove original function that
> disable MSI in xen environment.
>>
>>  arch/x86/xen/apic.c |    2 --
>>  drivers/pci/pci.h   |    2 ++
>>  include/linux/pci.h |    6 ------
>>  3 files changed, 2 insertions(+), 8 deletions(-)
>>
>>
>> Notice:
>> a) Currently the PCI save/restore is broken. The reason is
> because pci_restore_msi_state() in "drivers/pci/msi.c" will
> try to restore the MSI config depends on device's msi msg
> information (i.e. content of msi_desc->msg). However, that
> information is wrong in Xen environment because Xen HV owns
> MSI. I'm still trying to find a method to achieve the
> save/restore without touch the common msi.c code, any
> suggestion is welcome.
>>
>
> You mean because it tries to directly write to pci config
> space, or that
> it writes the wrong thing there?

Because It will try to write to the pci config space directly, and currently it 
will write a wrong value.
I'm considering to overwrite msi_compose_msg() in 
arch/x86/kernel/apic/io_apic.c, so that msi_desc->msg will hold the value Xen 
is using, I hope it will resolve the issue.


>
> What will this affect?  Dom0 S3 suspend/resume?  More?

Yes, suspend/resume.

>
>> b) I notice pci frontend is in a branch. But to support pci
> frontend without touch common PCI function is difficult, I'm
> still considering it.
>>
>
> OK.
>
>> c) I'm not sure if xen's irq space should be same as pirq.
> Basically I think irq is dom0 internal structure, while PIRQ
> is interface between Xen HV/dom0. But seems current
> implementation think these two items are same. I didn't try to
> change it, but that may need improvement.
>
> What do you mean by "Xen's irq space"?  Does it have an irq
> space that's
> distinct from pirqs?  Or do you mean "Linux's irq space"?  At the

Aha, yes, I mean linux's irq space.

> moment, a Linux irq == the gsi, which is a convention I kept for Xen

I suspect if this will always work, because
1) It may not work on x86_32, because gsi is compressed in x86_32, or we will 
not support that?
2) Even in x86_64, I suspect it may not work still. Currently Xen defined 
NR_IRQS as 256, unless specified in Rules.mk. So what will happen if the gsi > 
256? I suspect the PHYSDEVOP_alloc_irq_vector hypercall will fail.

The reason is, the pirq is virtual interrupt line between Xen/guest, and is 
determined by Xen HV, while irq/gsi is defined by guest/dom0. Current Xen's 
dom0 has a mapping between pirq/irq, so I'm not sure if that logic is needed in 
pv_dom0 still.

> interrupts, though there's no very strong tie (identity_mapped_irq()
> determines where we bother with the identity mapping; only the legacy
> ISA interrupts are essential).  Obviously this has no meaning with
> MSI interrupts.
>
> Or have I misunderstood you?
>
> (It would be easier to review the patches if they were inline,
> one per mail)

Sorry, I will do it next time.

>
>> commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang,
>> Yunhong <yunhong.jiang@xxxxxxxxx> Date: Fri May 8 00:50:34 2009
>> +0800 Add MSI support to Xen Dom0 It add some hook to arch specific
>> MSI function, so that it will a) allocate irq from Xen's irq space
>> b) allocate vector through Xen's map_irq hypercall. Currently Xen's
>> irq space function has no chip_data function, I assume this should
>> be ok since we Xen's IRQ is different chip with native IOAPIC.
>> Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> diff --git
>> a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
>> index 0563fc6..a5d9027 100644 ---
> a/arch/x86/include/asm/xen/pci.h +++
>> b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef
>> CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int
>> polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned
>> int want, + struct msi_desc *msidesc, + int type); +int
>> xen_destroy_irq(int irq); #else static inline int
>> xen_register_gsi(u32 gsi, int triggering, int polarity) { return -1;
>> } + +int xen_create_msi_irq(struct pci_dev *dev,
> static inline
>> + unsigned int want, + struct msi_desc *msidesc, + int type) +{ +
>> return -1; +} +int xen_destroy_irq(int irq) static inline
>> +{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git
>> a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>> index b562550..d1e3408 100644 ---
> a/arch/x86/kernel/apic/io_apic.c +++
>> b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include
>> <asm/xen/hypervisor.h> #include <asm/apic.h> +#include
>> <asm/xen/pci.h> #define __apicdebuginit(type) static type __init @@
>> -3472,6 +3473,10 @@ static int setup_msi_irq(struct pci_dev *dev,
>> struct msi_desc *msidesc, int irq) return ret; set_irq_msi(irq,
>> msidesc); + + if (xen_domain()) + return 0;
>
> I think it would be cleaner to have a xen_setup_msi_irq()
> which just does the
>       ret = msi_compose_msg(dev, irq, &msg);
>       if (ret < 0)
>               return ret;
>
>       set_irq_msi(irq, msidesc);
>
> parts then returns, and put the if (xen) logic at the callsite
> (either with an explicit test, or add an ops vec of some kind.

Sure, I will do like that. In fact, my draft implemented this way.
BTW, what do you mean of "an ops vec"???

>
>> + write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9
>> +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec,
>> int type) irq_want = nr_irqs_gsi; sub_handle = 0;
>> list_for_each_entry(msidesc, &dev->msi_list, list) { - irq =
>> create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq =
>> xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { +
>> dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else
>> { + irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + }
> Hoist this out into a create_msi_irq() which does the native
> vs Xen thing
> appropriately.

Sure.

>> irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@
>> -3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if
>> (xen_destroy_irq(irq)) + destroy_irq(irq);
> Hoist this out too.  (destroy_msi_irq()?)

Sure.

>> return ret; }
>
> Do we (or will we ever) support the interrupt remapping stuff
> in the dom0 kernel,
> or is that something that might happen within Xen?  (I don't
> know anything about
> it.)

Per my understanding, interrupt remapping should happen within Xen.

>
> If not, then it looks like there isn't very much common code
> between the native
> and Xen versions of arch_setup_msi_irqs().  I think it might
> be overall cleaner
> just to explicitly have two versions, with
> arch_setup_msi_irqs() being a wrapper to
> choose which one to use.  Even if we do support interrupt
> remapping, it wouldn't
> result in much duplicated code at all, as far as I can see.

So do you mean something like
arch_setup_msi_irqs(...)
{
        if (xen_domain)
                xen_setup_msi_irqs()
          ..... same as native
}

>
>> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); +
>> if (xen_destroy_irq(irq)) + destroy_irq(irq);
>
> Yes, this little pattern should be put into a single place.

What do you mean of put into a single place?

>
>> } #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git
>> a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d
>> 100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7
>> +1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include
>> <linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h>
>> #include <asm/acpi.h> diff --git a/drivers/xen/events.c
>> b/drivers/xen/events.c index af2aad4..65e4c7a 100644 ---
>> a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@
>> #include <linux/string.h> #include <linux/bootmem.h> #include
>> <linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h>
>> +#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h>
>> @@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name)
>> if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
>> dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + +
>> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: +
>> spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int
>> xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; +
>> int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq;
>> + unmap_irq.domid = DOMID_SELF; + if ((rc =
>> HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { +
>> printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } -
>> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq]
>> = mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: -
>> spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int
>> xen_create_msi_irq(struct pci_dev *dev, + unsigned int want,
>
> Do we need this param?  Looks unused.

I wil change it. I just wanted to keep it similar as native create_irq_nr() 
function.

>
>> + struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct
>> physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; +
>> int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; +
>> + memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; +
>> map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus
>> = dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type ==
>> PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev,
>> PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base)
> (base + 0x04)
> Isn't this defined somewhere else?  If not, is there a better
> place to define it?

It is defined in drivers/pci/msi.h. As I don't want to touch anything in 
drivers/pci/ directory, so I put it here (sorry that I should place it under 
some .h file).

>> + pci_read_config_dword(dev, msix_table_offset_reg(pos),
>> &table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
>> + + map_irq.table_base = pci_resource_start(dev, bir); +
>> map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + +
>> spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); +
>> + if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc =
>> HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { +
>> printk(KERN_WARNING "xen map irq failed %x\n", rc); +
>> dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + +
>> irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); +
>> set_irq_chip_and_handler_name(irq, &xen_pirq_chip, +
>> handle_level_irq, + (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); +
>> +out: + spin_unlock(&irq_mapping_update_lock); return irq; } diff
>> --git a/include/xen/interface/physdev.h
>> b/include/xen/interface/physdev.h index cd69391..47ab7f7 100644 ---
>> a/include/xen/interface/physdev.h +++
>> b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct
>> physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define
>> MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define
>> PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; +
>> /* IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int
>> pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int
>> entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define
>> PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t
>> domid; + /* IN */ + int pirq; +}; + +#define
>> PHYSDEVOP_manage_pci_add 15 +#define PHYSDEVOP_manage_pci_remove 16
>> +struct physdev_manage_pci { + /* IN */ + uint8_t bus; + uint8_t
>> devfn; +}; + +#define PHYSDEVOP_restore_msi 19 +struct
>> physdev_restore_msi { + /* IN */ + uint8_t bus; + uint8_t devfn; +};
>> + +#define PHYSDEVOP_manage_pci_add_ext 20 +struct
>> physdev_manage_pci_ext { + /* IN */ + uint8_t bus; + uint8_t devfn;
>> + unsigned is_extfn; + unsigned is_virtfn; + struct { + uint8_t bus;
>> + uint8_t devfn; + } physfn; +}; + /* * Notify that some PIRQ-bound
>> event channels have been unmasked. * ** This command is obsolete
>> since interface version 0x00030202 and is **
>
>
>> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang,
>> Yunhong <yunhong.jiang@xxxxxxxxx> Date: Thu May 7 21:22:26 2009
>> +0800 Add the pci wrap function, to notify Xen of all PCI
>> information. Currently Xen depends on Dom0 to notify all PCI
>> information. When Xen try to setup MSI for a pci device, it will
>> depends on this information. This method need more discussion, since
>> it add some ugly hook to pci_bus_type.
>
> Yes, this is a bit unpleasant.  I can't see any immediately
> satisfying solution, partly because
> I don't fully understand what needs to be done in these hooks.
> Why does it need to do this at probe
> time rather than when setting up the interrupts?

The main purpose of this hook is to keep Xen have the list of PCI device in the 
system, include those hot plug/unpluged devices, before the device begin 
function,  I think the main purpose is for VT-d support in Xen, so it is in 
probe time.

In fact, MSI should works without this list. Unfortunately, currently Xen will 
check if the device exists or not when enabling MSI/MSI-X, and will return 
-ENODEV if the device is not in the list.

But I do think this code is very ugly, and I suspect if we can push this code 
into upstream successfully. I know Winston is working on PCI hotplug in Xen 
side, I will talk with him to see if any plan to change this logic.

>
>> Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> diff --git
>> a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index
>> f0d1a89..372ad9e 100644 --- a/arch/x86/xen/Makefile +++
>> b/arch/x86/xen/Makefile @@ -13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o
>> spinlock.o obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
>> obj-$(CONFIG_XEN_DOM0) += vga.o apic.o obj-$(CONFIG_PCI_XEN) +=
>> pci-swiotlb.o -obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at
>> end of file +obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff
>> --git a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file
>> mode 100644 index 0000000..4ab6966 --- /dev/null +++
>> b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + *
>> vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h>
>> +#include <linux/init.h> +#include <linux/pci.h> +#include
>> <xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include
>> <asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device
>> *dev); +static int (*pci_bus_remove)(struct device *dev);
>
> I'd give these more distinctive names: orig_pci_bus_probe, or
> something, and call them with (*orig_pci_bus_probe)(...) to make it
> obvious they're
> function callers.  I overlooked the calls the first couple of
> times because
> they didn't stand out.
>
>> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return
>> bus->self && bus->self->ari_enabled; +}
>
> Is this a generally useful predicate?

I just copied from Xen's dom0 code. I will check it .

>
>> + +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; +
>> struct pci_dev *pci_dev = to_pci_dev(dev); + struct
>> physdev_manage_pci manage_pci; + struct physdev_manage_pci_ext
>> manage_pci_ext; + +#ifdef CONFIG_PCI_IOV + if (pci_dev->is_virtfn) {
>> + memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); +
>> manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn =
>> pci_dev->devfn; + manage_pci_ext.is_virtfn = 1; +
>> manage_pci_ext.physfn.bus = pci_dev->physfn->bus->number; +
>> manage_pci_ext.physfn.devfn = pci_dev->physfn->devfn; + r =
>> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, +
>> &manage_pci_ext); + } else +#endif + if
>> (pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)) { +
>> memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); +
>> manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn =
>> pci_dev->devfn; + manage_pci_ext.is_extfn = 1; + r =
>> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, +
>> &manage_pci_ext); + } else { + manage_pci.bus =
>> pci_dev->bus->number; + manage_pci.devfn = pci_dev->devfn; + r =
>> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + }
>> + if (r && r != -ENOSYS) + return r;
>
> Why is ENOSYS OK?  Doesn't it mean that Xen is missing some
> aspect of MSI support?

I'm not sure if that is for backward compatibility? I just copied this from 
current Xen's dom0 code.

>
>> + + r = pci_bus_probe(dev); + return r; +} + +static int
>> pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct
>> pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci
>> manage_pci; + manage_pci.bus = pci_dev->bus->number; +
>> manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /*
>> dev and pci_dev are no longer valid!! */ + +
>> WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, +
>> &manage_pci));
> I prefer:
>       if (... != 0)
>               WARN_ON(1);
> to make it obvious that we're doing an error check, rather than some
> internal consistency assertion.
>
>> + return r; +} + +static int __init hook_pci_bus(void) +{ + + if
>> (!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; +
>> pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove =
>> pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper;
>> + + return 0; +} + +core_initcall(hook_pci_bus);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>