
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI



On Mon, 2015-02-23 at 08:43 +0000, Jan Beulich wrote:
> >>> On 20.02.15 at 18:33, <ian.campbell@xxxxxxxxxx> wrote:
> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
> >> > That's the issue we are trying to resolve, with device tree there is no
> >> > explicit segment ID, so we have an essentially unindexed set of PCI
> >> > buses in both Xen and dom0.
> >> 
> >> How that? What if two bus numbers are equal? There ought to be
> >> some kind of topology information. Or if all buses are distinct, then
> >> you don't need a segment number.
> > 
> > It's very possible that I simply don't have the PCI terminology straight
> > in my head, leading to me talking nonsense.
> > 
> > I'll explain how I'm using it and perhaps you can put me straight...
> > 
> > My understanding was that a PCI segment equates to a PCI host
> > controller, i.e. a specific instance of some PCI host IP on an SoC.
> 
> No - there can be multiple roots (i.e. host bridges)

Where a "host bridge" == what I've been calling "PCI host controller"?

I suppose in principle a single controller might expose multiple host
bridges, but I think we can logically treat such things as being
multiple controllers (e.g. with multiple CFG spaces etc).

>  on a single
> segment. Segments are - afaict - purely a scalability extension
> allowing to overcome the 256 bus limit.

Is the converse true -- i.e. can a single host bridge span multiple
segments? IOW is the mapping from segment->host bridge many->one or
many->many?

Maybe what I should read into what you are saying is that segments are
purely a software and/or firmware concept with no real basis in the
hardware?

In which case might we be at liberty to specify that on ARM+Device Tree
systems (i.e. those where the f/w tables don't give an enumeration)
there is a 1:1 mapping from segments to host bridges?
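
A minimal Python sketch of what such a 1:1 mapping could look like. The
ordering rule (ascending unit address of the host bridge node) and the
names SBDF and assign_segments are assumptions made for illustration;
nothing in this thread or in Xen defines them. The two unit addresses
are those of the pcie0 and pcie1 nodes on the APM Mustang mentioned
further down.

```python
from typing import Dict, NamedTuple


class SBDF(NamedTuple):
    """Segment:bus:device.function, with the segment as the outer namespace."""
    segment: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        return (f"{self.segment:04x}:{self.bus:02x}:"
                f"{self.device:02x}.{self.function:x}")


def assign_segments(host_bridges: Dict[str, int]) -> Dict[str, int]:
    """Give every host bridge its own segment number.

    host_bridges maps a DT node label to its unit address. Numbering
    segments in ascending unit-address order is a hypothetical policy,
    chosen only because it is deterministic.
    """
    ordered = sorted(host_bridges.items(), key=lambda item: item[1])
    return {label: segment for segment, (label, _addr) in enumerate(ordered)}


# Unit addresses taken from "pcie@1f2b0000" and "pcie@1f2c0000".
segments = assign_segments({"pcie1": 0x1F2C0000, "pcie0": 0x1F2B0000})

# The same bus/device/function behind two host bridges stays distinct.
a = SBDF(segments["pcie0"], 0, 0, 0)
b = SBDF(segments["pcie1"], 0, 0, 0)
print(a, b, a != b)
```

With one segment per host bridge, bus numbers only need to be unique
within a bridge, which is all that walking a single CFG space can
guarantee anyway.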

> > A PCI host controller defines the root of a bus, within which the BDF
> > need not be distinct due to the differing segments which are effectively
> > a higher level namespace on the BDFs.
> 
> The host controller really defines the root of a tree (often covering
> multiple buses, i.e. as soon as bridges come into play).

Right, I think that's the one thing I'd managed to understand
correctly ;-)

> > So given a system with two PCI host controllers we end up with two
> > segments (lets say A and B, but choosing those is the topic of this
> > thread) and it is acceptable for both to contain a bus 0 with a device 1
> > on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
> > 
> > It sounds like you are saying that this is not actually acceptable and
> > that 0:0.0 must be unique in the system irrespective of the associated
> > segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
> 
> No, there can be multiple buses numbered zero. And at the same
> time a root bus doesn't need to be bus zero on its segment.

0:0.0 was just an example I pulled out of thin air; it wasn't supposed
to imply some special property of bus 0, e.g. being the root or anything
like that.

If there are multiple buses numbered 0 then are they distinguished via
segment or something else?

> > Just for reference a DT node describing a PCI host controller might look
> > like (taking the APM Mustang one as an example):
> > 
> >                 pcie0: pcie@1f2b0000 {
> >                         status = "disabled";
> >                         device_type = "pci";
> >                         compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
> >                         #interrupt-cells = <1>;
> >                         #size-cells = <2>;
> >                         #address-cells = <3>;
> >                         reg = < 0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
> >                                 0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
> >                         reg-names = "csr", "cfg";
> >                         ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
> >                                   0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
> >                         dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
> >                                       0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
> >                         interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> >                         interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
> >                                          0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
> >                                          0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
> >                                          0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
> >                         dma-coherent;
> >                         clocks = <&pcie0clk 0>;
> >                 };
> > 
> > I expect most of this is uninteresting but the key thing is that there
> > is no segment number nor topology relative to e.g. "pcie1:
> > pcie@1f2c0000" (the nodes look identical except e.g. all the base
> > addresses and interrupt numbers differ).
> 
> What I don't get from this is where the BDF is being represented.

It isn't, since this is representing the host controller not any given
PCI devices which it contains.

I thought in general BDFs were probed (or even configured) by the
firmware and/or OS by walking over the CFG space and so aren't
necessarily described anywhere in the firmware tables.

FWIW the first 4 cells in each line of interrupt-map are matched, after
masking with interrupt-map-mask, against an encoding of the BDF plus the
INTx pin to give the INTx routing, but BDFs aren't represented in the
sense I think you meant in the example above.
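
The matching can be sketched in a few lines of Python. This is an
illustration of the generic Device Tree interrupt-map lookup, not code
from Xen or Linux; lookup_intx is a made-up name, and the table is
transcribed from the Mustang node quoted above, with &gic written as
the string "gic".

```python
# interrupt-map-mask = <0x0 0x0 0x0 0x7>: ignore the address, keep the pin.
MASK = (0x0, 0x0, 0x0, 0x7)

# Each entry: (3 address cells + 1 INTx pin cell) -> parent specifier.
INTERRUPT_MAP = [
    ((0x0, 0x0, 0x0, 0x1), ("gic", 0x0, 0xC2, 0x1)),
    ((0x0, 0x0, 0x0, 0x2), ("gic", 0x0, 0xC3, 0x1)),
    ((0x0, 0x0, 0x0, 0x3), ("gic", 0x0, 0xC4, 0x1)),
    ((0x0, 0x0, 0x0, 0x4), ("gic", 0x0, 0xC5, 0x1)),
]


def lookup_intx(interrupt_map, mask, unit_address, pin):
    """Return the parent interrupt specifier for a device's INTx pin.

    unit_address is the 3-cell PCI address whose first cell carries the
    BDF; pin is 1..4 for INTA..INTD. Returns None when nothing matches.
    """
    key = tuple(cell & m for cell, m in zip((*unit_address, pin), mask))
    for child, parent in interrupt_map:
        if child == key:
            return parent
    return None


# Bus 0, device 1, function 0 encodes as 0x0800 in the first cell.
print(lookup_intx(INTERRUPT_MAP, MASK, (0x0800, 0, 0), 1))
```

Because the mask zeroes all three address cells here, every device
behind this host bridge gets the same routing for a given pin, so the
BDF plays no part in the result.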

There is a capability to have child nodes of this root controller node
which describe individual devices, and there is an encoding for the BDF
in there, but these are not required. For reference I've pasted below
my signature a DT snippet from an Nvidia Jetson TK1 (Tegra124) system
which has child nodes. I think the BDF is encoded in assigned-addresses
somewhere.
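
For what it's worth, the first cell of a PCI address in Device Tree
packs the address space, bus, device, function and register number into
32 bits, per the Open Firmware PCI bus binding. A small Python sketch of
the decoding follows; decode_phys_hi is a name invented for this
example, and the cell values are the ones from the Tegra124 child nodes.

```python
SPACES = {0: "config", 1: "io", 2: "mem32", 3: "mem64"}


def decode_phys_hi(cell):
    """Split the first PCI address cell (layout npt000ss bbbbbbbb dddddfff rrrrrrrr)."""
    return {
        "space": SPACES[(cell >> 24) & 0x3],
        "bus": (cell >> 16) & 0xFF,
        "device": (cell >> 11) & 0x1F,
        "function": (cell >> 8) & 0x7,
        "register": cell & 0xFF,
    }


# assigned-addresses of pci@1,0 and pci@2,0, and reg of pci@1,0.
for cell in (0x82000800, 0x82001000, 0x000800):
    print(hex(cell), decode_phys_hi(cell))
```

Under that layout 0x82000800 is device 1 and 0x82001000 is device 2,
both on bus 0, function 0, which matches the pci@1,0 and pci@2,0 unit
names.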

> Yet that arrangement is fundamental to understand whether you
> really need segments to properly disambiguate devices.

Have I clarified enough? I've a feeling not...

Ian.

        pcie-controller@0,01003000 {
                compatible = "nvidia,tegra124-pcie";
                device_type = "pci";
                reg = <0x0 0x01003000 0x0 0x00000800   /* PADS registers */
                       0x0 0x01003800 0x0 0x00000800   /* AFI registers */
                       0x0 0x02000000 0x0 0x10000000>; /* configuration space */
                reg-names = "pads", "afi", "cs";
                interrupts = <GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>, /* controller interrupt */
                             <GIC_SPI 99 IRQ_TYPE_LEVEL_HIGH>; /* MSI interrupt */
                interrupt-names = "intr", "msi";

                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &gic GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>;

                bus-range = <0x00 0xff>;
                #address-cells = <3>;
                #size-cells = <2>;

                ranges = <0x82000000 0 0x01000000 0x0 0x01000000 0 0x00001000   /* port 0 configuration space */
                          0x82000000 0 0x01001000 0x0 0x01001000 0 0x00001000   /* port 1 configuration space */
                          0x81000000 0 0x0        0x0 0x12000000 0 0x00010000   /* downstream I/O (64 KiB) */
                          0x82000000 0 0x13000000 0x0 0x13000000 0 0x0d000000   /* non-prefetchable memory (208 MiB) */
                          0xc2000000 0 0x20000000 0x0 0x20000000 0 0x20000000>; /* prefetchable memory (512 MiB) */

                clocks = <&tegra_car TEGRA124_CLK_PCIE>,
                         <&tegra_car TEGRA124_CLK_AFI>,
                         <&tegra_car TEGRA124_CLK_PLL_E>,
                         <&tegra_car TEGRA124_CLK_CML0>;
                clock-names = "pex", "afi", "pll_e", "cml";
                resets = <&tegra_car 70>,
                         <&tegra_car 72>,
                         <&tegra_car 74>;
                reset-names = "pex", "afi", "pcie_x";
                status = "disabled";

                phys = <&padctl TEGRA_XUSB_PADCTL_PCIE>;
                phy-names = "pcie";

                pci@1,0 {
                        device_type = "pci";
                        assigned-addresses = <0x82000800 0 0x01000000 0 0x1000>;
                        reg = <0x000800 0 0 0 0>;
                        status = "disabled";

                        #address-cells = <3>;
                        #size-cells = <2>;
                        ranges;

                        nvidia,num-lanes = <2>;
                };

                pci@2,0 {
                        device_type = "pci";
                        assigned-addresses = <0x82001000 0 0x01001000 0 0x1000>;
                        reg = <0x001000 0 0 0 0>;
                        status = "disabled";

                        #address-cells = <3>;
                        #size-cells = <2>;
                        ranges;

                        nvidia,num-lanes = <1>;
                };
        };



