
Re: [Xen-devel] Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))



On 12/11/2013 06:32 PM, Konrad Rzeszutek Wilk wrote:
On Thu, Sep 12, 2013 at 06:20:18AM +0000, Zhang, Yang Z wrote:
Jan Beulich wrote on 2013-09-11:
On 11.09.13 at 15:26, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
On Wed, 11 Sep 2013 14:22:51 +0100, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
On 11.09.13 at 15:10, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
On Wed, 11 Sep 2013 14:03:14 +0100, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
On 11.09.13 at 14:45, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
  dmesg, xl dmesg, lspci -vvvnn and lspci -tvnn output is attached.

  I'll try adding one of my LSI cards and see the comparative
behaviour. Right now I don't even know if the phantom device is
on the SAS card or the motherboard.

The Adaptec card being the only thing on bus 0f makes it pretty
likely that this other device also is on that card.

I guess the issue is mainly because the device itself is a PCI
one, while the immediately upstream bridge (where I mean only the
visible one) is PCIe. There _must_ be a PCIe-PCI bridge between
them. And as long as firmware doesn't know about that bridge and
the bridge doesn't properly handle config space accesses to it,
such a device just can't be used with an IOMMU (without some yet
to be invented workaround).

  I'm actually thinking about Konrad's proposed hack in that
thread from 3 years ago. If the device IDs are parameterized out
rather than hard-coded, then this could work in nearly the same
way as xen-pciback in terms of usage. Pass the phantom device IDs
as parameters to the module. Done that way it might even be
considered clean enough to be fit for public consumption.

Except that, short of being able to determine it via config space
reads, we also need the resulting command line option to tell us
what kind of device that is.

  Not sure I follow. Why do we need to know the device type?

Just look at set_msi_source_id() as well as
domain_context_{mapping,unmap}() (just the most prominent
examples): Behavior here heavily depends on the type of the device
itself _and_ that of the upstream bridge(s).
It looks like many devices fail to work in this way. I wonder whether the
PCI/PCIe specification says how to detect a hidden device behind such
devices (like the detection of phantom devices). If not, I think those devices
are buggy, or we can say they are not really PCI/PCIe compatible. Since
VT-d only covers PCI/PCIe devices, it's reasonable that a non-PCI/PCIe device
fails to work under VT-d.

As Jan suggested, we need the user to tell us whether there is a hidden
device or BDF behind another device that the OS is unaware of. We need to pass
that info to Xen before passing through the device.


Interestingly enough I just hit this with my brand-new Haswell CPU and
new motherboard when passing in a capture card. It shows:

     +-1c.5-[07-09]----00.0-[08-09]--+-01.0-[09]--+-08.0  Brooktree Corporation Bt878 Video Capture
            |                               |            +-08.1  Brooktree Corporation Bt878 Audio Capture
            |                               |            +-09.0  Brooktree Corporation Bt878 Video Capture
            |                               |            +-09.1  Brooktree Corporation Bt878 Audio Capture
            |                               |            +-0a.0  Brooktree Corporation Bt878 Video Capture
            |                               |            +-0a.1  Brooktree Corporation Bt878 Audio Capture
            |                               |            +-0b.0  Brooktree Corporation Bt878 Video Capture
            |                               |            \-0b.1  Brooktree Corporation Bt878 Audio Capture
            |                               \-03.0  Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]

And Xen says:
(XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 36aa3000, iommu reg = ffff82c3ffd53000
(XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
(XEN) print_vtd_entries: iommu ffff83083d4939b0 dev 0000:08:00.0 gmfn 36aa3
(XEN)     root_entry = ffff83083d47e000
(XEN)     root_entry[8] = 72569a001
(XEN)     context = ffff83072569a000
(XEN)     context[0] = 0_0
(XEN)     ctxt_entry[0] not present
(XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 36aa3000, iommu reg = ffff82c3ffd53000
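
To read the dump above: VT-d selects the root entry by the bus number of the requester ID and the context entry by devfn (device number in the upper five bits, function in the lower three), which is why a request tagged 0000:08:00.0 ends up at root_entry[8] and context[0]. A minimal Python sketch of that decoding, for illustration only; the helper names parse_sbdf and vtd_table_indexes are made up here and are not Xen functions:

```python
def parse_sbdf(text):
    """Parse '[ssss:]bb:dd.f' (all hex) into (segment, bus, device, function)."""
    fields = text.strip().split(":")
    if len(fields) == 2:
        # Segment omitted: assume segment 0, as lspci does by default.
        fields.insert(0, "0")
    if len(fields) != 3 or "." not in fields[2]:
        raise ValueError(f"not a PCI address: {text!r}")
    dev_s, fn_s = fields[2].split(".", 1)
    seg, bus, dev, fn = (int(x, 16) for x in (fields[0], fields[1], dev_s, fn_s))
    # Field widths: 16-bit segment, 8-bit bus, 5-bit device, 3-bit function.
    if not (0 <= seg <= 0xFFFF and 0 <= bus <= 0xFF
            and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError(f"field out of range in {text!r}")
    return seg, bus, dev, fn


def vtd_table_indexes(text):
    """Return (root_index, context_index) for a PCI address string."""
    _seg, bus, dev, fn = parse_sbdf(text)
    devfn = (dev << 3) | fn  # index into the 256-entry context table
    return bus, devfn


if __name__ == "__main__":
    root, ctx = vtd_table_indexes("0000:08:00.0")
    print(f"root_entry[{root}] context[{ctx}]")
```

Run against the faulting address, this prints root_entry[8] context[0], matching the entries print_vtd_entries reports as not present.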


Oddly enough it was working fine in a box with an AMD IOMMU. But
to be fair - that machine was running with Xen 4.1.

The hack I developed: 
http://lists.xen.org/archives/html/xen-devel/2010-06/msg00093.html
ends up with this:

(XEN) alloc_pdev: unknown type: 0000:08:00.0
(XEN) [VT-D]iommu.c:1484: d0:unknown(0): 0000:08:00.0
(XEN) [VT-D]iommu.c:1888: d0: context mapping failed

(FYI, this is Xen 4.3.1)

Let me retry on the AMD box with the same version of Xen.

I may be wrong, but this doesn't look like the same problem (phantom PCI device on the bus). Or am I missing something?

As far as I can tell, the original problem arose on cards that are PCIe but based on a PCI-X chipset, i.e. with a PCIe-to-PCI-X bridge. Xen wasn't the only thing affected in my case: a bare metal Linux kernel also had problems with intel_iommu=on in the kernel boot parameters. It might be worth trying that with your card to see what happens. If bare metal Linux with intel_iommu=on works for your card, it's probably not the same problem (of course it could be similar/related).

Out of interest, I noticed recently that there is a Xen parameter "pci-phantom", but I haven't been able to find documentation for it. Can you point me in the right direction? Does it, perchance, allow specifying the PCI slot ID of a phantom device so that the IOMMU doesn't freak out when a seemingly non-existent device starts trying to do DMA?

Gordan
