> >> Using the bounce buffers limits the DMA operations to under 32-bit. So
> >> could it be that you are using some casting macro that casts a PFN to
> >> unsigned long or vice-versa and we end up truncating it to 32-bit? (I've
> >> seen this issue actually with InfiniBand drivers back in RHEL5 days..).
> >> Lastly, do you set your DMA mask on the device to 32BIT?
> >> The tachyon chip supports both 32-bit & 45-bit dma. Some features need to
> >> set 32-bit physical addr to chip. Others need to set 45-bit physical addr
> >> to chip.
> > Oh boy. That complicates it.
> >> The driver doesn't set DMA mask on the device to 32 bit.
> > Is it set then to 45bit?
> We were not explicitly setting the DMA mask. pci_alloc_coherent was
You should. But only once (during startup).
> always returning 32 bits but pci_map_single was returning a 34-bit
> address which we truncate by casting it to a uint32_t since the
Truncating any bus (DMA) address is a big no no.
> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
Not knowing the driver I can't comment here much, but
1). When you say 'HBA registers' I think PCI MMIO BARs. Those are
usually found beneath the 4GB limit and you get the virtual
address when doing ioremap (or the pci equivalent). And the
bus address is definitely under 4GB.
2). After you have done that, set your PCI DMA mask to 34-bit
(pci_set_dma_mask), and then
3). For all other operations where you can do 34-bit use
pci_map_single. The swiotlb code looks at the dma_mask (and if
none is set it assumes 32-bit), and if it finds the physical address
to be within the DMA mask it will gladly translate the physical
to bus and nothing else. If however the physical address is
beyond what the DMA mask allows it will give you a bounce buffer which
you will later have to copy from (using pci_sync..). I've written
a little blurb at the bottom of the email explaining this in more detail.
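
To make the mask check and the truncation problem concrete, here is a
small userspace C sketch (not kernel code). The helper names
sketch_dma_bit_mask(), fits_dma_mask() and truncate_to_32() are made up
for illustration, and the 34-bit address used with them is an invented
value; the mask helper only mimics what the kernel's DMA_BIT_MASK()
computes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's DMA_BIT_MASK(n): a value with the
 * low 'bits' bits set. Hypothetical helper, not a kernel function.
 */
static uint64_t sketch_dma_bit_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper: returns 1 if the bus address has no bits set
 * above the mask, i.e. the device can reach it directly.
 */
static int fits_dma_mask(uint64_t bus_addr, uint64_t mask)
{
    return (bus_addr & ~mask) == 0;
}

/*
 * Hypothetical helper: what a plain (uint32_t) cast does to a bus
 * address. Any bits above bit 31 are silently dropped.
 */
static uint32_t truncate_to_32(uint64_t bus_addr)
{
    return (uint32_t)bus_addr;
}
```

With the made-up address 0x2ABCDE000, fits_dma_mask() fails against a
32-bit mask and passes against a 34-bit mask, while truncate_to_32()
yields 0xABCDE000, which is a different bus address altogether.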
Or is the issue that when you write to your HBA register the DMA
address, the HBA register can _only_ deal with 32-bit values (4bytes)?
In which case the PCI device seems to be limited to addressing only up to 4GB,
> returned 32 bits without explicitly setting the DMA mask. Once we set
> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
> with iommu=soft (and no more swiotlb=force), we're still stuck with
> the abysmal I/O performance (same as when we had swiotlb=force).
Right, that is expected.
> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
> the default if we don't specify it? Without it, we get no I/Os (it
If you don't specify it you can't do PCI passthrough in PV guests.
It is automatically enabled when you boot Linux as Dom0.
> seems the interrupts and/or DMA don't work).
It has two purposes:
1). The predominant one, which is used for both DomU and Dom0, is to
translate physical frame numbers to machine frame numbers (PFNs->MFNs).
Xen PV guests have a P2M array that is consulted when setting
virtual addresses (PTEs). For PCI BARs, they are equivalent
(PFN == MFN), but for memory regions they can be discontiguous,
and in decreasing order. If you were to traverse the P2M list you
could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x5119.
So obviously we need a lookup mechanism to, say, find for
virtual address 0xfffff8000010000 the DMA address (bus address).
Naively on baremetal on X86 you could use virt_to_phys which would
get you PFN 0x10000. On Xen however, we need to consult the P2M array.
For example, for p2m[0x10000], the real machine frame number might
be 0x102323.
So when you do 'pci_map_*' Xen-SWIOTLB looks up the P2M to find you the
machine frame number and returns that (dma address aka bus address). That
is the value you tell the HBA to transfer from/to.
If you don't enable Xen-SWIOTLB, and use the native one (or none at all),
you end up programming the PCI driver with bogus data since the bus
address you are giving the card does not correspond to the real bus address.
2). Using our example before, the p2m[0x10000] returned MFN 0x102323. That
MFN is above 4GB (0x100000) and if your device can _only_ do PCI Memory
Write and PCI Memory Read below 4GB b/c it only has 32 address bits, we
need some way of still getting the contents of 0x102323 to the PCI card.
This is where bounce buffers come into play. During bootup, Xen-SWIOTLB
initializes a 64MB chunk of space that is underneath the 4GB boundary - it
is also contiguous.
When you do 'pci_map_*' Xen-SWIOTLB looks at the DMA mask you have and the
MFN, and if the MFN's bus address does not fit within the DMA mask it copies
the contents of 0x102323 to one of its buffers, gives you the MFN of that
buffer (say 0x20000) and you program that in the PCI card. When you get an
interrupt from the PCI card, you call pci_sync_* which copies from MFN
0x20000 to 0x102323 and sticks the MFN back on the list of buffers to be
used. And now you have in MFN 0x102323 the data the PCI card wrote.
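
The two purposes above can be played through in a toy userspace C model.
This is only a sketch under loud assumptions: it is not how Xen-SWIOTLB
is implemented, every name in it (toy_p2m, toy_pfn_to_mfn,
toy_phys_to_bus, toy_map_single, toy_sync_for_cpu, bounce_page) is made
up, the P2M is a four-entry table holding just the example values from
this email, and a single static page stands in for the bounce buffer at
MFN 0x20000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_INVALID    (~UINT64_C(0))
#define TOY_BOUNCE_MFN UINT64_C(0x20000)   /* example value from the text */

/* Toy P2M holding only the example PFN->MFN pairs quoted in the email. */
struct toy_p2m_entry { uint64_t pfn, mfn; };
static const struct toy_p2m_entry toy_p2m[] = {
    { 0x1000, 0x5121 }, { 0x1001, 0x5120 }, { 0x1002, 0x5119 },
    { 0x10000, 0x102323 },
};

/* Purpose 1: look up the machine frame backing a guest physical frame. */
static uint64_t toy_pfn_to_mfn(uint64_t pfn)
{
    for (size_t i = 0; i < sizeof(toy_p2m) / sizeof(toy_p2m[0]); i++)
        if (toy_p2m[i].pfn == pfn)
            return toy_p2m[i].mfn;
    return TOY_INVALID;
}

/* Bus address = MFN shifted up, with the in-page offset kept as-is. */
static uint64_t toy_phys_to_bus(uint64_t phys)
{
    uint64_t mfn = toy_pfn_to_mfn(phys >> TOY_PAGE_SHIFT);
    assert(mfn != TOY_INVALID);
    return (mfn << TOY_PAGE_SHIFT) | (phys & TOY_PAGE_MASK);
}

/* Stand-in for the low, contiguous bounce area (one page only here). */
static unsigned char bounce_page[1 << TOY_PAGE_SHIFT];

struct toy_mapping {
    uint64_t bus_addr;   /* what you would program into the card */
    void    *orig;       /* the caller's real buffer */
    size_t   len;
    int      bounced;    /* 1 if the bounce page is in use */
};

/* Purpose 2: bounce when the translated address exceeds the DMA mask. */
static struct toy_mapping toy_map_single(void *buf, size_t len,
                                         uint64_t phys, uint64_t dma_mask)
{
    struct toy_mapping m = { toy_phys_to_bus(phys), buf, len, 0 };

    if ((m.bus_addr + len - 1) & ~dma_mask) {
        assert(len <= sizeof(bounce_page));
        memcpy(bounce_page, buf, len);            /* data for the device */
        m.bus_addr = TOY_BOUNCE_MFN << TOY_PAGE_SHIFT;
        m.bounced = 1;
    }
    return m;
}

/* After the device is done: copy bounce contents back to the real buffer. */
static void toy_sync_for_cpu(const struct toy_mapping *m)
{
    if (m->bounced)
        memcpy(m->orig, bounce_page, m->len);
}
```

Mapping guest physical address 0x10000000 with a 32-bit mask returns the
bounce address 0x20000000 in this model, while the same mapping with a
34-bit mask returns 0x102323000 directly and no copy takes place.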
> Are there any profiling tools you can suggest for domU? I was able to
> apply Dulloor's xenoprofile patch to our dom0 kernel (18.104.22.168-pvops)
> but not to xen-pcifront-0.8.2.
Oh boy. I don't, sorry.
Xen-devel mailing list