On Tue, Nov 16, 2010 at 10:57 AM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
>> >> Using the bounce buffers limits the DMA operations to under 32-bit. So
>> >> could it be that you are using some casting macro that casts a PFN to
>> >> unsigned long or vice-versa and we end up truncating it to 32-bit? (I've
>> >> seen this issue actually with InfiniBand drivers back in RHEL5 days..).
>> >> Lastly, do you set your DMA mask on the device to 32BIT?
>> >>
>> >> The Tachyon chip supports both 32-bit & 45-bit DMA. Some features need a
>> >> 32-bit physical address programmed into the chip; others need a 45-bit
>> >> physical address.
>> >
>> > Oh boy. That complicates it.
>> >
>> >> The driver doesn't set the DMA mask on the device to 32-bit.
>> >
>> > Is it set then to 45bit?
>> >
>>
>> We were not explicitly setting the DMA mask. pci_alloc_coherent was
>
> You should. But only once (during startup).
>
>> always returning 32-bit addresses, but pci_map_single was returning a 34-bit
>> address which we truncate by casting it to a uint32_t since the
>
> Truncating any bus (DMA) address is a big no no.
>
>> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
>
> Not knowing the driver I can't comment here much, but
> 1). When you say 'HBA registers' I think PCI MMIO BARs. Those are
> usually found beneath the 4GB limit and you get the virtual
> address when doing ioremap (or the pci equivalent). And the
> bus address is definitely under the 4GB.
> 2). After you have done that, set your pci_dma_mask to 34-bit, and then
> 3). For all other operations where you can do 34-bit use the pci_map
> _single. The swiotlb buffer looks at the dma_mask (and if none is set
> it assumes 32-bit), and if it finds the physical address
> to be within the DMA mask it will gladly translate the physical
> to bus and nothing else. If however the physical address is way
> beyond what the DMA mask allows it will give you the bounce buffer which
> you will later have to copy from (using pci_sync..). I've written
> a little blurb at the bottom of the email explaining this in more detail.
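
To make sure I follow the pci_map_single part, is this roughly the flow you
mean? (A minimal sketch only; "tachyon_pdev", "buf" and "len" are made-up
names, not from our driver.)

#include <linux/pci.h>

static int example_map_buffer(struct pci_dev *tachyon_pdev,
                              void *buf, size_t len)
{
        dma_addr_t dma;

        /* Returns a bus address; whether it points at 'buf' itself or
         * at a swiotlb bounce buffer depends on the DMA mask that was
         * set at probe time. */
        dma = pci_map_single(tachyon_pdev, buf, len, PCI_DMA_TODEVICE);
        if (pci_dma_mapping_error(tachyon_pdev, dma))
                return -EIO;

        /* Program 'dma' into the HBA as a full dma_addr_t here -
         * never cast it down to a u32 unless the mask guarantees
         * it fits. */

        pci_unmap_single(tachyon_pdev, dma, len, PCI_DMA_TODEVICE);
        return 0;
}
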
>
> Or is the issue that when you write to your HBA register the DMA
> address, the HBA register can _only_ deal with 32-bit values (4bytes)?
The HBA register that takes the address returned by pci_map_single
is limited to a 32-bit value.
> In which case the PCI device seems to be limited to addressing only up to
> 4GB, right?
The HBA has some 32-bit registers and some that are 45-bit.
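One thing we are considering (just a sketch of the idea, not tested, and
whether a wide streaming mask behaves sanely under Xen-SWIOTLB is part of
what we're asking): set the streaming mask to the chip's 45-bit limit but
keep coherent allocations, which back the structures whose addresses go into
the 32-bit-only registers, below 4GB:

#include <linux/pci.h>
#include <linux/dma-mapping.h>

static int example_set_masks(struct pci_dev *pdev)
{
        /* Streaming mappings (pci_map_single) may use the full
         * 45-bit range the chip supports... */
        if (pci_set_dma_mask(pdev, DMA_BIT_MASK(45)))
                return -EIO;

        /* ...but coherent allocations (pci_alloc_consistent) stay
         * below 4GB so their addresses fit the 32-bit registers. */
        if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
                return -EIO;

        return 0;
}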
>
>> returned 32-bit addresses without us explicitly setting the DMA mask. Once
>> we set the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However,
>> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> the abysmal I/O performance (same as when we had swiotlb=force).
>
> Right, that is expected.
So with iommu=soft, all I/Os have to go through Xen-SWIOTLB, which
explains why we're seeing the abysmal I/O performance, right?
Is it true then that with an HVM domU kernel and PCI passthrough, it
does not use Xen-SWIOTLB and therefore results in better performance?
>
>> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
>> the default if we don't specify it? Without it, we get no I/Os (it
>
> If you don't specify it you can't do PCI passthrough in PV guests.
> It is automatically enabled when you boot Linux as Dom0.
>
>> seems the interrupts and/or DMA don't work).
>
> It has two purposes:
>
> 1). The predominant one, which is used for both DomU and Dom0, is to
> translate physical frame numbers to machine frame numbers (PFNs->MFNs).
> Xen PV guests have a P2M array that is consulted when setting
> virtual addresses (PTEs). For PCI BARs, they are equivalent
> (PFN == MFN), but for memory regions they can be discontiguous,
> and in decreasing order. If you would traverse the P2M list you
> could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x5119.
>
> So obviously we need a lookup mechanism to, say, find for
> virtual address 0xfffff8000010000 the DMA address (bus address).
> Naively on baremetal on X86 you could use virt_to_phys which would
> get you PFN 0x10000. On Xen however, we need to consult the P2M array.
> For example, for p2m[0x10000], the real machine frame number might be
> 0x102323.
>
> So when you do 'pci_map_*' Xen-SWIOTLB looks up the P2M to find you the
> machine frame number and returns that (DMA address aka bus address). That
> is the value you tell the HBA to transfer from/to.
>
> If you don't enable Xen-SWIOTLB, and use the native one (or none at all),
> you end up programming the PCI card with bogus data since the bus address
> you are giving it does not correspond to the real bus address.
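
OK, so conceptually the translation is something like the sketch below?
(Just to check my understanding of what Xen-SWIOTLB does for us inside
pci_map_*; obviously not something the driver should ever do by hand.)

#include <linux/mm.h>
#include <asm/io.h>
#include <asm/xen/page.h>       /* pfn_to_mfn() on a Xen pvops kernel */

static dma_addr_t example_virt_to_bus(void *vaddr)
{
        unsigned long pfn = virt_to_phys(vaddr) >> PAGE_SHIFT;
        unsigned long mfn = pfn_to_mfn(pfn);    /* consult the P2M array */

        /* The machine frame number, not the PFN, forms the real
         * bus address handed to the device. */
        return ((dma_addr_t)mfn << PAGE_SHIFT) + offset_in_page(vaddr);
}
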
>
> 2). Using our example before, the p2m[0x10000] returned MFN 0x102323. That
> MFN is above 4GB (0x100000) and if your device can _only_ do PCI Memory
> Write and PCI Memory Read b/c it only has 32 address bits we need some way
> of still getting the contents of 0x102323 to the PCI card. This is where
> bounce buffers come into play. During bootup, Xen-SWIOTLB initializes a 64MB
> chunk of space that is underneath the 4GB boundary - it is also contiguous.
> When you do 'pci_map_*' Xen-SWIOTLB looks at the DMA mask you have and the
> MFN, and if the MFN is above what the DMA mask allows it copies the value
> from 0x102323 to one of its buffers, gives you the MFN of its buffer (say
> 0x20000) and you program that in the PCI card. When you get an interrupt
> from the PCI card, you call pci_sync_* which copies from MFN 0x20000 to
> 0x102323 and sticks the MFN 0x20000 back on the list of buffers to be used.
> And now you have in MFN 0x102323 the result.
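
Thanks, that is very clear. So on the completion side the driver's
obligation is roughly this? (Sketch with made-up names; if I've understood
you, the sync call is what copies the data back out of the bounce buffer
when one was used.)

#include <linux/pci.h>

static void example_rx_complete(struct pci_dev *pdev, void *buf,
                                dma_addr_t dma, size_t len)
{
        /* Make the DMA'd data visible to the CPU; if the mapping
         * used a bounce buffer this copies it back into 'buf'. */
        pci_dma_sync_single_for_cpu(pdev, dma, len, PCI_DMA_FROMDEVICE);

        /* ... inspect 'buf' here ... */

        /* Done with the buffer: tear down the mapping, which also
         * returns the bounce buffer to the pool. */
        pci_unmap_single(pdev, dma, len, PCI_DMA_FROMDEVICE);
}
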
>
>>
>> Are there any profiling tools you can suggest for domU? I was able to
>> apply Dulloor's xenoprofile patch to our dom0 kernel (2.6.32.25-pvops)
>> but not to xen-pcifront-0.8.2.
>
> Oh boy. I don't know of any, sorry.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel