
Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough


  • To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • From: Dante Cinco <dantecinco@xxxxxxxxx>
  • Date: Tue, 16 Nov 2010 11:43:11 -0800
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 16 Nov 2010 11:44:01 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Tue, Nov 16, 2010 at 10:57 AM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
>> >> Using the bounce buffers limits the DMA operations to under 32-bit. So
>> >> could it be that you are using some casting macro that casts a PFN to
>> >> unsigned long or vice-versa and we end up truncating it to 32-bit? (I've
>> >> actually seen this issue with InfiniBand drivers back in the RHEL5 days.)
>> >> Lastly, do you set the DMA mask on the device to 32-bit?
>> >>
>> >> The Tachyon chip supports both 32-bit and 45-bit DMA. Some features need
>> >> a 32-bit physical address programmed into the chip; others need a 45-bit
>> >> physical address.
>> >
>> > Oh boy. That complicates it.
>> >
>> >> The driver doesn't set DMA mask on the device to 32 bit.
>> >
>> > Is it set to 45-bit, then?
>> >
>>
>> We were not explicitly setting the DMA mask. pci_alloc_coherent was
>
> You should. But only once (during startup).
>
>> always returning 32-bit addresses, but pci_map_single was returning a 34-bit
>> address which we truncate by casting it to a uint32_t since the
>
> Truncating any bus (DMA) address is a big no no.
>
>> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
>
> Not knowing the driver I can't comment here much, but
>  1). When you say 'HBA registers' I think PCI MMIO BARs. Those are
>     usually found beneath the 4GB limit and you get the virtual
>     address when doing ioremap (or the PCI equivalent). And the
>     bus address is definitely under 4GB.
>  2). After you have done that, set your DMA mask to 34-bit with
>     pci_set_dma_mask, and then
>  3). For all other operations where you can do 34-bit, use
>     pci_map_single. The swiotlb code looks at the dma_mask (and if none
>     is set it assumes 32-bit), and if it finds the physical address
>     to be within the DMA mask it will simply translate the physical
>     address to a bus address and do nothing else. If, however, the
>     physical address is beyond what the DMA mask can reach, it will give
>     you a bounce buffer which you will later have to copy from (using
>     pci_sync_*). I've written a little blurb at the bottom of the email
>     explaining this in more detail.
>
> Or is the issue that when you write to your HBA register the DMA
> address, the HBA register can _only_ deal with 32-bit values (4bytes)?

The HBA register which is using the address returned by pci_map_single
is limited to a 32-bit value.

> In which case the PCI device seems to be limited to addressing only up to 
> 4GB, right?

The HBA has some 32-bit registers and some that are 45-bit.

>
>> returned 32 bits without explicitly setting the DMA mask. Once we set
>> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
>> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> the abysmal I/O performance (same as when we had swiotlb=force).
>
> Right, that is expected.

So with iommu=soft, all I/Os have to go through Xen-SWIOTLB which
explains why we're seeing the abysmal I/O performance, right?

Is it true then that with an HVM domU kernel and PCI passthrough, it
does not use Xen-SWIOTLB and therefore results in better performance?

>
>> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
>> the default if we don't specify it? Without it, we get no I/Os (it
>
> If you don't specify it you can't do PCI passthrough in PV guests.
> It is automatically enabled when you boot Linux as Dom0.
>
>> seems the interrupts and/or DMA don't work).
>
> It has two purposes:
>
>  1). The predominant one, used in both DomU and Dom0, is to
>     translate physical addresses to machine frame numbers (PFNs->MFNs).
>     Xen PV guests have a P2M array that is consulted when setting
>     virtual addresses (PTEs). For PCI BARs, they are equivalent
>     (PFN == MFN), but for memory regions they can be discontiguous,
>     and even in decreasing order. If you traversed the P2M list you
>     could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x511f.
>
>     So obviously we need a lookup mechanism to find, say, for
>     virtual address 0xfffff8000010000 the DMA address (bus address).
>     Naively, on bare-metal x86 you could use virt_to_phys, which would
>     get you the physical address in PFN 0x10000. On Xen, however, we need
>     to consult the P2M array. For example, for p2m[0x10000], the real
>     machine frame number might be 0x102323.
>
>     So when you do 'pci_map_*', Xen-SWIOTLB looks up the P2M to find the
>     machine frame number and returns the corresponding DMA address (aka
>     bus address). That is the value you tell the HBA to transfer from/to.
>
>     If you don't enable Xen-SWIOTLB and use the native one (or none at all),
>     you end up programming the PCI device with bogus data, since the bus
>     address you are giving the card does not correspond to the real bus
>     address.
>
>  2). Using our example before, p2m[0x10000] returned MFN 0x102323. That
>     MFN is above 4GB (MFN 0x100000), and if your device can _only_ do PCI
>     Memory Write and PCI Memory Read with 32 address bits, we need some way
>     of still getting the contents of 0x102323 to the PCI card. This is where
>     bounce buffers come into play. During bootup, Xen-SWIOTLB initializes a
>     64MB chunk of space underneath the 4GB boundary; it is also contiguous.
>     When you do 'pci_map_*', Xen-SWIOTLB looks at your DMA mask and the MFN,
>     and if the MFN's bus address is above the DMA mask it copies the contents
>     of 0x102323 to one of its buffers, gives you the MFN of its buffer (say
>     0x20000), and you program that into the PCI card. When you get an
>     interrupt from the PCI card, you call pci_sync_* (or pci_unmap_*), which
>     copies from MFN 0x20000 back to 0x102323; pci_unmap_* also puts MFN
>     0x20000 back on the list of buffers to be used. And now you have the
>     result in MFN 0x102323.
>
>>
>> Are there any profiling tools you can suggest for domU? I was able to
>> apply Dulloor's xenoprofile patch to our dom0 kernel (2.6.32.25-pvops)
>> but not to xen-pcifront-0.8.2.
>
> Oh boy. I don't know of any, sorry.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

