This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU

To: Dante Cinco <dantecinco@xxxxxxxxx>
Subject: Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Tue, 16 Nov 2010 13:57:49 -0500
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 16 Nov 2010 11:01:46 -0800
In-reply-to: <AANLkTi=H6r2=-zJE+6eCtP4VXacYhd_e47+KRW5vdwjS@xxxxxxxxxxxxxx>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
References: <20101112165541.GA10339@xxxxxxxxxxxx> <EB4C61A1A2501842A04B573FE42B14D601374FBFD2@xxxxxxxxxxxxxxxxx> <20101112223333.GD26189@xxxxxxxxxxxx> <AANLkTi=H6r2=-zJE+6eCtP4VXacYhd_e47+KRW5vdwjS@xxxxxxxxxxxxxx>
> >> Using the bounce buffers limits the DMA operations to under 32-bit. So 
> >> could it be that you are using some casting macro that casts a PFN to 
> >> unsigned long or vice versa and we end up truncating it to 32-bit? (I've
> >> actually seen this issue with InfiniBand drivers back in the RHEL5 days.)
> >> Lastly, do you set your DMA mask on the device to 32BIT?
> >>
> >> The Tachyon chip supports both 32-bit and 45-bit DMA. Some features need a
> >> 32-bit physical address set in the chip; others need a 45-bit physical
> >> address set in the chip.
> >
> > Oh boy. That complicates it.
> >
> >> The driver doesn't set DMA mask on the device to 32 bit.
> >
> > Is it set then to 45bit?
> >
> We were not explicitly setting the DMA mask. pci_alloc_coherent was

You should. But only once (during startup).

> always returning 32 bits but pci_map_single was returning a 34-bit
> address which we truncate by casting it to a uint32_t since the

Truncating any bus (DMA) address is a big no-no.

> Tachyon's HBA register is only 32 bits. With swiotlb=force, both

Not knowing the driver, I can't comment much here, but:
 1). When you say 'HBA registers' I think of PCI MMIO BARs. Those are
     usually found beneath the 4GB limit, and you get the virtual
     address by doing ioremap (or the PCI equivalent). And the
     bus address is definitely under 4GB.
 2). After you have done that, set your DMA mask to 34 bits (using
     pci_set_dma_mask), and then
 3). for all other operations where you can do 34-bit, use
     pci_map_single. The swiotlb code looks at the dma_mask (and if
     none is set it assumes 32-bit), and if it finds the physical
     address to be within the DMA mask it will gladly translate the
     physical address to a bus address and do nothing else. If, however,
     the address is beyond what the DMA mask allows, it will give you a
     bounce buffer, which you will later have to copy from (using
     pci_sync_*). I've written a little blurb at the bottom of the email
     explaining this in more detail.
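To make the mask logic concrete, here is a rough userspace model of the decision, assuming a simplified range check. DMA_BIT_MASK() mirrors the kernel macro of the same name; dma_reachable() and model_map_single() are hypothetical, and the model skips Xen-SWIOTLB's additional check for buffers that span discontiguous machine frames. The force flag stands in for booting with swiotlb=force, which bounces every mapping regardless of the mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Simplified range check: can a device limited to `mask` reach every
 * byte of [addr, addr + size)? */
static bool dma_reachable(dma_addr_t addr, size_t size, uint64_t mask)
{
    assert(size > 0);
    return addr + size - 1 <= mask;
}

enum map_result { MAP_DIRECT, MAP_BOUNCE };

/* Hypothetical model of the choice a pci_map_single() call ends up making.
 * mask == 0 stands for "driver never set a mask", treated as 32-bit;
 * force mimics swiotlb=force. */
static enum map_result model_map_single(dma_addr_t addr, size_t size,
                                        uint64_t mask, bool force)
{
    if (mask == 0)
        mask = DMA_BIT_MASK(32);
    if (force || !dma_reachable(addr, size, mask))
        return MAP_BOUNCE;      /* copy through a buffer below 4GB */
    return MAP_DIRECT;          /* hand out the bus address as is */
}
```

With no mask set, a 4KB buffer at the 34-bit address 0x240001000 is bounced; with DMA_BIT_MASK(34) it is mapped directly. Every bounced mapping adds a copy, which is consistent with the slow I/O seen both with swiotlb=force and with a 32-bit mask.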

Or is the issue that when you write the DMA address to your HBA register,
the HBA register can _only_ deal with 32-bit values (4 bytes)? In that
case the PCI device seems to be limited to addressing only up to 4GB.

> returned 32 bits without explicitly setting the DMA mask. Once we set
> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
> with iommu=soft (and no more swiotlb=force), we're still stuck with
> the abysmal I/O performance (same as when we had swiotlb=force).

Right, that is expected.

> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
> the default if we don't specify it? Without it, we get no I/Os (it

If you don't specify it, you can't do PCI passthrough in PV guests.
It is automatically enabled when you boot Linux as Dom0.

> seems the interrupts and/or DMA don't work).

It has two purposes:

 1). The predominant one, used by both DomU and Dom0, is to
     translate physical addresses to machine frame numbers (PFNs->MFNs).
     Xen PV guests have a P2M array that is consulted when setting up
     virtual address mappings (PTEs). For PCI BARs the two are equivalent
     (PFN == MFN), but for memory regions the MFNs can be discontiguous,
     even in decreasing order. If you traversed the P2M list you
     could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x511f.

     So obviously we need a lookup mechanism to find, say, for
     virtual address 0xfffff8000010000 the DMA address (bus address).
     Naively, on bare-metal x86 you could use virt_to_phys, which would
     get you PFN 0x10000. On Xen, however, we need to consult the P2M array.
     For example, for p2m[0x10000], the real machine frame number might
     be 0x102323.

     So when you do 'pci_map_*', Xen-SWIOTLB looks up the P2M to find the
     machine frame number and returns that (the DMA address, aka bus address).
     That is the address you tell the HBA to transfer from/to.

     If you don't enable Xen-SWIOTLB and use the native one (or none at all),
     you end up programming the PCI device with bogus data, since the bus
     address you are giving the card does not correspond to the real bus
     address.

 2). Using our earlier example, p2m[0x10000] returned MFN 0x102323. That
     MFN is above 4GB (MFN 0x100000), and if your device can _only_ issue
     PCI memory reads and writes with 32 address bits, we need some way
     of still getting the contents of 0x102323 to the PCI card. This is where
     bounce buffers come into play. During bootup, Xen-SWIOTLB initializes a
     64MB chunk of memory underneath the 4GB boundary; it is also contiguous.
     When you do 'pci_map_*', Xen-SWIOTLB compares the machine address with
     the DMA mask you have, and if the address lies beyond the mask it copies
     the data from 0x102323 to one of its buffers, gives you the MFN of that
     buffer (say 0x20000), and you program that into the PCI card. When you
     get an interrupt from the PCI card, you call pci_sync_* (or
     pci_unmap_*), which copies from MFN 0x20000 back to 0x102323; the unmap
     also puts the buffer back on the list of buffers to be used. And now you
     have in MFN 0x102323
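The two purposes can be modelled in a few dozen lines of userspace C. This is a toy, not Xen-SWIOTLB itself: the P2M table holds only frame numbers taken from the examples above, the bounce pool is a single page at MFN 0x20000 standing in for the 64MB area, and phys_to_bus(), model_map() and model_unmap() are hypothetical names. model_unmap() stands in for the copy-back step and frees the slot; the mapping behaves as if it were bidirectional:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

typedef uint64_t dma_addr_t;

/* Toy P2M table: pseudo-physical frame -> machine frame. */
struct p2m_entry { uint64_t pfn, mfn; };
static const struct p2m_entry p2m[] = {
    { 0x1000, 0x5121 },
    { 0x1001, 0x5120 },
    { 0x10000, 0x102323 },
};

/* Purpose 1: translate a pseudo-physical address to a bus address by
 * looking up the machine frame and keeping the offset within the page. */
static dma_addr_t phys_to_bus(uint64_t paddr)
{
    uint64_t pfn = paddr >> PAGE_SHIFT;

    for (size_t i = 0; i < sizeof(p2m) / sizeof(p2m[0]); i++)
        if (p2m[i].pfn == pfn)
            return (p2m[i].mfn << PAGE_SHIFT) | (paddr & (PAGE_SIZE - 1));
    return (dma_addr_t)-1;              /* frame not in the toy table */
}

/* Purpose 2: a single bounce page that lives below 4GB at MFN 0x20000. */
#define BOUNCE_BUS (0x20000ULL << PAGE_SHIFT)
static unsigned char bounce_page[PAGE_SIZE];
static unsigned char *bounce_owner;     /* guest buffer using the slot */

/* Map buf (at pseudo-physical paddr) for a device limited by mask and
 * return the bus address to program into the card. */
static dma_addr_t model_map(unsigned char *buf, uint64_t paddr,
                            size_t size, uint64_t mask)
{
    dma_addr_t bus = phys_to_bus(paddr);

    assert(bus != (dma_addr_t)-1 && size > 0 && size <= PAGE_SIZE);
    if (bus + size - 1 <= mask)
        return bus;                     /* reachable: no copy needed */

    assert(bounce_owner == NULL);       /* only one slot in this toy */
    memcpy(bounce_page, buf, size);     /* copy data out for the device */
    bounce_owner = buf;
    return BOUNCE_BUS;
}

/* Copy whatever the device wrote back to the original buffer and put
 * the bounce slot back on the (one-entry) free list. */
static void model_unmap(dma_addr_t bus, size_t size)
{
    if (bus != BOUNCE_BUS)
        return;                         /* direct mapping: nothing to do */
    memcpy(bounce_owner, bounce_page, size);
    bounce_owner = NULL;
}
```

Mapping pseudo-physical 0x10000000 (PFN 0x10000) with a 32-bit mask yields bus address 0x20000000 (MFN 0x20000), because MFN 0x102323 is out of reach; mapping PFN 0x1000 yields 0x5121000 directly, with no copy.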
> Are there any profiling tools you can suggest for domU? I was able to
> apply Dulloor's xenoprofile patch to our dom0 kernel (
> but not to xen-pcifront-0.8.2.

Oh boy. I don't, sorry.

Xen-devel mailing list
