This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
From: Dante Cinco <dantecinco@xxxxxxxxx>
Date: Tue, 16 Nov 2010 11:43:11 -0800
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 16 Nov 2010 11:44:01 -0800
On Tue, Nov 16, 2010 at 10:57 AM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
>> >> Using the bounce buffers limits the DMA operations to under 32-bit. So 
>> >> could it be that you are using some casting macro that casts a PFN to 
>> >> unsigned long or vice-versa and we end up truncating it to 32-bit? (I've 
>> >> seen this issue actually with InfiniBand drivers back in RHEL5 days..). 
>> >> Lastly, do you set your DMA mask on the device to 32BIT?
>> >>
>> >> The tachyon chip supports both 32-bit & 45-bit dma. Some features need to 
>> >> set 32-bit physical addr to chip. Others need to set 45-bit physical addr 
>> >> to chip.
>> >
>> > Oh boy. That complicates it.
>> >
>> >> The driver doesn't set DMA mask on the device to 32 bit.
>> >
>> > Is it set then to 45bit?
>> >
>> We were not explicitly setting the DMA mask. pci_alloc_coherent was
> You should. But only once (during startup).
>> always returning 32 bits but pci_map_single was returning a 34-bit
>> address which we truncate by casting it to a uint32_t since the
> Truncating any bus (DMA) address is a big no no.
>> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
> Not knowing the driver I can't comment here much, but
>  1). When you say 'HBA registers' I think PCI MMIO BARs. Those are
>     usually found beneath the 4GB limit and you get the virtual
>     address when doing ioremap (or the pci equivalent). And the
>     bus address is definitely under the 4GB.
>  2). After you have done that, set your PCI DMA mask to 34-bit
>     (pci_set_dma_mask), and then
>  3). For all other operations where you can do 34-bit use
>     pci_map_single. The swiotlb buffer looks at the dma_mask (and if
>     none is set it assumes 32bit), and if it finds the physical address
>     to be within the DMA mask it will gladly translate the physical
>     to bus and nothing else. If however the physical address is way
>     beyond what the DMA mask allows it will give you the bounce buffer
>     which you will later have to copy from (using pci_sync..). I've written
>     a little blurb at the bottom of the email explaining this in more detail.
> Or is the issue that when you write to your HBA register the DMA
> address, the HBA register can _only_ deal with 32-bit values (4bytes)?

The HBA register which is using the address returned by pci_map_single
is limited to a 32-bit value.
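
To make the truncation concrete, here is a minimal userspace sketch. It is not
taken from our driver; truncate_bus_addr and fits_in_bits are hypothetical
helper names made up for illustration. It shows what a cast to uint32_t does
to a 34-bit bus address, and a check that could be made before writing an
address into a 32-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models casting a bus (DMA) address to uint32_t.
 * Bits 32 and above are silently dropped. */
static uint32_t truncate_bus_addr(uint64_t bus_addr)
{
    return (uint32_t)bus_addr;
}

/* Hypothetical helper: true if bus_addr can be represented in 'bits'
 * address bits, i.e. no bit at position 'bits' or higher is set. */
static bool fits_in_bits(uint64_t bus_addr, unsigned bits)
{
    if (bits >= 64)
        return true;            /* avoid an undefined 64-bit shift */
    return (bus_addr >> bits) == 0;
}
```

For an example 34-bit address such as 0x2ABCD1000, fits_in_bits(addr, 32) is
false, and the truncated value 0xABCD1000 points at a different bus address
than the one that was mapped.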

> In which case the PCI device seems to be limited to addressing only up to 
> 4GB, right?

The HBA has some 32-bit registers and some that are 45-bit.

>> returned 32 bits without explicitly setting the DMA mask. Once we set
>> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
>> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> the abysmal I/O performance (same as when we had swiotlb=force).
> Right, that is expected.

So with iommu=soft, all I/Os have to go through Xen-SWIOTLB which
explains why we're seeing the abysmal I/O performance, right?

Is it true then that with an HVM domU kernel and PCI passthrough, it
does not use Xen-SWIOTLB and therefore results in better performance?

>> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
>> the default if we don't specify it? Without it, we get no I/Os (it
> If you don't specify it you can't do PCI passthrough in PV guests.
> It is automatically enabled when you boot Linux as Dom0.
>> seems the interrupts and/or DMA don't work).
> It has two purposes:
>  1). The predominant one, which is used for both DomU and Dom0, is to
>     translate physical frame numbers to machine frame numbers (PFNs->MFNs).
>     Xen PV guests have a P2M array that is consulted when setting
>     virtual addresses (PTEs). For PCI BARs, they are equivalent
>     (PFN == MFN), but for memory regions they can be discontiguous,
>     and in decreasing order. If you would traverse the P2M list you
>     could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x5119.
>     So obviously we need a lookup mechanism to say find for
>     virtual address 0xfffff8000010000 the DMA address (bus address).
>     Naively on baremetal on X86 you could use virt_to_phys which would
>     get you PFN 0x10000. On Xen however, we need to consult the P2M array.
>     For example, for p2m[0x10000], the real machine frame number might be
>     0x102323.
>     So when you do 'pci_map_*' Xen-SWIOTLB looks up the P2M to find you the
>     machine frame number and returns that (dma address aka bus address). That
>     is the value you tell the HBA to transfer from/to.
>     If you don't enable Xen-SWIOTLB, and use the native one (or none at all),
>     you end up programming the PCI driver with bogus data since the bus
>     address you are giving the card does not correspond to the real bus
>     address.
>  2). Using our example before, the p2m[0x10000] returned MFN 0x102323. That
>     MFN is above 4GB (0x100000) and if your device can _only_ do PCI Memory
>     Write and PCI Memory Read b/c it only has 32-bit address bits we need
>     some way of still getting the contents of 0x102323 to the PCI card. This
>     is where bounce buffers come into play. During bootup, Xen-SWIOTLB
>     initializes a 64MB chunk of space that is underneath the 4GB space - it
>     is also contiguous.
>     When you do 'pci_map_*' Xen-SWIOTLB looks at the DMA mask you have and
>     the MFN, and if the MFN's bus address does not fit within the DMA mask
>     it copies the contents of 0x102323 to one of its buffers, gives you the
>     MFN of its buffer (say 0x20000) and you program that in the PCI card.
>     When you get an interrupt from the PCI card, you call pci_sync_* which
>     copies from MFN 0x20000 to 0x102323 and sticks the MFN 0x20000 back on
>     the list of buffers to be used. And now you have in MFN 0x102323 the
>     result.
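
The two steps described above can be sketched in userspace C as a toy model.
This is not the Xen-SWIOTLB code: every toy_* name, the tiny four-entry table
and its values are invented for illustration, and the bounce test is only my
reading of "does the buffer fit under the DMA mask".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                 /* 4 KiB pages, as on x86 */
#define TOY_BAD_ADDR   UINT64_MAX         /* returned for unknown PFNs */

/* Toy P2M table: index is the guest PFN, value is the machine frame (MFN).
 * Note the MFNs are neither contiguous nor increasing. */
static const uint64_t toy_p2m[] = { 0x5121, 0x5120, 0x5119, 0x102323 };
#define TOY_P2M_LEN (sizeof(toy_p2m) / sizeof(toy_p2m[0]))

/* Step 1: translate a guest physical address to a bus (machine) address
 * by looking up the PFN in the P2M table and keeping the page offset. */
static uint64_t toy_phys_to_bus(uint64_t phys)
{
    uint64_t pfn = phys >> TOY_PAGE_SHIFT;
    uint64_t offset = phys & ((1ULL << TOY_PAGE_SHIFT) - 1);

    if (pfn >= TOY_P2M_LEN)
        return TOY_BAD_ADDR;
    return (toy_p2m[pfn] << TOY_PAGE_SHIFT) | offset;
}

/* Step 2: a buffer has to go through a bounce buffer if its last byte
 * lies above the highest address the device's DMA mask can express. */
static bool toy_needs_bounce(uint64_t bus, size_t len, uint64_t dma_mask)
{
    return len > 0 && bus + len - 1 > dma_mask;
}
```

With this table, guest physical address 0x3010 maps to bus address
0x102323010, which needs bouncing under a 32-bit mask (0xFFFFFFFF) but not
under a 34-bit mask.
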
>> Are there any profiling tools you can suggest for domU? I was able to
>> apply Dulloor's xenoprofile patch to our dom0 kernel (
>> but not to xen-pcifront-0.8.2.
> Oh boy. I don't, sorry.
