[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)

On Thu, Oct 01, 2009 at 12:07:22PM -0700, Jeremy Fitzhardinge wrote:
> On 10/01/09 10:35, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 30, 2009 at 11:21:49AM -0400, Konrad Rzeszutek Wilk wrote:
> >   
> >> With those two options defined in the boot line, the radeon kernel driver 
> >> spits out:
> >>
> >> [  244.190885] [drm] Loading RV730/RV740 PFP Microcode
> >> [  244.190911] [drm] Loading RV730/RV740 CP Microcode
> >> [  244.205974] [drm] Resetting GPU
> >> [  244.310103] [drm] writeback test failed
> >> [  251.220092] [drm] Resetting GPU
> >>
> >> And the video is useless (red little characters at the top, nothing else).
> >> If I don't define those two arguments the video driver works fine.
> >>     
> > .. snip ..
> >   
> >> Right now I am instrumenting the code and trying to narrow
> >> down the issue. But I was wondering if somebody has seen this in the past
> >> and can say "Oh yeah, I saw that back in 1980, try this."
> >>     
> > I found the fault, which I am going to try to describe. 
> >
> > The radeon_dri driver (user-land) via ioctl, calls 'drm_sg_alloc'.
> > 'drm_sg_alloc' allocates a 32MB area using 'vmalloc_32'. On non-Xen
> > machines, the physical address is indeed under the 4GB number. On Xen
> > thought the physical address is somewhere in the stratosphere. The 
> > radeon_dri
> > saves this vmalloc-ed virtual address in its structures.
> >
> > Next step is bit more complex. radeon_dri via ioctl calls the radeon_cp_init
> > which job is to initialize the command processing engine. The passed
> > in arguments are the virtual address obtained via 'drm_sg_alloc'. That 32MB
> > at that point is partitioned (the first 1MB is for the command processing 
> > ring,
> > the next 64KB for a ring pointer, and so on) and both the user-land driver
> > and the kernel save this address in their structures. Keep in mind that this
> > address is the virtual address pointing to the vmalloc-ed area. Not the
> > virtual address obtained from page_address(vmalloc_to_page(i))). 
> >
> > Then the radeon_cp_init sets up a GPU page-table using the physical
> > addresses.  This is where it starts to fall apart, as during this
> > setup, the r600_page_table_init calls the pci_map_page to ensure that the
> > physical address is below 4GB. On Xen (and on non-Xen with swiotlb=force
> > iommu=soft), the physical addresses obtained end up being taken
> > from the IOMMU. This is important. 
> >   
> (By IOMMU, do you mean swiotlb?  Or via dma_ops?)

dma_ops. On non-Xen I forced it to use the swiotlb.

> There's no particular reason why this needs to be swiotlb'd. 

Absolutely. Especially as video driver programs the graphic card
page-table to do the translations - it is an IOMMU by itself!

> pci_map_page should be calling xen_map_page, which should be calling
> xen_create_contiguous_region() on it to make sure the underlying memory
> is phyisically in the right place.  The tricky part is making sure all
> the kernel/vmalloc aliases are dealt with properly (and hope there are
> no usermode mappings at that point, or we'll have to start rummaging
> through rmap).
> > After that the writeback is done on the MMIO region, the GPU
> > happily looks at the page-table, finds the physicall address (which
> > is _not_ pointing to the vmalloc area, but to the IOMMU) and does a
> > write to that memory region. While the writeback test reads data from
> > saved ring_rptr (which reads memory from the vmalloc area - for which
> > the GPU has no page-table). 
> >   
> That means the radeon driver is misusing the DMA API.  If the page is
> mapped to the device you can't start accessing it from the CPU without
> unmapping it, or at least syncing it.

Well, it is a bit schizophrenic. The driver sets up the GPU page-table
for the 32-bit compatible DMA space using the pci_map_page, and that is what
the GPU uses. But the kernel radeon driver assumes that the array of 
vmalloc virtual address map to the same exact physicall addresses
of the page as what was passed to pci_map_page. In other words, what
was passed in to pci_map_page is assumed to do a "return page_to_phys(addr);'
 operation and nothing more. This is by the virtue that drm_sg_alloc calls
'vmalloc_32' which is suppose to give a 32-bit accessible set of memory.

> > 2). I experiemented a bit with drm_sg_alloc to use the dma_alloc_coherent
> > but 32MB is way too big for it:
> >
> >        if (order >= MAX_ORDER) {
> >                 WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >
> > (MAX_ORDER is defined to be 11, and 32MB is order 15).
> >   
> Could modify drm_vmalloc_dma to do the vmalloc "manually":
>    1. call __get_vm_area to reserve a chunk of vmalloc address space
>    2. allocate a bunch of individual pages with dma_alloc_coherent
>    3. insert them into the vmalloc mapping with map_vm_area
> That will guarantee a normal-looking vmalloc area with device-friendly
> pages that subsequent pci_map_page operations will use as-is.

That is an idea I hadn't thought off. Let me try that.

.. snip ..
> > 3). Ensuring the sg_drm_alloc will get an area with physical addresses
> >    under the 4G mark. I was thinking to utlilize the 
> > xen_create_contigous_region,
> >    but the PFNs returned by vmalloc_32 are actually MFNs, so
> >    xen_create_contigous_region is out of the picture.
> >   
> Not sure I follow you here.  But I think this is the right approach,
> using the algorithm above.

True. Let me try that.
> > 4). Go a further step back. Ensure that vmalloc_32 will always get 
> >     addresses under the 4GB mark? Since the virt_to_pfn for these addresses
> >     are the real physical addresses, perhaps make this function enforce
> >     the MFNs to be under 4GB mark?
> >   
> I was thinking along those lines, but its hard to see how to do this
> without mucking around core vmalloc code.

The are other users of 'vmalloc_32' that look like they depend on this
memory being under the 4GB mark. Most of them are do video capture through
USB - so it probably is limited to only accessing up to 4GB.

>     J

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.