[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)

On 10/01/09 10:35, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 30, 2009 at 11:21:49AM -0400, Konrad Rzeszutek Wilk wrote:
>> With those two options defined in the boot line, the radeon kernel driver 
>> spits out:
>> [  244.190885] [drm] Loading RV730/RV740 PFP Microcode
>> [  244.190911] [drm] Loading RV730/RV740 CP Microcode
>> [  244.205974] [drm] Resetting GPU
>> [  244.310103] [drm] writeback test failed
>> [  251.220092] [drm] Resetting GPU
>> And the video is useless (red little characters at the top, nothing else).
>> If I don't define those two arguments the video driver works fine.
> .. snip ..
>> Right now I am instrumenting the code and trying to narrow
>> down the issue. But I was wondering if somebody has seen this in the past
>> and can say "Oh yeah, I saw that back in 1980, try this."
> I found the fault, which I am going to try to describe. 
> The radeon_dri driver (user-land) via ioctl, calls 'drm_sg_alloc'.
> 'drm_sg_alloc' allocates a 32MB area using 'vmalloc_32'. On non-Xen
> machines, the physical address is indeed under the 4GB number. On Xen
> thought the physical address is somewhere in the stratosphere. The radeon_dri
> saves this vmalloc-ed virtual address in its structures.
> Next step is bit more complex. radeon_dri via ioctl calls the radeon_cp_init
> which job is to initialize the command processing engine. The passed
> in arguments are the virtual address obtained via 'drm_sg_alloc'. That 32MB
> at that point is partitioned (the first 1MB is for the command processing 
> ring,
> the next 64KB for a ring pointer, and so on) and both the user-land driver
> and the kernel save this address in their structures. Keep in mind that this
> address is the virtual address pointing to the vmalloc-ed area. Not the
> virtual address obtained from page_address(vmalloc_to_page(i))). 
> Then the radeon_cp_init sets up a GPU page-table using the physical
> addresses.  This is where it starts to fall apart, as during this
> setup, the r600_page_table_init calls the pci_map_page to ensure that the
> physical address is below 4GB. On Xen (and on non-Xen with swiotlb=force
> iommu=soft), the physical addresses obtained end up being taken
> from the IOMMU. This is important. 
(By IOMMU, do you mean swiotlb?  Or via dma_ops?)

There's no particular reason why this needs to be swiotlb'd. 
pci_map_page should be calling xen_map_page, which should be calling
xen_create_contiguous_region() on it to make sure the underlying memory
is phyisically in the right place.  The tricky part is making sure all
the kernel/vmalloc aliases are dealt with properly (and hope there are
no usermode mappings at that point, or we'll have to start rummaging
through rmap).

> After that the writeback is done on the MMIO region, the GPU
> happily looks at the page-table, finds the physicall address (which
> is _not_ pointing to the vmalloc area, but to the IOMMU) and does a
> write to that memory region. While the writeback test reads data from
> saved ring_rptr (which reads memory from the vmalloc area - for which
> the GPU has no page-table). 

That means the radeon driver is misusing the DMA API.  If the page is
mapped to the device you can't start accessing it from the CPU without
unmapping it, or at least syncing it.

> Solutions?:
> 1). The current workaround is to boot Xen with mem=3GB which will
> enforce the vmalloc area is under 4GB.
> 2). I experiemented a bit with drm_sg_alloc to use the dma_alloc_coherent
> but 32MB is way too big for it:
>        if (order >= MAX_ORDER) {
>                 WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> (MAX_ORDER is defined to be 11, and 32MB is order 15).

Could modify drm_vmalloc_dma to do the vmalloc "manually":

   1. call __get_vm_area to reserve a chunk of vmalloc address space
   2. allocate a bunch of individual pages with dma_alloc_coherent
   3. insert them into the vmalloc mapping with map_vm_area

That will guarantee a normal-looking vmalloc area with device-friendly
pages that subsequent pci_map_page operations will use as-is.

> 3). Restructure r600_cp_init to refresh its structures when the page
>    table uses different areas than the physical addresses. That would
>    imply keeping a copy of the "old" vmalloc unused virtual addresses
>    (the user-land driver uses the virtual address as handle) and updating
>    the ring pointer, command pointer, etc to use the virtual addresses
>    obtained from doing phys_to_virt(page_to_phys(entry->pagelist[X])).
>    But the code assumes (and righly) that the area obtained from
>    drm_sg_alloc is 32-bit addressable, so..
Ugh.  I think we could make the argument that this driver is broken for
not syncing its mapped pages before accessing them - but if you're doing
graphics via swiotlb you're in a world of pain anyway.  And we want a
more general fix because other drivers are likely to be doing equally
dubious things.

> 3). Ensuring the sg_drm_alloc will get an area with physical addresses
>    under the 4G mark. I was thinking to utlilize the 
> xen_create_contigous_region,
>    but the PFNs returned by vmalloc_32 are actually MFNs, so
>    xen_create_contigous_region is out of the picture.

Not sure I follow you here.  But I think this is the right approach,
using the algorithm above.

> 4). Go a further step back. Ensure that vmalloc_32 will always get 
>     addresses under the 4GB mark? Since the virt_to_pfn for these addresses
>     are the real physical addresses, perhaps make this function enforce
>     the MFNs to be under 4GB mark?

I was thinking along those lines, but its hard to see how to do this
without mucking around core vmalloc code.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.