[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xen-swiotlb issue when NVMe driver is enabled in Dom0 on ARM



On Thu, 21 Apr 2022, Rahul Singh wrote:
> > On 20 Apr 2022, at 11:46 pm, Stefano Stabellini <sstabellini@xxxxxxxxxx> 
> > wrote:
> > On Wed, 20 Apr 2022, Rahul Singh wrote:
> >>> On 20 Apr 2022, at 3:36 am, Stefano Stabellini <sstabellini@xxxxxxxxxx> 
> >>> wrote:
> >>>>> Then there is xen_swiotlb_init() which allocates some memory for
> >>>>> swiotlb-xen at boot. It could lower the total amount of memory
> >>>>> available, but if you disabled swiotlb-xen like I suggested,
> >>>>> xen_swiotlb_init() still should get called and executed anyway at boot
> >>>>> (it is called from arch/arm/xen/mm.c:xen_mm_init). So xen_swiotlb_init()
> >>>>> shouldn't be the one causing problems.
> >>>>> 
> >>>>> That's it -- there is nothing else in swiotlb-xen that I can think of.
> >>>>> 
> >>>>> I don't have any good ideas, so I would only suggest to add more printks
> >>>>> and report the results, for instance:
> >>>> 
> >>>> As suggested I added the more printks but only difference I see is the 
> >>>> size apart
> >>>> from that everything looks same .
> >>>> 
> >>>> Please find the attached logs for xen and native linux boot.
> >>> 
> >>> One difference is that the order of the allocations is significantly
> >>> different after the first 3 allocations. It is very unlikely but
> >>> possible that this is an unrelated concurrency bug that only occurs on
> >>> Xen. I doubt it.
> >> 
> >> I am not sure but just to confirm with you, I see below logs in every 
> >> scenario.
> >> SWIOTLB memory allocated by linux swiotlb and used by xen-swiotlb. Is that 
> >> okay or it can cause some issue.
> >> 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] software IO TLB: mapped [mem 
> >> 0x00000000f4000000-0x00000000f8000000] (64MB)
> >> 
> >> snip from int __ref xen_swiotlb_init(int verbose, bool early)
> >> /*                                                                         
> >>      * IO TLB memory already allocated. Just use it.                       
> >>     
> >>      */                                                                    
> >>     
> >>     if (io_tlb_start != 0) {                                               
> >>     
> >>         xen_io_tlb_start = phys_to_virt(io_tlb_start);                     
> >>     
> >>         goto end;                                                          
> >>     
> >>     }
> > 
> > Unfortunately there is nothing obvious in the logs. I think we need to
> > look at the in-details executions of Linux on Xen with swiotlb-xen and
> > Linux on Xen without swiotlb-xen. The comparison with Linux on native is
> > not very interesting because the memory layout is a bit different.
> > 
> > The comparison between the two executions should be simple because
> > swiotlb-xen should be transparent: in this simple case swiotlb-xen
> > should end up calling always the same functions that would end up being
> > called anyway without swiotlb-xen. Basically, it should only add a
> > couple of extra steps in between, nothing else.
> > 
> > As we have already discussed:
> > 
> > - [no swiotlb-xen] dma_alloc_attrs --> dma_direct_alloc
> > - [swiotlb-xen] dma_alloc_attrs --> xen_swiotlb_alloc_coherent --> 
> > dma_direct_alloc
> > 
> > The result should be identical. In xen_swiotlb_alloc_coherent the code
> > path taken should be:
> > 
> > - xen_alloc_coherent_pages
> > - if (((dev_addr + size - 1 <= dma_mask)) &&
> >      !range_straddles_page_boundary(phys, size)) {
> >      *dma_handle = dev_addr;
> > - return ret
> > 
> > So basically, it should make zero difference. That is expected because
> > swiotlb-xen really only comes into play for domU pages. For booting
> > dom0, it should only be a "useless" indirection.
> > 
> > In the case of xen_swiotlb_map_page, it should be similar. The path
> > taken should be:
> > 
> >     if (dma_capable(dev, dev_addr, size, true) &&
> >         !range_straddles_page_boundary(phys, size) &&
> >             !xen_arch_need_swiotlb(dev, phys, dev_addr) &&
> >             swiotlb_force != SWIOTLB_FORCE)
> >             goto done;
> > 
> > which I think should correspond to this prints in your logs at line 400:
> > 
> >    DEBUG xen_swiotlb_map_page 400 phys=80003c4f000 dev_addr=80003c4f000
> > 
> > So that should be OK too. If different paths are taken, then we have a
> > problem. If the paths above are taken there should be zero difference
> > between the swiotlb-xen and the non-swiotlb-xen cases.
> > 
> > Which brings me to your question about xen_swiotlb_init and this
> > message:
> > 
> >    software IO TLB: mapped [mem 0x00000000f4000000-0x00000000f8000000] 
> > (64MB)
> > 
> > The swiotlb-xen buffer should *not* be used if the code paths taken are
> > the ones above. So it doesn't matter if it is allocated or not. You
> > could comment out the code in xen_swiotlb_init and everything should
> > still behave the same.
> > 
> > Finally, my suggestion. Considering all the above, I would look *very*
> > closely at the execution of Linux on Xen with and without swiotlb-xen.
> > The differences should be really minimal. Adds prints to all the
> > swiotlb-xen functions, but really only the following should matter:
> > - xen_swiotlb_alloc_coherent
> > - xen_swiotlb_map_page
> > - xen_swiotlb_unmap_page
> > 
> > What are the differences between the two executions? From the logs:
> > 
> > - the allocation of the swiotlb-xen buffer which leads to 64MB of less
> >  memory available, but actually if you compared to Linux on Xen
> >  with/without swiotlb-xen this different would go away because
> >  xen_swiotlb_init would be called in both cases anyway
> > 
> > - the size upgrade in xen_swiotlb_alloc_coherent: I can see several
> >  instances of the allocation size being increased. Is that causing the
> >  problem? It seems unlikely and you have already verified it is not the
> >  case by removing the size increase in xen_swiotlb_alloc_coherent
> > 
> > - What else is different? There *must* be something, but it is not
> >  showing in the logs so far.
> > 
> > 
> > The only other observation that I have, but it doesn't help, is that the
> > failure happens on the second 4MB allocation when there is another
> > concurrent memory allocation of 4K. Neither the 4MB nor the 4K are
> > size-upgrades by xen_swiotlb_alloc_coherent.
> > 
> > 4MB is an larger-than-usual size, but it shouldn't make that much of a
> > difference. Is that problem that the 4MB have to be contiguous? I don't
> > see how swiotlb-xen could have an impact in that regard, if not for the
> > size increase in xen_swiotlb_alloc_coherent.
> > 
> > Please let me know what you find.
> 
> I debug the issue more today and found out that the only difference when
> calling dma_alloc_attrs() from the NVMe driver [1] and the other driver is the
> attribute “DMA_ATTR_NO_KERNEL_MAPPING". 

This is progress!


> I remove the attribute "DMA_ATTR_NO_KERNEL_MAPPING” before
> calling the xen_alloc_coherent_pages() , NVMe DMA allocation is successful
> and the issue is not observed.
> 
> Do you have any idea why attribute DMA_ATTR_NO_KERNEL_MAPPING is
> causing the the issue with xen-swiotlb.

Yes, if you look at xen_swiotlb_free_coherent, it is clear that things
won't work for the DMA_ATTR_NO_KERNEL_MAPPING case. Can you have a try
at the patch below?

---
swiotlb-xen: handle DMA_ATTR_NO_KERNEL_MAPPING

If DMA_ATTR_NO_KERNEL_MAPPING is set then the returned vaddr is a struct
*page instead of the virtual mapping of the buffer.

In xen_swiotlb_alloc_coherent, do not call virt_to_page, instead use the
returned pointer directly. Also do not memset the buffer or struct page
to zero.

In xen_swiotlb_free_coherent, check DMA_ATTR_NO_KERNEL_MAPPING and set
the page pointer appropriately.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 2b385c1b4a99..22d8779d3ac0 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -318,15 +318,21 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t 
size,
            !range_straddles_page_boundary(phys, size))
                *dma_handle = dev_addr;
        else {
+               struct page *page;
+
                if (xen_create_contiguous_region(phys, order,
                                                 fls64(dma_mask), dma_handle) 
!= 0) {
                        xen_free_coherent_pages(hwdev, size, ret, 
(dma_addr_t)phys, attrs);
                        return NULL;
                }
                *dma_handle = phys_to_dma(hwdev, *dma_handle);
-               SetPageXenRemapped(virt_to_page(ret));
+
+               if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
+                       page = ret;
+               else
+                       virt_to_page(ret);
+               SetPageXenRemapped(page);
        }
-       memset(ret, 0, size);
        return ret;
 }
 
@@ -349,7 +355,9 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t 
size, void *vaddr,
        /* Convert the size to actually allocated. */
        size = 1UL << (order + XEN_PAGE_SHIFT);
 
-       if (is_vmalloc_addr(vaddr))
+       if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
+               page = vaddr;
+       else if (is_vmalloc_addr(vaddr))
                page = vmalloc_to_page(vaddr);
        else
                page = virt_to_page(vaddr);

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.