[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU



On 06.05.20 00:34, Stefano Stabellini wrote:
+ Boris, Jürgen

See the crash Roman is seeing booting dom0 on the Raspberry Pi. It is
related to the recent xen dma_ops changes. Possibly the same thing
reported by Peng here:

https://marc.info/?l=linux-kernel&m=158805976230485&w=2

An in-depth analysis below.


On Mon, 4 May 2020, Roman Shaposhnik wrote:
[    2.534292] Unable to handle kernel paging request at virtual
address 000000000026c340
[    2.542373] Mem abort info:
[    2.545257]   ESR = 0x96000004
[    2.548421]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.553877]   SET = 0, FnV = 0
[    2.557023]   EA = 0, S1PTW = 0
[    2.560297] Data abort info:
[    2.563258]   ISV = 0, ISS = 0x00000004
[    2.567208]   CM = 0, WnR = 0
[    2.570294] [000000000026c340] user address but active_mm is swapper
[    2.576783] Internal error: Oops: 96000004 [#1] SMP
[    2.581784] Modules linked in:
[    2.584950] CPU: 3 PID: 135 Comm: kworker/3:1 Not tainted 5.6.1-default #9
[    2.591970] Hardware name: Raspberry Pi 4 Model B (DT)
[    2.597256] Workqueue: events deferred_probe_work_func
[    2.602509] pstate: 60000005 (nZCv daif -PAN -UAO)
[    2.607431] pc : xen_swiotlb_free_coherent+0x198/0x1d8
[    2.612696] lr : dma_free_attrs+0x98/0xd0
[    2.616827] sp : ffff800011db3970
[    2.620242] x29: ffff800011db3970 x28: 00000000000f7b00
[    2.625695] x27: 0000000000001000 x26: ffff000037b68410
[    2.631129] x25: 0000000000000001 x24: 00000000f7b00000
[    2.636583] x23: 0000000000000000 x22: 0000000000000000
[    2.642017] x21: ffff800011b0d000 x20: ffff80001179b548
[    2.647461] x19: ffff000037b68410 x18: 0000000000000010
[    2.652905] x17: ffff000035d66a00 x16: 00000000deadbeef
[    2.658348] x15: ffffffffffffffff x14: ffff80001179b548
[    2.663792] x13: ffff800091db37b7 x12: ffff800011db37bf
[    2.669236] x11: ffff8000117c7000 x10: ffff800011db3740
[    2.674680] x9 : 00000000ffffffd0 x8 : ffff80001071e980
[    2.680124] x7 : 0000000000000132 x6 : ffff80001197a6ab
[    2.685568] x5 : 0000000000000000 x4 : 0000000000000000
[    2.691012] x3 : 00000000f7b00000 x2 : fffffdffffe00000
[    2.696465] x1 : 000000000026c340 x0 : 000002000046c340
[    2.701899] Call trace:
[    2.704452]  xen_swiotlb_free_coherent+0x198/0x1d8
[    2.709367]  dma_free_attrs+0x98/0xd0
[    2.713143]  rpi_firmware_property_list+0x1e4/0x240
[    2.718146]  rpi_firmware_property+0x6c/0xb0
[    2.722535]  rpi_firmware_probe+0xf0/0x1e0
[    2.726760]  platform_drv_probe+0x50/0xa0
[    2.730879]  really_probe+0xd8/0x438
[    2.734567]  driver_probe_device+0xdc/0x130
[    2.738870]  __device_attach_driver+0x88/0x108
[    2.743434]  bus_for_each_drv+0x78/0xc8
[    2.747386]  __device_attach+0xd4/0x158
[    2.751337]  device_initial_probe+0x10/0x18
[    2.755649]  bus_probe_device+0x90/0x98
[    2.759590]  deferred_probe_work_func+0x88/0xd8
[    2.764244]  process_one_work+0x1f0/0x3c0
[    2.768369]  worker_thread+0x138/0x570
[    2.772234]  kthread+0x118/0x120
[    2.775571]  ret_from_fork+0x10/0x18
[    2.779263] Code: d34cfc00 f2dfbfe2 d37ae400 8b020001 (f8626800)
[    2.785492] ---[ end trace 4c435212e349f45f ]---
[    2.793340] usb 1-1: New USB device found, idVendor=2109,
idProduct=3431, bcdDevice= 4.20
[    2.801038] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    2.808297] usb 1-1: Product: USB2.0 Hub
[    2.813710] hub 1-1:1.0: USB hub found
[    2.817117] hub 1-1:1.0: 4 ports detected

This is bailing out right here:
      
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/raspberrypi.c?h=v5.6.1#n125

FYIW (since I modified the source to actually print what was returned
right before it bails) we get:
    buf[1] == 0x800000004
    buf[2] == 0x00000001

Status 0x800000004 is of course suspicious since it is not even listed here:
     
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/soc/bcm2835/raspberrypi-firmware.h#n14

So it appears that this DMA request path is somehow busted and it
would be really nice to figure out why.

You have actually discovered a genuine bug in the recent xen dma rework
in Linux. Congrats :-)

Nice! ;-)

I am doing some guesswork here, but from what I read in the thread and
the information in this email I think this patch might fix the issue.
If it doesn't fix the issue please add a few printks in
drivers/xen/swiotlb-xen.c:xen_swiotlb_free_coherent and please let me
know where exactly it crashes.


diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
index b9cc11e887ed..ff4677ed9788 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/arm/page-coherent.h
@@ -8,12 +8,17 @@
  static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t 
size,
                 dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
  {
+       void *cpu_addr;
+       if (dma_alloc_from_dev_coherent(hwdev, size, dma_handle, &cpu_addr))
+               return cpu_addr;
         return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
  }

  static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
                 void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
  {
+       if (dma_release_from_dev_coherent(hwdev, get_order(size), cpu_addr))
+               return;
         dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
  }

Applied the patch, but it didn't help and after printk's it turns out
it surprisingly crashes right inside this (rather convoluted if you
ask me) if statement:
     
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/xen/swiotlb-xen.c?h=v5.6.1#n349

So it makes sense that the patch didn't help -- we never hit that
xen_free_coherent_pages.
The crash happens here:

        if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
                     range_straddles_page_boundary(phys, size)) &&
            TestClearPageXenRemapped(virt_to_page(vaddr)))
                xen_destroy_contiguous_region(phys, order);

I don't know exactly what is causing the crash. Is it the WARN_ON somehow?
Is it TestClearPageXenRemapped? Neither should cause a crash in theory.


But I do know that there are problems with that if statement on ARM. It
can trigger for one of the following conditions:

1) dev_addr + size - 1 > dma_mask
2) range_straddles_page_boundary(phys, size)


The first condition might happen after bef4d2037d214 because
dma_direct_alloc might not respect the device dma_mask. It is actually a
bug and I would like to keep the WARN_ON for that. The patch I sent
yesterday (https://marc.info/?l=xen-devel&m=158865080224504) should
solve that issue. But Roman is telling us that the crash still persists.

The second condition is completely normal and not an error on ARM
because dom0 is 1:1 mapped. It is not an issue if the address range is
straddling a page boundary. We certainly shouldn't WARN (or crash).

So, I suggest something similar to Peng's patch, appended.

Roman, does it solve your problem?


diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d27762c6f8..994ca3a4b653 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -346,9 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t 
size, void *vaddr,
        /* Convert the size to actually allocated. */
        size = 1UL << (order + XEN_PAGE_SHIFT);
- if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
-                    range_straddles_page_boundary(phys, size)) &&
-           TestClearPageXenRemapped(virt_to_page(vaddr)))
+       WARN_ON(dev_addr + size - 1 > dma_mask);
+       if (TestClearPageXenRemapped(virt_to_page(vaddr)))
                xen_destroy_contiguous_region(phys, order);

This is a very bad idea for x86. Calling xen_destroy_contiguous_region()
for cases where it shouldn't be called is causing subtle bugs in memory
management, like pages being visible twice and thus resulting in weird
memory inconsistencies.

What you want here is probably an architecture specific test.


Juergen



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.