[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU
On Tue, May 5, 2020 at 3:34 PM Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote: > > + Boris, Jürgen > > See the crash Roman is seeing booting dom0 on the Raspberry Pi. It is > related to the recent xen dma_ops changes. Possibly the same thing > reported by Peng here: > > https://marc.info/?l=linux-kernel&m=158805976230485&w=2 > > An in-depth analysis below. > > > On Mon, 4 May 2020, Roman Shaposhnik wrote: > > > > [ 2.534292] Unable to handle kernel paging request at virtual > > > > address 000000000026c340 > > > > [ 2.542373] Mem abort info: > > > > [ 2.545257] ESR = 0x96000004 > > > > [ 2.548421] EC = 0x25: DABT (current EL), IL = 32 bits > > > > [ 2.553877] SET = 0, FnV = 0 > > > > [ 2.557023] EA = 0, S1PTW = 0 > > > > [ 2.560297] Data abort info: > > > > [ 2.563258] ISV = 0, ISS = 0x00000004 > > > > [ 2.567208] CM = 0, WnR = 0 > > > > [ 2.570294] [000000000026c340] user address but active_mm is swapper > > > > [ 2.576783] Internal error: Oops: 96000004 [#1] SMP > > > > [ 2.581784] Modules linked in: > > > > [ 2.584950] CPU: 3 PID: 135 Comm: kworker/3:1 Not tainted > > > > 5.6.1-default #9 > > > > [ 2.591970] Hardware name: Raspberry Pi 4 Model B (DT) > > > > [ 2.597256] Workqueue: events deferred_probe_work_func > > > > [ 2.602509] pstate: 60000005 (nZCv daif -PAN -UAO) > > > > [ 2.607431] pc : xen_swiotlb_free_coherent+0x198/0x1d8 > > > > [ 2.612696] lr : dma_free_attrs+0x98/0xd0 > > > > [ 2.616827] sp : ffff800011db3970 > > > > [ 2.620242] x29: ffff800011db3970 x28: 00000000000f7b00 > > > > [ 2.625695] x27: 0000000000001000 x26: ffff000037b68410 > > > > [ 2.631129] x25: 0000000000000001 x24: 00000000f7b00000 > > > > [ 2.636583] x23: 0000000000000000 x22: 0000000000000000 > > > > [ 2.642017] x21: ffff800011b0d000 x20: ffff80001179b548 > > > > [ 2.647461] x19: ffff000037b68410 x18: 0000000000000010 > > > > [ 2.652905] x17: ffff000035d66a00 x16: 00000000deadbeef > > > > [ 2.658348] x15: ffffffffffffffff x14: ffff80001179b548 > > > > [ 2.663792] x13: ffff800091db37b7 x12: ffff800011db37bf > > > > [ 2.669236] x11: ffff8000117c7000 x10: ffff800011db3740 > > > > [ 2.674680] x9 : 00000000ffffffd0 x8 : ffff80001071e980 > > > > [ 2.680124] x7 : 0000000000000132 x6 : ffff80001197a6ab > > > > [ 2.685568] x5 : 0000000000000000 x4 : 0000000000000000 > > > > [ 2.691012] x3 : 00000000f7b00000 x2 : fffffdffffe00000 > > > > [ 2.696465] x1 : 000000000026c340 x0 : 000002000046c340 > > > > [ 2.701899] Call trace: > > > > [ 2.704452] xen_swiotlb_free_coherent+0x198/0x1d8 > > > > [ 2.709367] dma_free_attrs+0x98/0xd0 > > > > [ 2.713143] rpi_firmware_property_list+0x1e4/0x240 > > > > [ 2.718146] rpi_firmware_property+0x6c/0xb0 > > > > [ 2.722535] rpi_firmware_probe+0xf0/0x1e0 > > > > [ 2.726760] platform_drv_probe+0x50/0xa0 > > > > [ 2.730879] really_probe+0xd8/0x438 > > > > [ 2.734567] driver_probe_device+0xdc/0x130 > > > > [ 2.738870] __device_attach_driver+0x88/0x108 > > > > [ 2.743434] bus_for_each_drv+0x78/0xc8 > > > > [ 2.747386] __device_attach+0xd4/0x158 > > > > [ 2.751337] device_initial_probe+0x10/0x18 > > > > [ 2.755649] bus_probe_device+0x90/0x98 > > > > [ 2.759590] deferred_probe_work_func+0x88/0xd8 > > > > [ 2.764244] process_one_work+0x1f0/0x3c0 > > > > [ 2.768369] worker_thread+0x138/0x570 > > > > [ 2.772234] kthread+0x118/0x120 > > > > [ 2.775571] ret_from_fork+0x10/0x18 > > > > [ 2.779263] Code: d34cfc00 f2dfbfe2 d37ae400 8b020001 (f8626800) > > > > [ 2.785492] ---[ end trace 4c435212e349f45f ]--- > > > > [ 2.793340] usb 1-1: New USB device found, idVendor=2109, > > > > idProduct=3431, bcdDevice= 4.20 > > > > [ 2.801038] usb 1-1: New USB device strings: Mfr=0, Product=1, > > > > SerialNumber=0 > > > > [ 2.808297] usb 1-1: Product: USB2.0 Hub > > > > [ 2.813710] hub 1-1:1.0: USB hub found > > > > [ 2.817117] hub 1-1:1.0: 4 ports detected > > > > > > > > This is bailing out right here: > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/raspberrypi.c?h=v5.6.1#n125 > > > > > > > > FYIW (since I modified the source to actually print what was returned > > > > right before it bails) we get: > > > > buf[1] == 0x800000004 > > > > buf[2] == 0x00000001 > > > > > > > > Status 0x800000004 is of course suspicious since it is not even listed > > > > here: > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/soc/bcm2835/raspberrypi-firmware.h#n14 > > > > > > > > So it appears that this DMA request path is somehow busted and it > > > > would be really nice to figure out why. > > > > > > You have actually discovered a genuine bug in the recent xen dma rework > > > in Linux. Congrats :-) > > > > Nice! ;-) > > > > > I am doing some guesswork here, but from what I read in the thread and > > > the information in this email I think this patch might fix the issue. > > > If it doesn't fix the issue please add a few printks in > > > drivers/xen/swiotlb-xen.c:xen_swiotlb_free_coherent and please let me > > > know where exactly it crashes. > > > > > > > > > diff --git a/include/xen/arm/page-coherent.h > > > b/include/xen/arm/page-coherent.h > > > index b9cc11e887ed..ff4677ed9788 100644 > > > --- a/include/xen/arm/page-coherent.h > > > +++ b/include/xen/arm/page-coherent.h > > > @@ -8,12 +8,17 @@ > > > static inline void *xen_alloc_coherent_pages(struct device *hwdev, > > > size_t size, > > > dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs) > > > { > > > + void *cpu_addr; > > > + if (dma_alloc_from_dev_coherent(hwdev, size, dma_handle, > > > &cpu_addr)) > > > + return cpu_addr; > > > return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs); > > > } > > > > > > static inline void xen_free_coherent_pages(struct device *hwdev, size_t > > > size, > > > void *cpu_addr, dma_addr_t dma_handle, unsigned long > > > attrs) > > > { > > > + if (dma_release_from_dev_coherent(hwdev, get_order(size), > > > cpu_addr)) > > > + return; > > > dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs); > > > } > > > > Applied the patch, but it didn't help and after printk's it turns out > > it surprisingly crashes right inside this (rather convoluted if you > > ask me) if statement: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/xen/swiotlb-xen.c?h=v5.6.1#n349 > > > > So it makes sense that the patch didn't help -- we never hit that > > xen_free_coherent_pages. > > The crash happens here: > > if (!WARN_ON((dev_addr + size - 1 > dma_mask) || > range_straddles_page_boundary(phys, size)) && > TestClearPageXenRemapped(virt_to_page(vaddr))) > xen_destroy_contiguous_region(phys, order); > > I don't know exactly what is causing the crash. Is it the WARN_ON somehow? > Is it TestClearPageXenRemapped? Neither should cause a crash in theory. > > > But I do know that there are problems with that if statement on ARM. It > can trigger for one of the following conditions: > > 1) dev_addr + size - 1 > dma_mask > 2) range_straddles_page_boundary(phys, size) > > > The first condition might happen after bef4d2037d214 because > dma_direct_alloc might not respect the device dma_mask. It is actually a > bug and I would like to keep the WARN_ON for that. The patch I sent > yesterday (https://marc.info/?l=xen-devel&m=158865080224504) should > solve that issue. But Roman is telling us that the crash still persists. > > The second condition is completely normal and not an error on ARM > because dom0 is 1:1 mapped. It is not an issue if the address range is > straddling a page boundary. We certainly shouldn't WARN (or crash). > > So, I suggest something similar to Peng's patch, appended. > > Roman, does it solve your problem? > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > index b6d27762c6f8..994ca3a4b653 100644 > --- a/drivers/xen/swiotlb-xen.c > +++ b/drivers/xen/swiotlb-xen.c > @@ -346,9 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t > size, void *vaddr, > /* Convert the size to actually allocated. */ > size = 1UL << (order + XEN_PAGE_SHIFT); > > - if (!WARN_ON((dev_addr + size - 1 > dma_mask) || > - range_straddles_page_boundary(phys, size)) && > - TestClearPageXenRemapped(virt_to_page(vaddr))) > + WARN_ON(dev_addr + size - 1 > dma_mask); > + if (TestClearPageXenRemapped(virt_to_page(vaddr))) > xen_destroy_contiguous_region(phys, order); > > xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs); Ok, so "the plot thickens" it seems ;-) It definitely crashes inside of the TestClearPageXenRemapped(virt_to_page(vaddr)) statement Now, to test what part of it -- I actually unpacked the statement which also allowed me to see what page virt_to_page(vaddr) returns by doing: struct page *page = virt_to_page(vaddr); Turns out the value of the page (as per %px) is 000000000026a340 (which doesn't seem odd to me -- but may be it will to you Stefano) That address, however, doesn't seem to be mapped -- when I try adding a simple: printk(KERN_CRIT "page flags %ld\n", page->flags); it crashes trying to dereference a page. Thanks, Roman.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |