
RE: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU


  • To: Jürgen Groß <jgross@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roman Shaposhnik <roman@xxxxxxxxxx>, "boris.ostrovsky@xxxxxxxxxx" <boris.ostrovsky@xxxxxxxxxx>
  • From: Peng Fan <peng.fan@xxxxxxx>
  • Date: Wed, 6 May 2020 08:54:44 +0000
  • Accept-language: en-US
  • Cc: Julien Grall <julien@xxxxxxx>, "minyard@xxxxxxx" <minyard@xxxxxxx>, "jeff.kubascik@xxxxxxxxxxxxxxx" <jeff.kubascik@xxxxxxxxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>
  • Delivery-date: Wed, 06 May 2020 08:55:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU

> Subject: Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU
> 
> On 06.05.20 00:34, Stefano Stabellini wrote:
> > + Boris, Jürgen
> >
> > See the crash Roman is seeing booting dom0 on the Raspberry Pi. It is
> > related to the recent xen dma_ops changes. Possibly the same thing
> > reported by Peng here:

Yes, it is the same issue.

> >
> > https://marc.info/?l=linux-kernel&m=158805976230485&w=2
> >
> > An in-depth analysis below.
> >
> >
> > On Mon, 4 May 2020, Roman Shaposhnik wrote:
> >>>> [    2.534292] Unable to handle kernel paging request at virtual address 000000000026c340
> >>>> [    2.542373] Mem abort info:
> >>>> [    2.545257]   ESR = 0x96000004
> >>>> [    2.548421]   EC = 0x25: DABT (current EL), IL = 32 bits
> >>>> [    2.553877]   SET = 0, FnV = 0
> >>>> [    2.557023]   EA = 0, S1PTW = 0
> >>>> [    2.560297] Data abort info:
> >>>> [    2.563258]   ISV = 0, ISS = 0x00000004
> >>>> [    2.567208]   CM = 0, WnR = 0
> >>>> [    2.570294] [000000000026c340] user address but active_mm is swapper
> >>>> [    2.576783] Internal error: Oops: 96000004 [#1] SMP
> >>>> [    2.581784] Modules linked in:
> >>>> [    2.584950] CPU: 3 PID: 135 Comm: kworker/3:1 Not tainted 5.6.1-default #9
> >>>> [    2.591970] Hardware name: Raspberry Pi 4 Model B (DT)
> >>>> [    2.597256] Workqueue: events deferred_probe_work_func
> >>>> [    2.602509] pstate: 60000005 (nZCv daif -PAN -UAO)
> >>>> [    2.607431] pc : xen_swiotlb_free_coherent+0x198/0x1d8
> >>>> [    2.612696] lr : dma_free_attrs+0x98/0xd0
> >>>> [    2.616827] sp : ffff800011db3970
> >>>> [    2.620242] x29: ffff800011db3970 x28: 00000000000f7b00
> >>>> [    2.625695] x27: 0000000000001000 x26: ffff000037b68410
> >>>> [    2.631129] x25: 0000000000000001 x24: 00000000f7b00000
> >>>> [    2.636583] x23: 0000000000000000 x22: 0000000000000000
> >>>> [    2.642017] x21: ffff800011b0d000 x20: ffff80001179b548
> >>>> [    2.647461] x19: ffff000037b68410 x18: 0000000000000010
> >>>> [    2.652905] x17: ffff000035d66a00 x16: 00000000deadbeef
> >>>> [    2.658348] x15: ffffffffffffffff x14: ffff80001179b548
> >>>> [    2.663792] x13: ffff800091db37b7 x12: ffff800011db37bf
> >>>> [    2.669236] x11: ffff8000117c7000 x10: ffff800011db3740
> >>>> [    2.674680] x9 : 00000000ffffffd0 x8 : ffff80001071e980
> >>>> [    2.680124] x7 : 0000000000000132 x6 : ffff80001197a6ab
> >>>> [    2.685568] x5 : 0000000000000000 x4 : 0000000000000000
> >>>> [    2.691012] x3 : 00000000f7b00000 x2 : fffffdffffe00000
> >>>> [    2.696465] x1 : 000000000026c340 x0 : 000002000046c340
> >>>> [    2.701899] Call trace:
> >>>> [    2.704452]  xen_swiotlb_free_coherent+0x198/0x1d8
> >>>> [    2.709367]  dma_free_attrs+0x98/0xd0
> >>>> [    2.713143]  rpi_firmware_property_list+0x1e4/0x240
> >>>> [    2.718146]  rpi_firmware_property+0x6c/0xb0
> >>>> [    2.722535]  rpi_firmware_probe+0xf0/0x1e0
> >>>> [    2.726760]  platform_drv_probe+0x50/0xa0
> >>>> [    2.730879]  really_probe+0xd8/0x438
> >>>> [    2.734567]  driver_probe_device+0xdc/0x130
> >>>> [    2.738870]  __device_attach_driver+0x88/0x108
> >>>> [    2.743434]  bus_for_each_drv+0x78/0xc8
> >>>> [    2.747386]  __device_attach+0xd4/0x158
> >>>> [    2.751337]  device_initial_probe+0x10/0x18
> >>>> [    2.755649]  bus_probe_device+0x90/0x98
> >>>> [    2.759590]  deferred_probe_work_func+0x88/0xd8
> >>>> [    2.764244]  process_one_work+0x1f0/0x3c0
> >>>> [    2.768369]  worker_thread+0x138/0x570
> >>>> [    2.772234]  kthread+0x118/0x120
> >>>> [    2.775571]  ret_from_fork+0x10/0x18
> >>>> [    2.779263] Code: d34cfc00 f2dfbfe2 d37ae400 8b020001 (f8626800)
> >>>> [    2.785492] ---[ end trace 4c435212e349f45f ]---
> >>>> [    2.793340] usb 1-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.20
> >>>> [    2.801038] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
> >>>> [    2.808297] usb 1-1: Product: USB2.0 Hub
> >>>> [    2.813710] hub 1-1:1.0: USB hub found
> >>>> [    2.817117] hub 1-1:1.0: 4 ports detected
> >>>>
> >>>> This is bailing out right here:
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/raspberrypi.c?h=v5.6.1#n125
> >>>>
> >>>> FWIW (since I modified the source to actually print what was
> >>>> returned right before it bails) we get:
> >>>>     buf[1] == 0x800000004
> >>>>     buf[2] == 0x00000001
> >>>>
> >>>> Status 0x800000004 is of course suspicious since it is not even listed
> here:
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/soc/bcm2835/raspberrypi-firmware.h#n14
> >>>>
> >>>> So it appears that this DMA request path is somehow busted and it
> >>>> would be really nice to figure out why.
> >>>
> >>> You have actually discovered a genuine bug in the recent xen dma
> >>> rework in Linux. Congrats :-)
> >>
> >> Nice! ;-)
> >>
> >>> I am doing some guesswork here, but from what I read in the thread
> >>> and the information in this email I think this patch might fix the issue.
> >>> If it doesn't fix the issue please add a few printks in
> >>> drivers/xen/swiotlb-xen.c:xen_swiotlb_free_coherent and please let
> >>> me know where exactly it crashes.
> >>>
> >>>
> >>> diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
> >>> index b9cc11e887ed..ff4677ed9788 100644
> >>> --- a/include/xen/arm/page-coherent.h
> >>> +++ b/include/xen/arm/page-coherent.h
> >>> @@ -8,12 +8,17 @@
> >>>   static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
> >>>                  dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
> >>>   {
> >>> +       void *cpu_addr;
> >>> +       if (dma_alloc_from_dev_coherent(hwdev, size, dma_handle, &cpu_addr))
> >>> +               return cpu_addr;
> >>>          return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
> >>>   }
> >>>
> >>>   static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
> >>>                  void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
> >>>   {
> >>> +       if (dma_release_from_dev_coherent(hwdev, get_order(size), cpu_addr))
> >>> +               return;
> >>>          dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
> >>>   }
> >>
> >> Applied the patch, but it didn't help and after printk's it turns out
> >> it surprisingly crashes right inside this (rather convoluted if you
> >> ask me) if statement:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/xen/swiotlb-xen.c?h=v5.6.1#n349
> >>
> >> So it makes sense that the patch didn't help -- we never hit that
> >> xen_free_coherent_pages.
> >
> > The crash happens here:
> >
> >     if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> >                  range_straddles_page_boundary(phys, size)) &&
> >         TestClearPageXenRemapped(virt_to_page(vaddr)))
> >             xen_destroy_contiguous_region(phys, order);
> >
> > I don't know exactly what is causing the crash. Is it the WARN_ON somehow?
> > Is it TestClearPageXenRemapped? Neither should cause a crash in theory.
> >
> >
> > But I do know that there are problems with that if statement on ARM.
> > It can trigger for one of the following conditions:
> >
> > 1) dev_addr + size - 1 > dma_mask
> > 2) range_straddles_page_boundary(phys, size)
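
(For reference, range_straddles_page_boundary() in drivers/xen/swiotlb-xen.c
checks whether the machine frames backing the buffer are contiguous. The sketch
below is a paraphrase from memory of the v5.6 helper, not a verbatim quote:)

static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
{
        unsigned long next_bfn, xen_pfn = XEN_PFN_DOWN(p);
        unsigned int i, nr_pages = XEN_PFN_UP(xen_offset_in_page(p) + size);

        /* Machine frame backing the first Xen page of the buffer. */
        next_bfn = pfn_to_bfn(xen_pfn);

        /* Return 1 if any later page maps to a non-consecutive machine
         * frame, i.e. the buffer is not machine-contiguous. */
        for (i = 1; i < nr_pages; i++)
                if (pfn_to_bfn(++xen_pfn) != ++next_bfn)
                        return 1;

        return 0;
}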


diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index bd3a10dfac15..33b027cb0b2a 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -346,6 +346,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
        /* Convert the size to actually allocated. */
        size = 1UL << (order + XEN_PAGE_SHIFT);

+       printk("%x %x %px %x %px\n", dev_addr + size - 1 > dma_mask, range_straddles_page_boundary(phys, size), virt_to_page(vaddr), phys, vaddr);
        if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
                     range_straddles_page_boundary(phys, size)) &&
            TestClearPageXenRemapped(virt_to_page(vaddr)))


In my case the printed values (dev_addr + size - 1 > dma_mask, range_straddles_page_boundary(), virt_to_page(vaddr), phys, vaddr) are:
0 0 0000000000271f40 bc000000 ffff800011c7d000


The alloc path on my side, in xen_swiotlb_alloc_coherent():
314         phys = *dma_handle;
315         dev_addr = xen_phys_to_bus(phys);
316         if (((dev_addr + size - 1 <= dma_mask)) &&
317             !range_straddles_page_boundary(phys, size))
318                 *dma_handle = dev_addr;

So I am confused about why the free path needs to clear the Xen-remapped page flag.

Thanks,
Peng.
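
(For context, the else branch immediately after the lines quoted above is where
the alloc path marks the page, so the free-path test is meant to undo exactly
that case. The following is a paraphrase of the v5.6 pairing, details elided,
not a verbatim quote:)

        /* Alloc side, xen_swiotlb_alloc_coherent(): only when the buffer is
         * not already DMA-addressable does Xen exchange the backing frames,
         * and only then is the page flagged as Xen-remapped. */
        if ((dev_addr + size - 1 <= dma_mask) &&
            !range_straddles_page_boundary(phys, size)) {
                *dma_handle = dev_addr;
        } else {
                if (xen_create_contiguous_region(phys, order,
                                                 fls64(dma_mask), dma_handle) != 0)
                        return NULL;
                SetPageXenRemapped(virt_to_page(ret));
        }

        /* Free side, xen_swiotlb_free_coherent(): undo the exchange only if
         * the alloc side actually performed one (the WARN_ON wrapper under
         * discussion is omitted here). */
        if (TestClearPageXenRemapped(virt_to_page(vaddr)))
                xen_destroy_contiguous_region(phys, order);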

> >
> >
> > The first condition might happen after bef4d2037d214 because
> > dma_direct_alloc might not respect the device dma_mask. It is actually
> > a bug and I would like to keep the WARN_ON for that. The patch I sent
> > yesterday
> > (https://marc.info/?l=xen-devel&m=158865080224504) should solve that
> > issue. But Roman is telling us that the crash still persists.
> >
> > The second condition is completely normal and not an error on ARM
> > because dom0 is 1:1 mapped. It is not an issue if the address range is
> > straddling a page boundary. We certainly shouldn't WARN (or crash).
> >
> > So, I suggest something similar to Peng's patch, appended.
> >
> > Roman, does it solve your problem?
> >
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..994ca3a4b653 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,9 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> >     /* Convert the size to actually allocated. */
> >     size = 1UL << (order + XEN_PAGE_SHIFT);
> >
> > -   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -                range_straddles_page_boundary(phys, size)) &&
> > -       TestClearPageXenRemapped(virt_to_page(vaddr)))
> > +   WARN_ON(dev_addr + size - 1 > dma_mask);
> > +   if (TestClearPageXenRemapped(virt_to_page(vaddr)))
> >             xen_destroy_contiguous_region(phys, order);
> 
> This is a very bad idea for x86. Calling xen_destroy_contiguous_region() for
> cases where it shouldn't be called is causing subtle bugs in memory
> management, like pages being visible twice and thus resulting in weird
> memory inconsistencies.
> 
> What you want here is probably an architecture specific test.
> 
> 
> Juergen
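
(One possible shape for such an architecture-specific test, sketched under the
assumption that it would sit next to the existing checks in
drivers/xen/swiotlb-xen.c; the helper name is hypothetical and this is not
necessarily the fix that ends up being merged:)

/* Decide whether xen_swiotlb_free_coherent() may tear down a contiguous
 * region. On Arm, dom0 is 1:1 mapped, so a buffer straddling a Xen page
 * boundary is normal and must not trip the warning; on x86 the original
 * checks are kept, because calling xen_destroy_contiguous_region() on a
 * region that was never exchanged corrupts memory, as noted above. */
static bool xen_swiotlb_can_destroy_contiguous(phys_addr_t phys, size_t size,
                                               dma_addr_t dev_addr, u64 dma_mask)
{
        if (IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_ARM64))
                return !WARN_ON(dev_addr + size - 1 > dma_mask);

        return !WARN_ON((dev_addr + size - 1 > dma_mask) ||
                        range_straddles_page_boundary(phys, size));
}

xen_swiotlb_free_coherent() would then call xen_destroy_contiguous_region()
only when this helper and TestClearPageXenRemapped() both return true.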

 

