[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

To: Will Deacon <will@xxxxxxxxxx>
From: Nathan Chancellor <nathan@xxxxxxxxxx>
Date: Wed, 30 Jun 2021 08:56:51 -0700
Cc: Claire Chang <tientzu@xxxxxxxxxxxx>, Rob Herring <robh+dt@xxxxxxxxxx>, mpe@xxxxxxxxxxxxxx, Joerg Roedel <joro@xxxxxxxxxx>, Frank Rowand <frowand.list@xxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, boris.ostrovsky@xxxxxxxxxx, jgross@xxxxxxxx, Christoph Hellwig <hch@xxxxxx>, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>, benh@xxxxxxxxxxxxxxxxxxx, paulus@xxxxxxxxx, "list@xxxxxxx:IOMMU DRIVERS" <iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Robin Murphy <robin.murphy@xxxxxxx>, grant.likely@xxxxxxx, xypron.glpk@xxxxxx, Thierry Reding <treding@xxxxxxxxxx>, mingo@xxxxxxxxxx, bauerman@xxxxxxxxxxxxx, peterz@xxxxxxxxxxxxx, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>, Saravana Kannan <saravanak@xxxxxxxxxx>, "Rafael J . Wysocki" <rafael.j.wysocki@xxxxxxxxx>, heikki.krogerus@xxxxxxxxxxxxxxx, Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>, Randy Dunlap <rdunlap@xxxxxxxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, Bartosz Golaszewski <bgolaszewski@xxxxxxxxxxxx>, linux-devicetree <devicetree@xxxxxxxxxxxxxxx>, lkml <linux-kernel@xxxxxxxxxxxxxxx>, linuxppc-dev@xxxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Nicolas Boichat <drinkcat@xxxxxxxxxxxx>, Jim Quinlan <james.quinlan@xxxxxxxxxxxx>, Tomasz Figa <tfiga@xxxxxxxxxxxx>, bskeggs@xxxxxxxxxx, Bjorn Helgaas <bhelgaas@xxxxxxxxxx>, chris@xxxxxxxxxxxxxxxxxx, Daniel Vetter <daniel@xxxxxxxx>, airlied@xxxxxxxx, dri-devel@xxxxxxxxxxxxxxxxxxxxx, intel-gfx@xxxxxxxxxxxxxxxxxxxxx, jani.nikula@xxxxxxxxxxxxxxx, Jianxiong Gao <jxgao@xxxxxxxxxx>, joonas.lahtinen@xxxxxxxxxxxxxxx, linux-pci@xxxxxxxxxxxxxxx, maarten.lankhorst@xxxxxxxxxxxxxxx, matthew.auld@xxxxxxxxx, rodrigo.vivi@xxxxxxxxx, thomas.hellstrom@xxxxxxxxxxxxxxx, Tom Lendacky <thomas.lendacky@xxxxxxx>, Qian Cai <quic_qiancai@xxxxxxxxxxx>
Delivery-date: Wed, 30 Jun 2021 15:57:05 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Will and Claire,

On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor <nathan@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > > > use it to determine whether to bounce the data or not. This will be
> > > > useful later to allow for different pools.
> > > >
> > > > Signed-off-by: Claire Chang <tientzu@xxxxxxxxxxxx>
> > > > Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> > > > Tested-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > > > Tested-by: Will Deacon <will@xxxxxxxxxx>
> > > > Acked-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > >
> > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> > > get to an X session consistently (although not every single time),
> > > presumably due to a crash in the AMDGPU driver that I see in dmesg.
> > >
> > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> > > to provide any further information, debug, or test patches as necessary.
> > 
> > Are you using swiotlb=force? or the swiotlb_map is called because of
> > !dma_capable? 
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)
> 
> The command line is in the dmesg:
> 
>   | Kernel command line: initrd=\amd-ucode.img 
> initrd=\initramfs-linux-next-llvm.img 
> root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp 
> irqpoll
> 
> but I worry that this looks _very_ similar to the issue reported by Qian
> Cai which we thought we had fixed. Nathan -- is the failure deterministic?

Yes, for the most part. It does not happen every single boot so when I
was bisecting, I did a series of seven boots and only considered the
revision good when all seven of them made it to LightDM's greeter. My
results that I notated show most bad revisions failed anywhere from four
to six times.

> > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > (maybe dev->dma_io_tlb_mem) was corrupted?
> > The dev->dma_io_tlb_mem should be set here
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > through device_initialize.
> 
> I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> The spinlock is at offset 0x24 in that structure, and looking at the
> register dump from the crash:
> 
> Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 
> RCX: ffff8900572ad580
> Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 
> RDI: 0000000000001d17
> Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 
> R09: 0000000000000000
> Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 
> R12: 0000000000000212
> Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 
> R15: 0000000000200000
> Jun 29 18:28:42 hp-4300G kernel: FS:  00007f1f8898ea40(0000) 
> GS:ffff890057280000(0000) knlGS:0000000000000000
> Jun 29 18:28:42 hp-4300G kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 
> CR4: 0000000000350ee0
> Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> Jun 29 18:28:42 hp-4300G kernel:  _raw_spin_lock_irqsave+0x39/0x50
> Jun 29 18:28:42 hp-4300G kernel:  swiotlb_tbl_map_single+0x12b/0x4c0
> 
> Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> RDX pointing at the spinlock. Yet RAX is holding junk :/
> 
> I agree that enabling KASAN would be a good idea, but I also think we
> probably need to get some more information out of swiotlb_tbl_map_single()
> to see see what exactly is going wrong in there.

I can certainly enable KASAN and if there is any debug print I can add
or dump anything, let me know!

Cheers,
Nathan

References:
- [PATCH v15 00/12] Restricted DMA
  - From: Claire Chang
- [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  - From: Claire Chang
- Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  - From: Nathan Chancellor
- Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  - From: Claire Chang
- Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  - From: Will Deacon

Prev by Date: [xen-unstable test] 163190: tolerable FAIL - PUSHED
Next by Date: Re: [PATCH 2/9] xen/arm: introduce PGC_reserved
Previous by thread: Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Next by thread: [PATCH v15 07/12] swiotlb: Move alloc_size to swiotlb_find_slots
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.