
Re: xen-swiotlb issue when NVMe driver is enabled in Dom0 on ARM


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Rahul Singh <Rahul.Singh@xxxxxxx>
  • Date: Wed, 20 Apr 2022 11:05:02 +0000
  • Cc: Christoph Hellwig <hch@xxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "jgross@xxxxxxxx" <jgross@xxxxxxxx>, "boris.ostrovsky@xxxxxxxxxx" <boris.ostrovsky@xxxxxxxxxx>
  • Delivery-date: Wed, 20 Apr 2022 11:07:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: xen-swiotlb issue when NVMe driver is enabled in Dom0 on ARM

Hi Stefano,

Thanks again for helping us to find the root cause of the issue.

> On 20 Apr 2022, at 3:36 am, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
>
>>> Then there is xen_swiotlb_init() which allocates some memory for
>>> swiotlb-xen at boot. It could lower the total amount of memory
>>> available, but if you disabled swiotlb-xen like I suggested,
>>> xen_swiotlb_init() still should get called and executed anyway at boot
>>> (it is called from arch/arm/xen/mm.c:xen_mm_init). So xen_swiotlb_init()
>>> shouldn't be the one causing problems.
>>>
>>> That's it -- there is nothing else in swiotlb-xen that I can think of.
>>>
>>> I don't have any good ideas, so I would only suggest to add more printks
>>> and report the results, for instance:
>>
>> As suggested I added the more printks but only difference I see is the size apart
>> from that everything looks same .
>>
>> Please find the attached logs for xen and native linux boot.
>
> One difference is that the order of the allocations is significantly
> different after the first 3 allocations. It is very unlikely but
> possible that this is an unrelated concurrency bug that only occurs on
> Xen. I doubt it.

I am not sure, but just to confirm with you: I see the logs below in every scenario.
The SWIOTLB memory is allocated by the Linux swiotlb and then used by xen-swiotlb. Is that okay, or can it cause an issue?

[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000f4000000-0x00000000f8000000] (64MB)

Snippet from int __ref xen_swiotlb_init(int verbose, bool early):

    /*
     * IO TLB memory already allocated. Just use it.
     */
    if (io_tlb_start != 0) {
        xen_io_tlb_start = phys_to_virt(io_tlb_start);
        goto end;
    }


>
> I think you could try booting native and Xen with only 1 CPU enabled in
> both cases.
>
> For native, you can do that with maxcpus, e.g. maxcpus=1.
> For Xen, you can do that with dom0_max_vcpus=1. I don't think we need to
> reduce the number of pCPUs seen by Xen, but it could be useful to pass
> sched=null to avoid any scheduler effects. This is just for debugging of
> course.
>

I tried booting Xen with "dom0_max_vcpus=1" and "sched=null", and the
issue remains.

>
> In reality, the most likely explanation is that the issue is a memory
> corruption. Something somewhere is corrupting Linux memory and it just
> happens that we see it when calling dma_direct_alloc. This means it is
> going to be difficult to find as the only real clue is that it is
> swiotlb-xen that is causing it.

Agreed, we observe the issue only with the xen-swiotlb DMA ops.
>
>
> I added more printks with the goal of detecting swiotlb-xen code paths
> that shouldn't be taken in a normal dom0 boot without domUs. For
> instance, range_straddles_page_boundary should always return zero and
> the dma_mask check in xen_swiotlb_alloc_coherent should always succeed.
>
> Fingers crossed we'll notice that the wrong path is taken just before
> the crash.
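
For reference, the kind of check expected to return zero can be sketched in user space as below. This is a hedged model, not the swiotlb-xen implementation: model_range_straddles(), the translation hooks and the 4 KiB page size are assumptions made for illustration, and the identity hook stands in for a dom0 whose guest frames map 1:1 to bus frames.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((uint64_t)1 << MODEL_PAGE_SHIFT)

/* Hypothetical translation hook: guest page frame -> bus frame. */
typedef uint64_t (*pfn_to_bfn_fn)(uint64_t pfn);

/* Assumed dom0 layout: every guest frame equals its bus frame. */
static uint64_t identity_pfn_to_bfn(uint64_t pfn)
{
    return pfn;
}

/* Contrasting layout: odd frames are relocated, so neighbours are not contiguous. */
static uint64_t scattered_pfn_to_bfn(uint64_t pfn)
{
    return (pfn & 1) ? pfn + 1000 : pfn;
}

/*
 * Returns 1 if [phys, phys + size) spans frames that are not
 * contiguous in bus space, 0 otherwise.
 */
static int model_range_straddles(uint64_t phys, size_t size, pfn_to_bfn_fn xlate)
{
    uint64_t pfn = phys >> MODEL_PAGE_SHIFT;
    uint64_t offset = phys & (MODEL_PAGE_SIZE - 1);
    uint64_t nr_pages = (offset + size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
    uint64_t next_bfn = xlate(pfn);
    uint64_t i;

    for (i = 1; i < nr_pages; i++)
        if (xlate(++pfn) != ++next_bfn)
            return 1;

    return 0;
}
```

With the identity hook the function returns 0 for any range, which is the behaviour expected for a normal dom0 boot without domUs; a non-zero result in the logs would point at an unexpected translation.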

Please find the attached logs.

I captured logs for Xen with and without "dom0_max_vcpus=1 sched=null", and
also for native Linux with and without "maxcpus=1".


 
Regards,
Rahul

Attachment: xen_boot_with_dom0_max_vcpus_1_debug.log
Description: xen_boot_with_dom0_max_vcpus_1_debug.log

Attachment: native_linux_with_ maxcpus_1_debug.log
Description: native_linux_with_ maxcpus_1_debug.log

Attachment: native_linux_boot_debug.log
Description: native_linux_boot_debug.log

Attachment: xen_boot_debug.log
Description: xen_boot_debug.log


 

