[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andy Smith <andy@xxxxxxxxxxxxxx>
  • Date: Sun, 25 Jul 2021 14:06:11 +0000
  • Delivery-date: Sun, 25 Jul 2021 14:06:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: id=BF15490B; url=http://strugglers.net/~andy/pubkey.asc

Hello,

On Tue, Jul 20, 2021 at 10:32:39PM +0000, Andy Smith wrote:
> I have a Debian 10 (buster/stable) dom0 running hypervisor 4.14.2.
> For almost 2 years it's been using the packaged Debian stable kernel
> which is 4.19.x.
> 
> Last night I upgraded the kernel to the buster-backports package
> which is based on 5.10.40 and about 4 hours later got this:
> 
> Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] 
> phys_addr:0x00000015eb803000 offset:0 length:4096 
> dma_address:0x000000209e7b7000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] 
> phys_addr:0x00000015eb7bc000 offset:0 length:4096 
> dma_address:0x000000209e7b8000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] 
> phys_addr:0x00000015eb809000 offset:0 length:4096 
> dma_address:0x000000209e7b9000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] 
> phys_addr:0x00000015eb766000 offset:0 length:4096 
> dma_address:0x000000209e7ba000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] 
> phys_addr:0x00000015eb7a3000 offset:0 length:4096 
> dma_address:0x000000209e7bb000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] 
> phys_addr:0x00000015eb7c6000 offset:0 length:4096 
> dma_address:0x000000209e7bc000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] 
> phys_addr:0x00000015eb7c2000 offset:0 length:4096 
> dma_address:0x000000209e7bd000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] 
> phys_addr:0x00000015eb7a9000 offset:0 length:4096 
> dma_address:0x000000209e7be000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] 
> phys_addr:0x00000015eb775000 offset:0 length:4096 
> dma_address:0x000000209e7bf000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] 
> phys_addr:0x00000015eb7c7000 offset:0 length:4096 
> dma_address:0x00000020ff520000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] 
> phys_addr:0x00000015eb7cb000 offset:0 length:4096 
> dma_address:0x00000020ff521000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] 
> phys_addr:0x00000015eb7e3000 offset:0 length:61952 
> dma_address:0x00000020ff522000 dma_length:61952
> Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] 
> phys_addr:0x00000015eb7f2200 offset:512 length:24064 
> dma_address:0x00000020ff531200 dma_length:24064
> Jul 20 02:17:54 lamb kernel: [21061.401781] ------------[ cut here 
> ]------------
> Jul 20 02:17:54 lamb kernel: [21061.402738] Invalid SGL for payload:131072 
> nents:13
> Jul 20 02:17:54 lamb kernel: [21061.403724] WARNING: CPU: 1 PID: 12669 at 
> drivers/nvme/host/pci.c:716 nvme_map_data+0x7e0/0x820 [nvme]
> Jul 20 02:17:54 lamb kernel: [21061.404728] Modules linked in: binfmt_misc 
> ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpmss nf_log_ipv6 
> nf_log_ipv4 nf_log_common xt_LOG xt_limit nfnetlink_log nfnetlink xt_NFLOG 
> xt_multiport xt_tcpudp ip6table_filter ip6_tables iptable_filter bonding 
> btrfs blake2b_generic dm_snapshot dm_bufio intel_rapl_msr intel_rapl_common 
> skx_edac nfit libnvdimm intel_powerclamp crc32_pclmul ghash_clmulni_intel 
> ipmi_ssif aesni_intel libaes crypto_simd cryptd glue_helper snd_hda_intel 
> snd_intel_dspcfg mei_wdt soundwire_intel soundwire_generic_allocation nvme 
> wdat_wdt snd_soc_core ast snd_compress watchdog drm_vram_helper 
> drm_ttm_helper soundwire_cadence pcspkr nvme_core ttm snd_hda_codec 
> drm_kms_helper snd_hda_core i2c_i801 snd_hwdep i2c_smbus cec soundwire_bus 
> snd_pcm drm snd_timer snd soundcore igb ptp pps_core i2c_algo_bit joydev 
> mei_me sg mei intel_lpss_pci intel_lpss idma64 acpi_ipmi ipmi_si ipmi_devintf 
> ioatdma dca wmi ipmi_msghandler button dm_mod xenfs xen_acpi_processor
> Jul 20 02:17:54 lamb kernel: [21061.404831]  xen_privcmd xen_pciback 
> xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn ip_tables x_tables 
> autofs4 ext4 crc16 mbcache jbd2 raid456 libcrc32c crc32c_generic 
> async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev 
> hid_generic usbhid hid raid6_pq raid0 multipath linear raid10 raid1 md_mod 
> sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common 
> xhci_pci ahci libahci crc32c_intel xhci_hcd libata usbcore scsi_mod usb_common
> Jul 20 02:17:54 lamb kernel: [21061.417998] CPU: 1 PID: 12669 Comm: 62.xvda-0 
> Not tainted 5.10.0-0.bpo.7-amd64 #1 Debian 5.10.40-1~bpo10+1
> Jul 20 02:17:54 lamb kernel: [21061.418459] Hardware name: Supermicro Super 
> Server/X11SRM-VF, BIOS 1.2a 02/18/2019
> Jul 20 02:17:54 lamb kernel: [21061.418922] RIP: 
> e030:nvme_map_data+0x7e0/0x820 [nvme]
> Jul 20 02:17:54 lamb kernel: [21061.419354] Code: d0 7b c0 48 c7 c7 40 d6 7b 
> c0 e8 5b 44 c9 c0 8b 93 4c 01 00 00 f6 43 1e 04 75 36 8b 73 28 48 c7 c7 20 9c 
> 7b c0 e8 8b 71 09 c1 <0f> 0b 41 bd 0a 00 00 00 e9 f7 fe ff ff 48 8d bd 68 02 
> 00 00 48 89
> Jul 20 02:17:54 lamb kernel: [21061.420271] RSP: e02b:ffffc90044797930 
> EFLAGS: 00010286
> Jul 20 02:17:54 lamb kernel: [21061.420727] RAX: 0000000000000000 RBX: 
> ffff888157db4200 RCX: 0000000000000027
> Jul 20 02:17:54 lamb kernel: [21061.421186] RDX: 0000000000000027 RSI: 
> ffff888292858a00 RDI: ffff888292858a08
> Jul 20 02:17:54 lamb kernel: [21061.421639] RBP: ffff888103243000 R08: 
> 0000000000000000 R09: c00000010000118b
> Jul 20 02:17:54 lamb kernel: [21061.422090] R10: 0000000000165920 R11: 
> ffffc90044797738 R12: ffffffffc07b9bd0
> Jul 20 02:17:54 lamb kernel: [21061.422583] R13: 000000000000000d R14: 
> 0000000000000000 R15: 000000000000000d
> Jul 20 02:17:54 lamb kernel: [21061.423052] FS:  0000000000000000(0000) 
> GS:ffff888292840000(0000) knlGS:0000000000000000
> Jul 20 02:17:54 lamb kernel: [21061.423518] CS:  e030 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> Jul 20 02:17:54 lamb kernel: [21061.423986] CR2: 00007f909a037c30 CR3: 
> 000000010d2dc000 CR4: 0000000000050660
> Jul 20 02:17:54 lamb kernel: [21061.424472] Call Trace:
> Jul 20 02:17:54 lamb kernel: [21061.424943]  nvme_queue_rq+0x98/0x190 [nvme]
> Jul 20 02:17:54 lamb kernel: [21061.425425]  
> blk_mq_dispatch_rq_list+0x123/0x7d0
> Jul 20 02:17:54 lamb kernel: [21061.425904]  ? sbitmap_get+0x66/0x140
> Jul 20 02:17:54 lamb kernel: [21061.426385]  ? elv_rb_del+0x1f/0x30
> Jul 20 02:17:54 lamb kernel: [21061.426909]  ? 
> deadline_remove_request+0x55/0xc0
> Jul 20 02:17:54 lamb kernel: [21061.427373]  
> __blk_mq_do_dispatch_sched+0x164/0x2d0
> Jul 20 02:17:54 lamb kernel: [21061.427843]  
> __blk_mq_sched_dispatch_requests+0x135/0x170
> Jul 20 02:17:54 lamb kernel: [21061.428310]  
> blk_mq_sched_dispatch_requests+0x30/0x60
> Jul 20 02:17:54 lamb kernel: [21061.428795]  __blk_mq_run_hw_queue+0x51/0xd0
> Jul 20 02:17:54 lamb kernel: [21061.429269]  
> __blk_mq_delay_run_hw_queue+0x141/0x160
> Jul 20 02:17:54 lamb kernel: [21061.429752]  
> blk_mq_sched_insert_requests+0x6a/0xf0
> Jul 20 02:17:54 lamb kernel: [21061.430233]  
> blk_mq_flush_plug_list+0x119/0x1b0
> Jul 20 02:17:54 lamb kernel: [21061.430756]  blk_flush_plug_list+0xd7/0x100
> Jul 20 02:17:54 lamb kernel: [21061.431241]  blk_finish_plug+0x21/0x30
> Jul 20 02:17:54 lamb kernel: [21061.431734]  dispatch_rw_block_io+0x6a5/0x9a0 
> [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.432220]  __do_block_io_op+0x31d/0x620 
> [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.432714]  ? 
> _raw_spin_unlock_irqrestore+0x14/0x20
> Jul 20 02:17:54 lamb kernel: [21061.433193]  ? try_to_del_timer_sync+0x4d/0x80
> Jul 20 02:17:54 lamb kernel: [21061.433680]  xen_blkif_schedule+0xda/0x670 
> [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.434160]  ? __schedule+0x2c6/0x770
> Jul 20 02:17:54 lamb kernel: [21061.434679]  ? finish_wait+0x80/0x80
> Jul 20 02:17:54 lamb kernel: [21061.435129]  ? xen_blkif_be_int+0x30/0x30 
> [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.435571]  kthread+0x116/0x130
> Jul 20 02:17:54 lamb kernel: [21061.436002]  ? kthread_park+0x80/0x80
> Jul 20 02:17:54 lamb kernel: [21061.436422]  ret_from_fork+0x22/0x30
> Jul 20 02:17:54 lamb kernel: [21061.436846] ---[ end trace 1d90be7aea2d9148 
> ]---
> Jul 20 02:17:54 lamb kernel: [21061.437250] blk_update_request: I/O error, 
> dev nvme0n1, sector 912000815 op 0x1:(WRITE) flags 0x800 phys_seg 13 prio 
> class 0
> Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Disk failure on 
> nvme0n1, disabling device.
> Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Operation 
> continuing on 1 devices.

I've tested and can confirm that this issue is fixed by:

    https://lore.kernel.org/patchwork/patch/1442338/

which is in 5.14 already and was also backported to 5.10.50.

Debian buster-backports currently ships with 5.10.40; bullseye is
5.10.46; Salvatore Bonaccorso said on debian-kernel list that they
would backport this fix to all the 5.10 packages.

Thanks,
Andy



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.