[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: NULL pointer dereference in xenbus_thread->...
On Tue, Apr 29, 2025 at 08:59:45PM -0400, Jason Andryuk wrote: > Hi Marek, > > On Wed, Apr 23, 2025 at 8:42 AM Marek Marczykowski-Górecki > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote: > > > > On Sat, Jun 01, 2024 at 12:48:33AM +0200, Marek Marczykowski-Górecki wrote: > > > On Tue, Mar 26, 2024 at 11:00:50AM +0000, Julien Grall wrote: > > > > Hi Marek, > > > > > > > > +Juergen for visibility > > > > > > > > When sending a bug report, I would suggest to CC relevant people as > > > > otherwise it can get lost (not may people monitors Xen devel if they > > > > are not > > > > CCed). > > > > > > > > Cheers, > > > > > > > > On 25/03/2024 16:17, Marek Marczykowski-Górecki wrote: > > > > > On Sun, Oct 22, 2023 at 04:14:30PM +0200, Marek Marczykowski-Górecki > > > > > wrote: > > > > > > On Mon, Aug 28, 2023 at 11:50:36PM +0200, Marek > > > > > > Marczykowski-Górecki wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've noticed in Qubes's CI failure like this: > > > > > > > > > > > > > > [ 871.271292] BUG: kernel NULL pointer dereference, address: > > > > > > > 0000000000000000 > > > > > > > [ 871.275290] #PF: supervisor read access in kernel mode > > > > > > > [ 871.277282] #PF: error_code(0x0000) - not-present page > > > > > > > [ 871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0 > > > > > > > [ 871.281071] Oops: 0000 [#1] PREEMPT SMP NOPTI > > > > > > > [ 871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted > > > > > > > 6.1.43-1.qubes.fc37.x86_64 #1 > > > > > > > [ 871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, > > > > > > > 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 > > > > > > > [ 871.288883] RIP: e030:__wake_up_common+0x4c/0x180 > > > > > > > [ 871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 > > > > > > > 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 > > > > > > > 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 > > > > > > > 04 75 5f 49 8b 40 > > > > > > > [ 871.299776] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082 > > > > > > > [ 871.301656] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: > > > > > > > 0000000000000000 > > > > > > > [ 871.304255] RDX: 0000000000000001 RSI: 0000000000000003 RDI: > > > > > > > ffff88810541ce90 > > > > > > > [ 871.306714] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: > > > > > > > ffffc900400f7e68 > > > > > > > [ 871.309937] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: > > > > > > > ffffc900400f7e68 > > > > > > > [ 871.312326] R13: 0000000000000000 R14: 0000000000000000 R15: > > > > > > > 0000000000000000 > > > > > > > [ 871.314647] FS: 0000000000000000(0000) > > > > > > > GS:ffff88813ff00000(0000) knlGS:0000000000000000 > > > > > > > [ 871.317677] CS: 10000e030 DS: 0000 ES: 0000 CR0: > > > > > > > 0000000080050033 > > > > > > > [ 871.319644] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: > > > > > > > 0000000000040660 > > > > > > > [ 871.321973] Call Trace: > > > > > > > [ 871.322782] <TASK> > > > > > > > [ 871.323494] ? show_trace_log_lvl+0x1d3/0x2ef > > > > > > > [ 871.324901] ? show_trace_log_lvl+0x1d3/0x2ef > > > > > > > [ 871.326310] ? show_trace_log_lvl+0x1d3/0x2ef > > > > > > > [ 871.327721] ? __wake_up_common_lock+0x82/0xd0 > > > > > > > [ 871.329147] ? __die_body.cold+0x8/0xd > > > > > > > [ 871.330378] ? page_fault_oops+0x163/0x1a0 > > > > > > > [ 871.331691] ? exc_page_fault+0x70/0x170 > > > > > > > [ 871.332946] ? asm_exc_page_fault+0x22/0x30 > > > > > > > [ 871.334454] ? __wake_up_common+0x4c/0x180 > > > > > > > [ 871.335777] __wake_up_common_lock+0x82/0xd0 > > > > > > > [ 871.337183] ? process_writes+0x240/0x240 > > > > > > > [ 871.338461] process_msg+0x18e/0x2f0 > > > > > > > [ 871.339627] xenbus_thread+0x165/0x1c0 > > > > > > > [ 871.340830] ? cpuusage_read+0x10/0x10 > > > > > > > [ 871.342032] kthread+0xe9/0x110 > > > > > > > [ 871.343317] ? kthread_complete_and_exit+0x20/0x20 > > > > > > > [ 871.345020] ret_from_fork+0x22/0x30 > > > > > > > [ 871.346239] </TASK> > > > > > > > [ 871.347060] Modules linked in: snd_hda_codec_generic > > > > > > > ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > > > > > > > snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device > > > > > > > joydev snd_pcm intel_rapl_msr ppdev intel_rapl_common snd_timer > > > > > > > pcspkr e1000e snd soundcore i2c_piix4 parport_pc parport loop > > > > > > > fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel > > > > > > > polyval_clmulni polyval_generic floppy ghash_clmulni_intel > > > > > > > sha512_ssse3 serio_raw virtio_scsi virtio_console bochs xhci_pci > > > > > > > xhci_pci_renesas xhci_hcd qemu_fw_cfg drm_vram_helper > > > > > > > drm_ttm_helper ttm ata_generic pata_acpi xen_privcmd xen_pciback > > > > > > > xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac > > > > > > > scsi_dh_emc scsi_dh_alua uinput dm_multipath > > > > > > > [ 871.368892] CR2: 0000000000000000 > > > > > > > [ 871.370160] ---[ end trace 0000000000000000 ]--- > > > > > > > [ 871.371719] RIP: e030:__wake_up_common+0x4c/0x180 > > > > > > > [ 871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 > > > > > > > 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 > > > > > > > 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 > > > > > > > 04 75 5f 49 8b 40 > > > > > > > [ 871.379866] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082 > > > > > > > [ 871.381689] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: > > > > > > > 0000000000000000 > > > > > > > [ 871.383971] RDX: 0000000000000001 RSI: 0000000000000003 RDI: > > > > > > > ffff88810541ce90 > > > > > > > [ 871.386235] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: > > > > > > > ffffc900400f7e68 > > > > > > > [ 871.388521] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: > > > > > > > ffffc900400f7e68 > > > > > > > [ 871.390789] R13: 0000000000000000 R14: 0000000000000000 R15: > > > > > > > 0000000000000000 > > > > > > > [ 871.393101] FS: 0000000000000000(0000) > > > > > > > GS:ffff88813ff00000(0000) knlGS:0000000000000000 > > > > > > > [ 871.395671] CS: 10000e030 DS: 0000 ES: 0000 CR0: > > > > > > > 0000000080050033 > > > > > > > [ 871.397863] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: > > > > > > > 0000000000040660 > > > > > > > [ 871.400441] Kernel panic - not syncing: Fatal exception > > > > > > > [ 871.402171] Kernel Offset: disabled > > > > > > > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. > > > > > > > > > > > > > > It isn't the first time I see similar crash, but I can't really > > > > > > > reproduce it reliably. Restarted test usually passes. > > > > > > > Note this is Xen nested in KVM, so it could very well be some > > > > > > > oddity > > > > > > > about nested virt, although looking at the stack trace, it's > > > > > > > unlikely > > > > > > > and more likely some race condition hit only on slower system. > > > > > > > > > > > > Recently I've got the same crash on a real system in domU too. And > > > > > > also > > > > > > on nested on newer kernel 6.1.57 (here it happened in dom0). So, > > > > > > this is > > > > > > still an issue and affects not only nested case :/ > > > > > > > > > > > > > Unfortunately I don't have symbols for this kernel handy, but > > > > > > > there is a > > > > > > > single wake_up() call in process_writes(), so it shouldn't be an > > > > > > > issue. > > > > > > > > > > > > > > Any ideas? > > > > > > > > > > > > > > Full log at > > > > > > > https://openqa.qubes-os.org/tests/80779/logfile?filename=serial0.txt > > > > > > > > > > > > More links at https://github.com/QubesOS/qubes-issues/issues/8638, > > > > > > including more recent stack trace. > > > > > > > > > > Happens on 6.1.75 too (new stack trace I've added to the issue above, > > > > > but it's pretty similar). > > > > > > Recently I've got a report from another user about similar issue, on > > > 6.6.29 this time. I also still encounter this issue once a month or so, > > > but the user claims they get it much more often: > > > https://github.com/QubesOS/qubes-issues/issues/8638#issuecomment-2135419896 > > > The extra conditions reported by the user are: > > > - old AMD system (KGPE-D16 with Opteron 6282 SE) requiring > > > `spec-ctrl=ibpb-entry=no-pv` to remain usable > > > - Whonix domU, which has a bunch of sysctl parameters changed, listed > > > at: > > > - https://github.com/Kicksecure/security-misc > > > - > > > https://github.com/Kicksecure/security-misc/blob/master/usr/lib/sysctl.d/990-security-misc.conf > > > (unsure which are relevant, maybe `vm.swappiness=1`?) > > > > I've got some more report confirming it's still happening on Linux > > 6.12.18. Is there anything I can do to help fixing this? Maybe ask users > > to enable some extra logging? > > Have you been able to capture a crash with debug symbols and run it > through scripts/decode_stacktrace.sh? This worked: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:__wake_up_common (kernel/sched/wait.c:85) Code: 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 c3 08 48 83 ec 08 48 8b 47 08 89 54 24 04 48 39 c3 74 55 4d 89 c7 <4c> 8b 00 41 89 f5 41 89 ce 48 8d 78 e8 4d 8d 60 e8 eb 27 74 0c 83 All code ======== 0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 5: 41 57 push %r15 7: 41 56 push %r14 9: 41 55 push %r13 b: 41 54 push %r12 d: 55 push %rbp e: 53 push %rbx f: 48 89 fb mov %rdi,%rbx 12: 48 83 c3 08 add $0x8,%rbx 16: 48 83 ec 08 sub $0x8,%rsp 1a: 48 8b 47 08 mov 0x8(%rdi),%rax 1e: 89 54 24 04 mov %edx,0x4(%rsp) 22: 48 39 c3 cmp %rax,%rbx 25: 74 55 je 0x7c 27: 4d 89 c7 mov %r8,%r15 2a:* 4c 8b 00 mov (%rax),%r8 <-- trapping instruction 2d: 41 89 f5 mov %esi,%r13d 30: 41 89 ce mov %ecx,%r14d 33: 48 8d 78 e8 lea -0x18(%rax),%rdi 37: 4d 8d 60 e8 lea -0x18(%r8),%r12 3b: eb 27 jmp 0x64 3d: 74 0c je 0x4b 3f: 83 .byte 0x83 Code starting with the faulting instruction =========================================== 0: 4c 8b 00 mov (%rax),%r8 3: 41 89 f5 mov %esi,%r13d 6: 41 89 ce mov %ecx,%r14d 9: 48 8d 78 e8 lea -0x18(%rax),%rdi d: 4d 8d 60 e8 lea -0x18(%r8),%r12 11: eb 27 jmp 0x3a 13: 74 0c je 0x21 15: 83 .byte 0x83 RSP: 0018:ffffc900009f7e40 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffff8880109c0798 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8880109c0790 RBP: ffff8880109c0790 R08: 0000000000000000 R09: 0000000000000003 R10: ffffc900009f7eb0 R11: ffffc9000003d000 R12: 0000000000000003 R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff888018500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000432a0000 CR4: 00000000000406f0 Call Trace: <TASK> ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:715) ? exc_page_fault (arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623) ? __wake_up_common (kernel/sched/wait.c:85) ? __pfx_xenbus_thread (drivers/xen/xenbus/xenbus_comms.c:411) __wake_up (include/linux/spinlock.h:406 kernel/sched/wait.c:108 kernel/sched/wait.c:127) process_msg (drivers/xen/xenbus/xenbus_comms.c:311) xenbus_thread (drivers/xen/xenbus/xenbus_comms.c:418) ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) kthread (kernel/kthread.c:389) ? __pfx_kthread (kernel/kthread.c:342) ret_from_fork (arch/x86/kernel/process.c:153) ? __pfx_kthread (kernel/kthread.c:342) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore xenfs nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink binfmt_misc crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 xen_netfront sha1_ssse3 xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn loop fuse ip_tables overlay xen_blkfront CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- > I'm curious what process_msg+0x18e/0x2f0 is. process_writes() has a > direct call to wake_up(), but process_msg() calling req->cb(req) may > be xs_wake_up() which is a thin wrapper over wake_up(). So, it's req->cb(req). > They make me wonder if req has been free()ed and at least partially > zero-ed, but it still has wake_up() called. The call stack here is > reminiscent of the one here > https://lore.kernel.org/xen-devel/Z_lJTyVipJJEpWg2@mail-itl/ and the > unexpected value there is 0. > > I haven't actually figured out a scenario where req would be kfree()ed > early. But the handling of kfree(req) being split between > process_msg/writes() and xs_wait_for_reply() makes me a little uneasy. > > Regards, > Jason -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab Attachment:
signature.asc
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |