
Re: WARN in xennet_disconnect_backend when frontend is paused during backend shutdown



On Fri, Sep 12, 2025 at 11:49:12AM +0200, Jürgen Groß wrote:
> On 11.09.25 17:11, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > The steps:
> > 1. Have domU netfront ("untrusted" here) and domU netback
> > ("sys-firewall-alt" here).
> > 2. Pause frontend
> > 3. Shutdown backend
> > 4. Unpause frontend
> > 5. Detach network (in my case attaching another one follows just after,
> > but I believe it's not relevant).
> > 
> > This gives the following on the frontend side:
> > 
> >      ------------[ cut here ]------------
> >      WARNING: CPU: 1 PID: 141 at include/linux/mm.h:1328 
> > xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
> >      Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device 
> > snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 
> > nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables intel_rapl_msr intel_rapl_common 
> > intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery 
> > pmt_class intel_pmc_ssram_telemetry intel_vsec 
> > polyval_clmulni ghash_clmulni_intel xen_netfront pcspkr xen_scsiback 
> > target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback 
> > xen_evtchn i2c_dev loop fuse nfnetlink overlay xen_blkfront
> >      CPU: 1 UID: 0 PID: 141 Comm: xenwatch Not tainted 
> > 6.17.0-0.rc5.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
> >      RIP: 0010:xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
> >      Code: 00 0f 83 93 03 00 00 48 8b 94 dd 90 10 00 00 48 8b 4a 08 f6 c1 
> > 01 75 79 66 90 0f b6 4a 33 81 f9 f5 00 00 00 0f 85 f3 fe ff ff <0f> 0b 49 
> > 81 ff 00 01 00 00 0f 82 01 ff ff ff 4c 89 fe 48 c7 c7 e0
> >      RSP: 0018:ffffc90001123cf8 EFLAGS: 00010246
> >      RAX: 0000000000000010 RBX: 0000000000000001 RCX: 00000000000000f5
> >      RDX: ffffea0000a05200 RSI: 0000000000000001 RDI: ffffffff82528d60
> >      RBP: ffff888041400000 R08: ffff888005054c80 R09: ffff888005054c80
> >      R10: 0000000000150013 R11: ffff88801851cd80 R12: 0000000000000000
> >      R13: ffff888053619000 R14: ffff888005d61a80 R15: 0000000000000001
> >      FS:  0000000000000000(0000) GS:ffff8880952c6000(0000) 
> > knlGS:0000000000000000
> >      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >      CR2: 00006182a11f3328 CR3: 000000001084c006 CR4: 0000000000770ef0
> >      PKRU: 55555554
> >      Call Trace:
> >       <TASK>
> >       xennet_remove+0x1e/0x80 [xen_netfront]
> >       xenbus_dev_remove+0x6e/0xf0
> >       device_release_driver_internal+0x19c/0x200
> >       bus_remove_device+0xc6/0x130
> >       device_del+0x160/0x3e0
> >       ? _raw_spin_unlock+0xe/0x30
> >       ? klist_iter_exit+0x18/0x30
> >       ? __pfx_xenwatch_thread+0x10/0x10
> >       device_unregister+0x17/0x60
> >       xenbus_dev_changed+0x1d7/0x240
> >       xenwatch_thread+0x8f/0x1c0
> >       ? __pfx_autoremove_wake_function+0x10/0x10
> >       kthread+0xf9/0x240
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork+0x152/0x180
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork_asm+0x1a/0x30
> >       </TASK>
> >      ---[ end trace 0000000000000000 ]---
> >      xen_netfront: backend supports XDP headroom
> >      vif vif-0: bouncing transmitted data to zeroed pages
> > 
> > The last two lines are likely related to the following attach, not the detach.
> > 
> > The same happens on 6.15 too, so it isn't a new thing.
> > 
> > Shutting down the backend without detaching first is not really a
> > normal operation, and doing that while the frontend is paused is even
> > less so. But is the above the expected outcome? If I read it right,
> > it's the WARN_ON_ONCE(folio_test_slab(folio)) in get_page() that
> > fires, which I find confusing.
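> > 
> > For reference, get_page() in include/linux/mm.h looks roughly like
> > this in current kernels (paraphrased from memory, so details may
> > differ slightly):
> > 
> >      static inline void get_page(struct page *page)
> >      {
> >              struct folio *folio = page_folio(page);
> > 
> >              /* This is the check that fires in the trace above. */
> >              if (WARN_ON_ONCE(folio_test_slab(folio)))
> >                      return;
> >              folio_get(folio);
> >      }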
> > 
> > Originally reported at 
> > https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080
> > 
> 
> Hmm, with this scenario I imagine you could manage to have
> xennet_disconnect_backend() running multiple times for the same device
> concurrently.
> 
> How reliably can this be reproduced? How many vcpus does the guest have?

Quite reliably (always?). And the guest has 2 vcpus.
Interestingly, it doesn't happen on 6.12.42, but it does on 6.15.10 and
later.

> Maybe the fix is as simple as adding a lock in xennet_disconnect_backend().
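
Something like the sketch below, perhaps? Untested, and the mutex field
is purely hypothetical (it doesn't exist in struct netfront_info today,
and would need to be initialized in the probe path):

    /* Hypothetical sketch: serialize teardown of the same device. The
     * disconnect_mutex field is made up for illustration and would have
     * to be added to struct netfront_info and initialized in
     * xennet_probe().
     */
    static void xennet_disconnect_backend(struct netfront_info *info)
    {
            mutex_lock(&info->disconnect_mutex);
            /* ... existing queue/ring/grant teardown ... */
            mutex_unlock(&info->disconnect_mutex);
    }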

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



 

