Re: WARN in xennet_disconnect_backend when frontend is paused during backend shutdown
On Fri, Sep 12, 2025 at 11:49:12AM +0200, Jürgen Groß wrote:
> On 11.09.25 17:11, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > The steps:
> > 1. Have domU netfront ("untrusted" here) and domU netback
> > ("sys-firewall-alt" here).
> > 2. Pause frontend
> > 3. Shutdown backend
> > 4. Unpause frontend
> > 5. Detach network (in my case attaching another one follows just after,
> > but I believe it's not relevant).
> >
> > This gives the following on the frontend side:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 141 at include/linux/mm.h:1328
> > xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
> > Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device
> > snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4
> > nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack
> > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables intel_rapl_msr intel_rapl_common
> > intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
> > pmt_class intel_pmc_ssram_telemetry intel_vsec
> > polyval_clmulni ghash_clmulni_intel xen_netfront pcspkr xen_scsiback
> > target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback
> > xen_evtchn i2c_dev loop fuse nfnetlink overlay xen_blkfront
> > CPU: 1 UID: 0 PID: 141 Comm: xenwatch Not tainted
> > 6.17.0-0.rc5.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
> > RIP: 0010:xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
> > Code: 00 0f 83 93 03 00 00 48 8b 94 dd 90 10 00 00 48 8b 4a 08 f6 c1
> > 01 75 79 66 90 0f b6 4a 33 81 f9 f5 00 00 00 0f 85 f3 fe ff ff <0f> 0b 49
> > 81 ff 00 01 00 00 0f 82 01 ff ff ff 4c 89 fe 48 c7 c7 e0
> > RSP: 0018:ffffc90001123cf8 EFLAGS: 00010246
> > RAX: 0000000000000010 RBX: 0000000000000001 RCX: 00000000000000f5
> > RDX: ffffea0000a05200 RSI: 0000000000000001 RDI: ffffffff82528d60
> > RBP: ffff888041400000 R08: ffff888005054c80 R09: ffff888005054c80
> > R10: 0000000000150013 R11: ffff88801851cd80 R12: 0000000000000000
> > R13: ffff888053619000 R14: ffff888005d61a80 R15: 0000000000000001
> > FS: 0000000000000000(0000) GS:ffff8880952c6000(0000)
> > knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00006182a11f3328 CR3: 000000001084c006 CR4: 0000000000770ef0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > xennet_remove+0x1e/0x80 [xen_netfront]
> > xenbus_dev_remove+0x6e/0xf0
> > device_release_driver_internal+0x19c/0x200
> > bus_remove_device+0xc6/0x130
> > device_del+0x160/0x3e0
> > ? _raw_spin_unlock+0xe/0x30
> > ? klist_iter_exit+0x18/0x30
> > ? __pfx_xenwatch_thread+0x10/0x10
> > device_unregister+0x17/0x60
> > xenbus_dev_changed+0x1d7/0x240
> > xenwatch_thread+0x8f/0x1c0
> > ? __pfx_autoremove_wake_function+0x10/0x10
> > kthread+0xf9/0x240
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork+0x152/0x180
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork_asm+0x1a/0x30
> > </TASK>
> > ---[ end trace 0000000000000000 ]---
> > xen_netfront: backend supports XDP headroom
> > vif vif-0: bouncing transmitted data to zeroed pages
> >
> > The last two lines are likely related to the following attach, not the detach.
> >
> > The same happens on 6.15 too, so it isn't a new thing.
> >
> > Shutting down the backend without detaching first is not really a normal
> > operation, and doing that while the frontend is paused is even less so. But
> > is the above the expected outcome? If I read it right, it's
> > WARN_ON_ONCE(folio_test_slab(folio)) in get_page(), which I find
> > confusing.
> >
> > Originally reported at
> > https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080
> >
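The check Marek refers to can be modelled in plain C to make the "warn once" behaviour concrete. This is a minimal userspace sketch, not kernel code: `struct model_page`, `model_warn_on_once()` and `model_get_page()` are hypothetical stand-ins for a page/folio, `WARN_ON_ONCE()` and `get_page()`. The sketch also assumes that get_page() returns without taking a reference when the slab check fires.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a struct page/folio: only the two fields
 * this model needs. This is NOT the kernel's real layout. */
struct model_page {
    bool is_slab;   /* models folio_test_slab() */
    int refcount;   /* models the page reference count */
};

/* Counts how many times the warning actually fired. */
static int warn_count;

/* Models WARN_ON_ONCE(cond): report the first time cond is true, stay
 * silent afterwards, and return cond either way. The kernel keeps one
 * such flag per call site; this helper has a single call site. */
static bool model_warn_on_once(bool cond, const char *what)
{
    static bool warned;

    if (cond && !warned) {
        warned = true;
        warn_count++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Models get_page() refusing slab pages. Assumption: when the check
 * fires, no reference is taken. */
static void model_get_page(struct model_page *page)
{
    if (model_warn_on_once(page->is_slab, "get_page() on a slab page"))
        return;
    page->refcount++;
}
```

Because the flag is latched, only the first offending call is reported and later calls at the same site stay silent. A single trace like the one above may therefore be the only visible symptom even if the bad path runs more than once.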
>
> Hmm, with this scenario I imagine you could manage to have
> xennet_disconnect_backend() running multiple times for the same device
> concurrently.
>
> How reliably can this be reproduced? How many vcpus does the guest have?

Quite reliably (always?). And there are 2 vcpus.
Interestingly, it doesn't happen on 6.12.42, but does on 6.15.10 and
later.

> Maybe the fix is as simple as adding a lock in xennet_disconnect_backend().
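Jürgen's suggestion can be illustrated with a userspace analogue of the general pattern: serialise the teardown with a lock and record, under that lock, whether it has already run. This is a sketch under stated assumptions, not a proposed patch. It uses a pthread mutex rather than any particular kernel locking primitive, and `struct model_frontend`, `model_disconnect_backend()` and the `connected` flag are hypothetical names, not xen-netfront's real fields.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a frontend device; field names are
 * illustrative, not xen-netfront's real struct netfront_info. */
struct model_frontend {
    pthread_mutex_t lock; /* serialises disconnect callers */
    bool connected;       /* true while resources are set up */
    int *rx_buffers;      /* stands in for pages shared with the backend */
    int teardowns;        /* how many times teardown really ran */
};

static void model_frontend_init(struct model_frontend *fe)
{
    pthread_mutex_init(&fe->lock, NULL);
    fe->connected = true;
    fe->rx_buffers = calloc(16, sizeof(*fe->rx_buffers));
    fe->teardowns = 0;
}

/* Idempotent disconnect: the first caller frees the resources and
 * clears 'connected'. Any concurrent or later caller sees the flag
 * already cleared under the lock and returns without touching the
 * freed buffers a second time. */
static void model_disconnect_backend(struct model_frontend *fe)
{
    pthread_mutex_lock(&fe->lock);
    if (fe->connected) {
        free(fe->rx_buffers);
        fe->rx_buffers = NULL;
        fe->connected = false;
        fe->teardowns++;
    }
    pthread_mutex_unlock(&fe->lock);
}

/* Thread entry point so two disconnects can race in a test. */
static void *disconnect_thread(void *arg)
{
    model_disconnect_backend(arg);
    return NULL;
}
```

The lock alone only orders the two callers; it is the `connected` check under the lock that turns the second call into a no-op. Without it, the second caller would still run the teardown, just after the first one instead of alongside it.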
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab