
Re: WARN in xennet_disconnect_backend when frontend is paused during backend shutdown



On Fri, Sep 12, 2025 at 11:49:12AM +0200, Jürgen Groß wrote:
> On 11.09.25 17:11, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > The steps:
> > 1. Have domU netfront ("untrusted" here) and domU netback
> > ("sys-firewall-alt" here).
> > 2. Pause frontend
> > 3. Shutdown backend
> > 4. Unpause frontend
> > 5. Detach network (in my case attaching another one follows just after,
> > but I believe it's not relevant).
> > 
> > This gives the following on the frontend side:
> > 
> >      ------------[ cut here ]------------
> >      WARNING: CPU: 1 PID: 141 at include/linux/mm.h:1328 
> > xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
> >      Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device 
> > snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 
> > nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables intel_rapl_msr intel_rapl_common 
> > intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery 
> > pmt_class intel_pmc_ssram_telemetry intel_vsec 
> > polyval_clmulni ghash_clmulni_intel xen_netfront pcspkr xen_scsiback 
> > target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback 
> > xen_evtchn i2c_dev loop fuse nfnetlink overlay xen_blkfront
> >      CPU: 1 UID: 0 PID: 141 Comm: xenwatch Not tainted 
> > 6.17.0-0.rc5.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
> >      RIP: 0010:xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
> >      Code: 00 0f 83 93 03 00 00 48 8b 94 dd 90 10 00 00 48 8b 4a 08 f6 c1 
> > 01 75 79 66 90 0f b6 4a 33 81 f9 f5 00 00 00 0f 85 f3 fe ff ff <0f> 0b 49 
> > 81 ff 00 01 00 00 0f 82 01 ff ff ff 4c 89 fe 48 c7 c7 e0
> >      RSP: 0018:ffffc90001123cf8 EFLAGS: 00010246
> >      RAX: 0000000000000010 RBX: 0000000000000001 RCX: 00000000000000f5
> >      RDX: ffffea0000a05200 RSI: 0000000000000001 RDI: ffffffff82528d60
> >      RBP: ffff888041400000 R08: ffff888005054c80 R09: ffff888005054c80
> >      R10: 0000000000150013 R11: ffff88801851cd80 R12: 0000000000000000
> >      R13: ffff888053619000 R14: ffff888005d61a80 R15: 0000000000000001
> >      FS:  0000000000000000(0000) GS:ffff8880952c6000(0000) 
> > knlGS:0000000000000000
> >      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >      CR2: 00006182a11f3328 CR3: 000000001084c006 CR4: 0000000000770ef0
> >      PKRU: 55555554
> >      Call Trace:
> >       <TASK>
> >       xennet_remove+0x1e/0x80 [xen_netfront]
> >       xenbus_dev_remove+0x6e/0xf0
> >       device_release_driver_internal+0x19c/0x200
> >       bus_remove_device+0xc6/0x130
> >       device_del+0x160/0x3e0
> >       ? _raw_spin_unlock+0xe/0x30
> >       ? klist_iter_exit+0x18/0x30
> >       ? __pfx_xenwatch_thread+0x10/0x10
> >       device_unregister+0x17/0x60
> >       xenbus_dev_changed+0x1d7/0x240
> >       xenwatch_thread+0x8f/0x1c0
> >       ? __pfx_autoremove_wake_function+0x10/0x10
> >       kthread+0xf9/0x240
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork+0x152/0x180
> >       ? __pfx_kthread+0x10/0x10
> >       ret_from_fork_asm+0x1a/0x30
> >       </TASK>
> >      ---[ end trace 0000000000000000 ]---
> >      xen_netfront: backend supports XDP headroom
> >      vif vif-0: bouncing transmitted data to zeroed pages
> > 
> > The last two lines are likely related to the following attach, not the detach.
> > 
> > The same happens on 6.15 too, so it isn't a new thing.
> > 
> > Shutting down the backend without detaching first is not really a
> > normal operation, and doing that while the frontend is paused is even
> > less so. But is the above the expected outcome? If I read it right,
> > it's the WARN_ON_ONCE(folio_test_slab(folio)) in get_page() that
> > fires, which I find confusing.
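> > 
> > For reference, get_page() in include/linux/mm.h looks roughly like
> > this in current kernels (paraphrased from memory, so details may
> > differ slightly):
> > 
> >      static inline void get_page(struct page *page)
> >      {
> >              struct folio *folio = page_folio(page);
> > 
> >              /* This is the check that fires in the trace above. */
> >              if (WARN_ON_ONCE(folio_test_slab(folio)))
> >                      return;
> >              folio_get(folio);
> >      }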
> > 
> > Originally reported at 
> > https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080
> > 
> 
> Hmm, with this scenario I imagine you could manage to have
> xennet_disconnect_backend() running multiple times for the same device
> concurrently.
> 
> How reliably can this be reproduced? How many vcpus does the guest have?

Quite reliably (always?). And the guest has 2 vcpus.
Interestingly, it doesn't happen on 6.12.42, but it does on 6.15.10 and
later.

> Maybe the fix is as simple as adding a lock in xennet_disconnect_backend().
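
Something like the sketch below, perhaps? Untested, and the mutex field
is purely hypothetical (it doesn't exist in struct netfront_info today,
and would need to be initialized in the probe path):

    /* Hypothetical sketch: serialize teardown of the same device. The
     * disconnect_mutex field is made up for illustration and would have
     * to be added to struct netfront_info and initialized in
     * xennet_probe().
     */
    static void xennet_disconnect_backend(struct netfront_info *info)
    {
            mutex_lock(&info->disconnect_mutex);
            /* ... existing queue/ring/grant teardown ... */
            mutex_unlock(&info->disconnect_mutex);
    }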

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



 

