[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: possible kernel/libxl race with xl network-attach



On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote:
> On Fri, Jan 21, 2022 at 01:34:54PM +0000, James Dingwall wrote:
> > On 2022-01-13 16:11, Roger Pau Monné wrote:
> > > On Thu, Jan 13, 2022 at 11:19:46AM +0000, James Dingwall wrote:
> > > > 
> > > > I have been trying to debug a problem where a vif with the backend
> > > > in a
> > > > driver domain is added to dom0.  Intermittently the hotplug script is
> > > > not invoked by libxl (running as xl devd) in the driver domain.  By
> > > > enabling some debug for the driver domain kernel and libxl I have
> > > > these
> > > > messages:
> > > > 
> > > > driver domain kernel (Ubuntu 5.4.0-92-generic):
> > > > 
> > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0:
> > > > Successfully created xenvif
> > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > /local/domain/0/device/vif/0 -> Initialising
> > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
> > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > /local/domain/0/device/vif/0 -> Connected
> > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected
> > > > 
> > > > xl devd (Xen 4.14.3):
> > > > 
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > wpath=/local/domain/2/backend token=3/0: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700:
> > > > nested ao, parent 0x5633ac567f90
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:1055:devstate_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > state 4
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > wpath=/local/domain/2/backend token=3/0: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220:
> > > > nested ao, parent 0x5633ac567f90
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:1055:devstate_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > state 4
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_aoutils.c:88:xswait_timeout_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to
> > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state
> > > > token=2/1: deregister slotnum=2
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:1039:devstate_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569180: deregister unregistered
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_device.c:1092:device_backend_callback: calling
> > > > device_backend_cleanup
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569180: deregister unregistered
> > > > 2022-01-13 01:39:51 UTC libxl: error:
> > > > libxl_device.c:1105:device_backend_callback: unable to add device
> > > > with path /local/domain/2/backend/vif/0/0
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569280: deregister unregistered
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_device.c:1470:device_complete: device
> > > > /local/domain/2/backend/vif/0/0 add failed
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy
> > > > 
> > > > the xenstore content for the backend:
> > > > 
> > > > # xenstore-ls /local/domain/2/backend/vif/0
> > > > 0 = ""
> > > >  frontend = "/local/domain/0/device/vif/0"
> > > >  frontend-id = "0"
> > > >  online = "1"
> > > >  state = "4"
> > > >  script = "/etc/xen/scripts/vif-zynstra"
> > > >  vifname = "dom0.0"
> > > >  mac = "00:16:3e:6c:de:82"
> > > >  bridge = "cluster"
> > > >  handle = "0"
> > > >  type = "vif"
> > > >  feature-sg = "1"
> > > >  feature-gso-tcpv4 = "1"
> > > >  feature-gso-tcpv6 = "1"
> > > >  feature-ipv6-csum-offload = "1"
> > > >  feature-rx-copy = "1"
> > > >  feature-rx-flip = "0"
> > > >  feature-multicast-control = "1"
> > > >  feature-dynamic-multicast-control = "1"
> > > >  feature-split-event-channels = "1"
> > > >  multi-queue-max-queues = "2"
> > > >  feature-ctrl-ring = "1"
> > > >  hotplug-status = "connected"
> 
> The relevant point here is that `hotplug-status = "connected"` in the
> backend xenstore nodes, and that's set by the hotplug script.
> 
> Having hotplug-status == "connected" will allow the backend to proceed
> to the connected state, so there's some component in your system that
> sets this xenstore node before `xl devd` get a change to run the
> hotplug script, hence me asking for any udev rules in the previous
> email.
> 
> If it's not an udev rule then I'm lost. Do you have any modifications
> to the Xen tools that could set this xenstore node?

I am wondering if that xenstore content was captured after the environment
had been manually fixed.  I have been able to reproduce it by hotplugging
an interface where libxl isn't patched:


dom0# xl network-attach 0 backend=netdd script=vif-zynstra vifname=dom0.2 
bridge=abridge
netdd# xenstore-ls /local/domain/2/backend/vif/0/2
frontend = "/local/domain/0/device/vif/2"
frontend-id = "0"
online = "1"
state = "4"
script = "/etc/xen/scripts/vif-zynstra"
vifname = "dom0.2"
mac = "00:16:3e:5f:fc:51"
bridge = "abridge"
handle = "2"
type = "vif"
feature-sg = "1"
feature-gso-tcpv4 = "1"
feature-gso-tcpv6 = "1"
feature-ipv6-csum-offload = "1"
feature-rx-copy = "1"
feature-rx-flip = "0"
feature-multicast-control = "1"
feature-dynamic-multicast-control = "1"
feature-split-event-channels = "1"
multi-queue-max-queues = "2"
feature-ctrl-ring = "1"

If I have understood the backend kernel code it only waits for the
hotplug-status == "connected" if they key exists which it doesn't
appear to by default.

Thanks,
James



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.