
Re: possible kernel/libxl race with xl network-attach


  • To: James Dingwall <james-xen@xxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 24 Jan 2022 10:07:54 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wei.liu@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Mon, 24 Jan 2022 09:08:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jan 21, 2022 at 03:05:07PM +0000, James Dingwall wrote:
> On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote:
> > On Fri, Jan 21, 2022 at 01:34:54PM +0000, James Dingwall wrote:
> > > On 2022-01-13 16:11, Roger Pau Monné wrote:
> > > > On Thu, Jan 13, 2022 at 11:19:46AM +0000, James Dingwall wrote:
> > > > > 
> > > > > > I have been trying to debug a problem where a vif with the backend in a
> > > > > > driver domain is added to dom0.  Intermittently the hotplug script is
> > > > > > not invoked by libxl (running as xl devd) in the driver domain.  By
> > > > > > enabling some debug for the driver domain kernel and libxl I have these
> > > > > > messages:
> > > > > 
> > > > > driver domain kernel (Ubuntu 5.4.0-92-generic):
> > > > > 
> > > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0:
> > > > > Successfully created xenvif
> > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > > /local/domain/0/device/vif/0 -> Initialising
> > > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
> > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > > /local/domain/0/device/vif/0 -> Connected
> > > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> > > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected
> > > > > 
> > > > > xl devd (Xen 4.14.3):
> > > > > 
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > > wpath=/local/domain/2/backend token=3/0: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700:
> > > > > nested ao, parent 0x5633ac567f90
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:1055:devstate_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > > state 4
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > > wpath=/local/domain/2/backend token=3/0: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220:
> > > > > nested ao, parent 0x5633ac567f90
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:1055:devstate_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > > state 4
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_aoutils.c:88:xswait_timeout_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to
> > > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state
> > > > > token=2/1: deregister slotnum=2
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:1039:devstate_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569180: deregister unregistered
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_device.c:1092:device_backend_callback: calling
> > > > > device_backend_cleanup
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569180: deregister unregistered
> > > > > 2022-01-13 01:39:51 UTC libxl: error:
> > > > > libxl_device.c:1105:device_backend_callback: unable to add device
> > > > > with path /local/domain/2/backend/vif/0/0
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569280: deregister unregistered
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_device.c:1470:device_complete: device
> > > > > /local/domain/2/backend/vif/0/0 add failed
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy
> > > > > 
> > > > > the xenstore content for the backend:
> > > > > 
> > > > > # xenstore-ls /local/domain/2/backend/vif/0
> > > > > 0 = ""
> > > > >  frontend = "/local/domain/0/device/vif/0"
> > > > >  frontend-id = "0"
> > > > >  online = "1"
> > > > >  state = "4"
> > > > >  script = "/etc/xen/scripts/vif-zynstra"
> > > > >  vifname = "dom0.0"
> > > > >  mac = "00:16:3e:6c:de:82"
> > > > >  bridge = "cluster"
> > > > >  handle = "0"
> > > > >  type = "vif"
> > > > >  feature-sg = "1"
> > > > >  feature-gso-tcpv4 = "1"
> > > > >  feature-gso-tcpv6 = "1"
> > > > >  feature-ipv6-csum-offload = "1"
> > > > >  feature-rx-copy = "1"
> > > > >  feature-rx-flip = "0"
> > > > >  feature-multicast-control = "1"
> > > > >  feature-dynamic-multicast-control = "1"
> > > > >  feature-split-event-channels = "1"
> > > > >  multi-queue-max-queues = "2"
> > > > >  feature-ctrl-ring = "1"
> > > > >  hotplug-status = "connected"
> > 
> > The relevant point here is that `hotplug-status = "connected"` in the
> > backend xenstore nodes, and that's set by the hotplug script.
> > 
> > Having hotplug-status == "connected" will allow the backend to proceed
> > to the connected state, so there's some component in your system that
> > sets this xenstore node before `xl devd` gets a chance to run the
> > hotplug script, hence me asking for any udev rules in the previous
> > email.
> > 
> > If it's not a udev rule then I'm lost. Do you have any modifications
> > to the Xen tools that could set this xenstore node?
> 
> I am wondering if that xenstore content was captured after the environment
> had been manually fixed.  I have been able to reproduce it by hotplugging
> an interface where libxl isn't patched:
> 
> 
> dom0# xl network-attach 0 backend=netdd script=vif-zynstra vifname=dom0.2 bridge=abridge
> netdd# xenstore-ls /local/domain/2/backend/vif/0/2
> frontend = "/local/domain/0/device/vif/2"
> frontend-id = "0"
> online = "1"
> state = "4"
> script = "/etc/xen/scripts/vif-zynstra"
> vifname = "dom0.2"
> mac = "00:16:3e:5f:fc:51"
> bridge = "abridge"
> handle = "2"
> type = "vif"
> feature-sg = "1"
> feature-gso-tcpv4 = "1"
> feature-gso-tcpv6 = "1"
> feature-ipv6-csum-offload = "1"
> feature-rx-copy = "1"
> feature-rx-flip = "0"
> feature-multicast-control = "1"
> feature-dynamic-multicast-control = "1"
> feature-split-event-channels = "1"
> multi-queue-max-queues = "2"
> feature-ctrl-ring = "1"
> 
> If I have understood the backend kernel code correctly, it only waits
> for hotplug-status == "connected" if the key exists, which it doesn't
> appear to by default.

Indeed. I have to admit this is quite weird. I have the following
completely untested patch, could you give it a try?

Adding netback maintainers for feedback on whether it's fine for libxl
to force netback to wait for hotplug script execution. I'm not sure why
netback doesn't do that by default, but it seems to have been like that
since the module was imported into Linux in 2011.
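
To spell out the behaviour the patch relies on, here is a minimal
user-space sketch of the decision as described in this thread. It is not
the netback code: `struct backend_nodes`, `enum next_step` and
`backend_next_step` are made-up names, and the model assumes the backend
waits whenever the hotplug-status node exists with any value other than
"connected".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the backend xenstore directory.
 * hotplug_status is NULL when the node does not exist at all.
 */
struct backend_nodes {
    const char *hotplug_status;
};

enum next_step {
    SWITCH_TO_CONNECTED, /* backend may move to the connected state now */
    WAIT_FOR_HOTPLUG     /* backend must wait for the hotplug script */
};

/* Simplified model of the choice made once the frontend has connected. */
static enum next_step backend_next_step(const struct backend_nodes *be)
{
    assert(be != NULL);

    /* Node absent: nothing to wait for, which is the race reported here. */
    if (be->hotplug_status == NULL)
        return SWITCH_TO_CONNECTED;

    /* The hotplug script has already reported success. */
    if (strcmp(be->hotplug_status, "connected") == 0)
        return SWITCH_TO_CONNECTED;

    /* Node present but not "connected" (e.g. the empty string). */
    return WAIT_FOR_HOTPLUG;
}
```

In this model, pre-creating the node with an empty value, as the patch
below does, is what moves the backend from the first branch to the last.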

Thanks, Roger.
---
diff --git a/tools/libs/light/libxl_nic.c b/tools/libs/light/libxl_nic.c
index 0b45469dca..0b9e70c9d1 100644
--- a/tools/libs/light/libxl_nic.c
+++ b/tools/libs/light/libxl_nic.c
@@ -248,6 +248,13 @@ static int libxl__set_xenstore_nic(libxl__gc *gc, uint32_t domid,
     flexarray_append(ro_front, "mtu");
     flexarray_append(ro_front, GCSPRINTF("%u", nic->mtu));
 
+    /*
+     * Force backend to wait for hotplug script execution before switching to
+     * connected state.
+     */
+    flexarray_append(back, "hotplug-status");
+    flexarray_append(back, "");
+
     return 0;
 }
 



 

