[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] libxl/PCI: Fix PV hotplug & stubdom coldplug


  • To: Jason Andryuk <jandryuk@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 10 Jan 2022 10:09:44 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rZM+z2n28HazBA8+1o/v25/WfixGopx/O7PGY9uY2+U=; b=GHNBCy6bngIhWFdKRsc1uMSUa1WpHFITnrhFlmpaQdnnuD8qaivjR2xDeioGCtjzArYUIw/Ho30Yu1VeY+FVJZRYSV471OC+vpdF+ty2oEaSQR0Ub9Zfxi6ZGMwAPnzkvmtQZaJXHy2H3hVcYHxDBEJlGMTzI809N7TbknqkBggEZ9cWVNpkzrSywKfQA387L2MYXDXlpWy4v/pC/G5Nm0a3JzDxWqCGSD1pdFz8U30thsLU5y2cVGuvO1eIYbXypZvlOt10EUz2ExkC9S7E7icPdhZaJ4cN3ZikOMsahgWyXzppdnddc+P8/L1N6pmUW+BJKzqjXSIBi4Gt4W0oog==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NdGtzeCOA2xhJDWF98pteO+cs4o+hIkBbRje007t8YSB7lmFI8ZE7bOTv0uf0ETUQjLHsWa0oKUh6QHD2MqucOe89cGef1KhvtDACqviFBohdS9ipS6GsvjWy53/R3aPkt+XxQBd+Hh1ud9A4mWsa9EvvtXMriwE4I1GMuuf3xXeHHEXOU7AKiCpyjAJekfgenISxLTXHK6+xkNCLblCjBtW2IC+kj4SZy50AumnJOiES9MFPAEv9uxDbFqnL45pJb87eiPnfkbDR+OCwhN5uZB+dRkNpNp9gjSK+ruRkYS9OTZzZfXQPowJsgs2afItsvN9iKpl4x8TLCvllIRSDw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Mon, 10 Jan 2022 09:09:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

(also Cc-ing Paul)

On 09.01.2022 19:04, Jason Andryuk wrote:
> commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
> reflected in the config" broken PCI hotplug (xl pci-attach) for PV
> domains when it moved libxl__create_pci_backend() later in the function.
> 
> This also broke HVM + stubdom PCI passthrough coldplug.  For that, the
> PCI devices are hotplugged to a running PV stubdom, and then the QEMU
> QMP device_add commands are made to QEMU inside the stubdom.
> 
> Are running PV domain calls libxl__wait_for_backend().

I could only make sense of this when replacing "Are" by "A", but then
I'm not sure that's what you've meant.

>  With the current
> placement of libxl__create_pci_backend(), the path does not exist and
> the call immediately fails:
> libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend 
> /local/domain/0/backend/pci/43/0 does not exist
> libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 
> 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
> libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 42:unable 
> to add pci devices
> 
> The wait is only relevant when the backend is already present.  num_devs
> is already used to determine if the backend needs to be created.  Re-use
> num_devs to determine if the backend wait is necessary.  The wait is
> necessary to avoid racing with another PCI attachment reconfiguring the
> front/back or changing to some other state like closing. If we are
> creating the backend, then we don't have to worry about the state since
> it is being created.

While I follow all of this, what I'm missing here is some form of proof
why the wait is _not_ needed for a newly created backend. Isn't it
necessary for the backend to reach Connected also when putting in place
the first device, in order for the guest to be able to actually use the
device? Is it perhaps assumed that the frontend driver would wait for
the backend reaching Connected (emphasizing the difference to HVM,
where no frontend driver exists)? Whatever the answer, it may be
worthwhile to also add that (in suitably abbreviated form) to the newly
added code comment.

> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
> reflected in the config")
> 
> Signed-off-by: Jason Andryuk <jandryuk@xxxxxxxxx>
> ---
> Alternative to Jan's patch:
> https://lore.kernel.org/xen-devel/5114ae87-bc0e-3d58-e16e-6d9d2fee0801@xxxxxxxx/

This answers the question raised in reply to Anthony's comment on my
patch.

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -157,8 +157,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
>      if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
>  
> -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", 
> XenbusStateConnected)) < 0)
> +    /* wait is only needed if the backend already exists (num_devs != NULL) 
> */
> +    if (num_devs && !starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> +        if (libxl__wait_for_backend(gc, be_path,
> +                                    GCSPRINTF("%d", XenbusStateConnected)) < 
> 0)
>              return ERROR_FAIL;
>      }
>  

In how far is the !starting part of the condition then still needed? I
have to admit that I've been wondering about the applicability of this
condition anyway, and your patch description actually seems to further
support my suspicion about this not being quite right (which is
connected to the fact that devices get added one by one instead of all
in one go, which would avoid the need for the backend to reconfigure
at all during domain creation).

Two nits:

With tools/libs/light/CODING_STYLE not saying anything about comment
style, imo ./CODING_STYLE applies, which demands comments to start with
a capital letter.

Plus, while I'm not sure what the conventions in libxl are, touching this
code would seem like a great opportunity to me to fold the two if()-s.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.