[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Oleksandr Andrushchenko <andr2000@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 15 Nov 2021 17:56:14 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=l0crSbdCxIsQ1vEscNNeY5we1MkrS+AqRP/lqWHF+Sg=; b=nHWmv+AfatY4fpqmC8ccXmWow+AnaN/Sj5X3TIJzVdr1E5cnERGTfq1iK/tf3EfGWC5SD6M1D6GuMeK91LTRXIh4rLxMAib7RunRD3uoCieVSUBe5YDowG9KNEPgo4uNOtjvW/LwPaZJwYU6yZdZQUba7nZm22PMhnrt8tHAOVOfqpAvPLg6Xsg0LbZVLaBECtFJJZ91nRHxU79BJieUfXAJC3BFLHBhibJf0RB33k9yEOKlmiib9xbRDOppF3zgauxIx6jHnAM8uCl34oG+wuzMGPxlPzkxHEzEUKMEIYabuB13TdEhSXq9pWXCJdl6JdDe4AQgdQY5Bdy7zZmfJw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=m0e5/GjbjI/UtrEg+yN+PgUilaDE2Au3EkMpImA2+N+YRnU+yDN3ccgIePLpcxR6aHsu1Du+9/P31I8Y8iXQIPFoP/sFgQULJZvk+klOtOO656cjVv3Nba5eGRRnzX/Ol/xCrI70/2uK1O5khND8VDsUNz3kviL2Nvi+TgxHf5lZXGbJP4hPaU5QsLkrLqJFgQ5aK4U/jK0fq1JeNmj3ZkLVfv3Ug1NT+oXdX/hcJ1WxF2cl0N7+0TFX9RAj9VHwwo76mtwl3TF6q1XQOq1BWvrY/5DkJq0p+CkHSYrO7zTZ9h25e4W3xjMYWC2D5T13Db3ZNHDq9sV3wSaMy+Rk5g==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: julien@xxxxxxx, sstabellini@xxxxxxxxxx, oleksandr_tyshchenko@xxxxxxxx, volodymyr_babchuk@xxxxxxxx, Artem_Mygaiev@xxxxxxxx, roger.pau@xxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, paul@xxxxxxx, bertrand.marquis@xxxxxxx, rahul.singh@xxxxxxx, Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 15 Nov 2021 16:56:45 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> 
> When a vPCI is removed for a PCI device it is possible that we have
> scheduled a delayed work for map/unmap operations for that device.
> For example, the following scenario can illustrate the problem:
> 
> pci_physdev_op
>    pci_add_device
>        init_bars -> modify_bars -> defer_map -> 
> raise_softirq(SCHEDULE_SOFTIRQ)
>    iommu_add_device <- FAILS
>    vpci_remove_device -> xfree(pdev->vpci)
> 
> leave_hypervisor_to_guest
>    vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
> 
> For the hardware domain we continue execution as the worse that
> could happen is that MMIO mappings are left in place when the
> device has been deassigned

Is continuing safe in this case? I.e. isn't there the risk of a NULL
deref?

> For unprivileged domains that get a failure in the middle of a vPCI
> {un}map operation we need to destroy them, as we don't know in which
> state the p2m is. This can only happen in vpci_process_pending for
> DomUs as they won't be allowed to call pci_add_device.

You saying "we need to destroy them" made me look for a new domain_crash()
that you add, but there is none. What is this about?

> @@ -165,6 +164,18 @@ bool vpci_process_pending(struct vcpu *v)
>      return false;
>  }
>  
> +void vpci_cancel_pending(const struct pci_dev *pdev)
> +{
> +    struct vcpu *v = current;
> +
> +    /* Cancel any pending work now. */

Doesn't "any" include pending work on all vCPU-s of the guest, not
just current? Is current even relevant (as in: part of the correct
domain), considering ...

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
>          xfree(r);
>      }
>      spin_unlock(&pdev->vpci->lock);
> +
> +    vpci_cancel_pending(pdev);

... this code path, when coming here from pci_{add,remove}_device()?

I can agree that there's a problem here, but I think you need to
properly (i.e. in a race free manner) drain pending work.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.