
vpci: Need for vpci_cancel_pending


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 28 Oct 2021 10:04:20 +0000
  • Delivery-date: Thu, 28 Oct 2021 10:04:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: vpci: Need for vpci_cancel_pending

Hi, all!

While working on PCI passthrough on Arm I ran into a crash
with the following call chain:

pci_physdev_op
   pci_add_device
       init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
   iommu_add_device <- FAILS
   vpci_remove_device -> xfree(pdev->vpci)

Then:
leave_hypervisor_to_guest
   vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL

Which results in the crash below:

(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x10 on CPU0 via TTBR 0x00000000481dd000
(XEN) 0TH[0x0] = 0x00000000481dcf7f
(XEN) 1ST[0x0] = 0x00000000481d9f7f
(XEN) 2ND[0x0] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort
...
(XEN) Xen call trace:
(XEN)    [<00000000002246d8>] _spin_lock+0x40/0xa4 (PC)
(XEN)    [<00000000002246c0>] _spin_lock+0x28/0xa4 (LR)
(XEN)    [<000000000024f6d0>] vpci_process_pending+0x78/0x128
(XEN)    [<000000000027f7e8>] leave_hypervisor_to_guest+0x50/0xcc
(XEN)    [<0000000000269c5c>] entry.o#guest_sync_slowpath+0xa8/0xd4

So, it seems that if pci_add_device fails and calls vpci_remove_device,
the latter needs to cancel any pending work.

If the pending work is a map operation the fix seems straightforward:
destroy the rangeset and do not map anything.

If vpci_remove_device is called while an unmap operation is scheduled,
then either:
- the guest is being destroyed, in which case skipping the unmap is fine
   as all the mappings for the whole domain will be destroyed anyway
- the guest is going to stay alive, in which case the unmap must still be done

I would like to hear your thoughts on what the right approach
would be to solve this issue.

Thank you in advance,
Oleksandr
