
vpci: Need for vpci_cancel_pending


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 28 Oct 2021 10:04:20 +0000
  • Delivery-date: Thu, 28 Oct 2021 10:04:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: vpci: Need for vpci_cancel_pending

Hi, all!

While working on PCI passthrough on Arm I ran into a crash
with the following call chain:

pci_physdev_op
   pci_add_device
       init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
   iommu_add_device <- FAILS
   vpci_remove_device -> xfree(pdev->vpci)

Then:
leave_hypervisor_to_guest
   vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL

Which results in the crash below:

(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x10 on CPU0 via TTBR 0x00000000481dd000
(XEN) 0TH[0x0] = 0x00000000481dcf7f
(XEN) 1ST[0x0] = 0x00000000481d9f7f
(XEN) 2ND[0x0] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort
...
(XEN) Xen call trace:
(XEN)    [<00000000002246d8>] _spin_lock+0x40/0xa4 (PC)
(XEN)    [<00000000002246c0>] _spin_lock+0x28/0xa4 (LR)
(XEN)    [<000000000024f6d0>] vpci_process_pending+0x78/0x128
(XEN)    [<000000000027f7e8>] leave_hypervisor_to_guest+0x50/0xcc
(XEN)    [<0000000000269c5c>] entry.o#guest_sync_slowpath+0xa8/0xd4

So, it seems that if pci_add_device fails and calls vpci_remove_device,
the latter needs to cancel any pending work.

If the pending work is a map operation the fix seems straightforward:
destroy the rangeset and do not map anything.

If vpci_remove_device is called while an unmap operation is scheduled,
then either:
- the guest is being destroyed, in which case skipping the unmap is fine
   as all the mappings for the whole domain will be destroyed anyway
- the guest is going to stay alive, in which case the unmap must still be done

I would like to hear your thoughts on what the right approach
would be to solve this issue.

Thank you in advance,
Oleksandr
