
Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 30 Sep 2021 09:34:59 +0000
  • Accept-language: en-US
  • Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 30 Sep 2021 09:35:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHXtdAlOCu2AnkIOUC4Prsocox3squ8RSWAgAAMGAA=
  • Thread-topic: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology


On 30.09.21 11:51, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
> This reads a little odd: If bus is already known (and I think you imply
> segment to also be known), it's only DF which get assigned.
But at the end of the day we set all parts of that SBDF.
Otherwise I would have to write "Assign DF, as we know that bus and
segment are both 0".
>
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
> Or up to 256 when there are multi-function ones. Imo you at least want
> to spell out how that case is intended to be handled (even if maybe
> the code doesn't cover that case yet, in which case a respective code
> comment would also want leaving).
We do not support multi-function devices yet, so I'll add a comment to that effect.
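Something along these lines (just a sketch against the device limit check
below; exact wording and placement may change):

    /*
     * Each PCI bus supports up to 32 devices/slots.  Multi-function
     * virtual devices are not supported yet, so every virtual device
     * occupies a whole slot (function 0 only); with multi-function
     * support up to 256 functions would fit behind the root complex.
     */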
>
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>       return ret;
>>   }
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> May I ask why the code enclosed by this conditional has been put here
> rather than under drivers/vpci/?
Indeed this can be moved to xen/drivers/vpci/vpci.c.
I'll move and update function names accordingly.
>
>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>> +                                                const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +        if ( vdev->pdev == pdev )
>> +            return vdev;
>> +    return NULL;
>> +}
> No locking here or ...
>
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    ASSERT(!pci_find_virtual_device(d, pdev));
> ... in this first caller that I've managed to spot? See also below as
> to up the call tree the pcidevs-lock being held (which at the very
> least you would then want to ASSERT() for here).
I will move the code to vpci and make sure the locking is done properly there.
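At a minimum I'd assert the lock in the lookup helper, e.g. (sketch,
assuming the pcidevs lock remains the serialization point; a per-domain
lock is discussed further down):

static struct vpci_dev *vpci_find_virtual_device(const struct domain *d,
                                                 const struct pci_dev *pdev)
{
    struct vpci_dev *vdev;

    /* The caller is expected to hold the pcidevs lock. */
    ASSERT(pcidevs_locked());

    list_for_each_entry ( vdev, &d->vdev_list, list )
        if ( vdev->pdev == pdev )
            return vdev;

    return NULL;
}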
>
>> +    /* Each PCI bus supports 32 devices/slots at max. */
>> +    if ( d->vpci_dev_next > 31 )
>> +        return -ENOSPC;
> Please avoid open-coding literals when they can be suitably expressed.
I failed to find a suitable constant for that. Could you please point
me to the one I can use here?
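The closest I could come up with myself is deriving the limit from
PCI_SLOT(), e.g. (the VPCI_MAX_VIRT_DEV name below is made up, not an
existing constant):

/* Number of slots on a single PCI bus, derived from PCI_SLOT(). */
#define VPCI_MAX_VIRT_DEV  (PCI_SLOT(~0) + 1)   /* 32 */

    if ( d->vpci_dev_next >= VPCI_MAX_VIRT_DEV )
        return -ENOSPC;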
>
>> +    vdev = xzalloc(struct vpci_dev);
>> +    if ( !vdev )
>> +        return -ENOMEM;
>> +
>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>> +    vdev->seg = 0;
>> +
>> +    /*
>> +     * The bus number is set to 0, so virtual devices are seen
>> +     * as embedded endpoints behind the root complex.
>> +     */
>> +    vdev->bus = 0;
> Strictly speaking both of these assignments are redundant with you
> using xzalloc(). I'd prefer if there was just a comment, as the compiler
> has no way recognizing this in order to eliminate these stores.
Yes, I only added the assignments so that the zero values are explicit here.
I will remove them and leave a comment instead.
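Something like this then (sketch based on the hunk above):

    vdev = xzalloc(struct vpci_dev);
    if ( !vdev )
        return -ENOMEM;

    /*
     * seg and bus are left at 0 by xzalloc(): we emulate a single host
     * bridge for the guest, so virtual devices appear as endpoints
     * embedded behind the root complex.
     */
    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);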
>
>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> +
>> +    vdev->pdev = pdev;
>> +    vdev->domain = d;
>> +
>> +    pcidevs_lock();
>> +    list_add_tail(&vdev->list, &d->vdev_list);
>> +    pcidevs_unlock();
> I don't support a global lock getting (ab)used for per-domain list
> management.
>
> Apart from that I don't understand why you acquire the lock here. Does
> that mean the functions further were truly left without any locking,
> by you not having noticed that this lock is already being held by the
> sole caller?
I'll re-work the locking with respect to the new location of this code, i.e. vpci.
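E.g. with a per-domain lock instead of the global one (the vdev_lock field
below is hypothetical, it does not exist yet):

    /* In struct domain, next to vdev_list (hypothetical addition): */
    spinlock_t vdev_lock;

    /* When adding the virtual device: */
    spin_lock(&d->vdev_lock);
    list_add_tail(&vdev->list, &d->vdev_list);
    spin_unlock(&d->vdev_lock);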
>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to assign for hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return vpci_bar_add_handlers(d, dev);
>> +    rc = vpci_bar_add_handlers(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    return pci_add_virtual_device(d, dev);
>>   }
> I've peeked at the earlier patch, and both there and here I'm struggling to
> see how undoing partially completed steps or fully completed earlier steps
> is intended to work. I'm not convinced it is legitimate to leave handlers
> in place until the tool stack manages to roll back the failed device
> assignment.
I'll look into what can be rolled back in case of an error, and how.
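For the assignment path it could be something like this (sketch reusing the
handler removal helper from the earlier patch in this series):

int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
{
    int rc;

    /* It only makes sense to assign for hwdom or guest domain. */
    if ( is_system_domain(d) || !has_vpci(d) )
        return 0;

    rc = vpci_bar_add_handlers(d, dev);
    if ( rc )
        return rc;

    rc = pci_add_virtual_device(d, dev);
    if ( rc )
        /* Undo the already completed step on failure. */
        vpci_bar_remove_handlers(d, dev);

    return rc;
}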
>
>>   /* Notify vPCI that device is de-assigned from guest. */
>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> +    rc = pci_remove_virtual_device(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>>       return vpci_bar_remove_handlers(d, dev);
>>   }
> So what's the ultimate effect of a partially de-assigned device, where
> one of the later steps failed? In a number of places we do best-effort
> full cleanup, by recording errors but nevertheless continuing with
> subsequent cleanup steps. I wonder whether that's a model to use here
> as well.
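A best-effort variant along those lines (recording the first error but
still carrying out the remaining cleanup) could look like this sketch:

int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
{
    int rc, ret;

    /* It only makes sense to de-assign from hwdom or guest domain. */
    if ( is_system_domain(d) || !has_vpci(d) )
        return 0;

    /* Best effort: continue with the cleanup even if one step fails. */
    rc = pci_remove_virtual_device(d, dev);
    ret = vpci_bar_remove_handlers(d, dev);

    return rc ?: ret;
}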
>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -137,6 +137,24 @@ struct pci_dev {
>>       struct vpci *vpci;
>>   };
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +struct vpci_dev {
>> +    struct list_head list;
>> +    /* Physical PCI device this virtual device is connected to. */
>> +    const struct pci_dev *pdev;
>> +    /* Virtual SBDF of the device. */
>> +    union {
>> +        struct {
>> +            uint8_t devfn;
>> +            uint8_t bus;
>> +            uint16_t seg;
>> +        };
>> +        pci_sbdf_t sbdf;
> Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
> providing all you need? By putting it in a union with a custom
> struct you set yourself up for things going out of sync if anyone
> chose to alter pci_sbdf_t's layout.
Sure, pci_sbdf_t should be enough
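I.e. the structure would then simply become (sketch):

struct vpci_dev {
    struct list_head list;
    /* Physical PCI device this virtual device is connected to. */
    const struct pci_dev *pdev;
    /* Virtual SBDF of the device. */
    pci_sbdf_t sbdf;
    struct domain *domain;
};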
>
>> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>>   int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>                      const struct pci_dev_info *, nodeid_t node);
>>   int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +#endif
> Like for their definitions I question the placement of these
> declarations.
Will move to vpci.h
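E.g. (assuming the functions get renamed to vpci_* as mentioned above):

#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
int vpci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
int vpci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
#endif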
>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -444,6 +444,14 @@ struct domain
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    struct list_head vdev_list;
>> +    /*
>> +     * Current device number used by the virtual PCI bus topology
>> +     * to assign a unique SBDF to a passed through virtual PCI device.
>> +     */
>> +    int vpci_dev_next;
> In how far can the number stored here be negative? If it can't be,
> please use unsigned int.
Will use unsigned
>
> As to the comment - "current" is ambiguous: Is it the number that
> was used last, or the next one to be used?
I will update the comment to remove ambiguity
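E.g. (sketch of how the field and comment could end up looking):

    /*
     * Device number (slot) to be used for the next virtual PCI device
     * assigned to this domain, so that each passed-through device gets
     * a unique virtual SBDF on the emulated bus 0.
     */
    unsigned int vpci_dev_next;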
> Jan
>
Thank you,
Oleksandr
