
Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Wed, 3 Nov 2021 08:57:59 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 03 Nov 2021 08:58:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology


On 03.11.21 10:41, Jan Beulich wrote:
> On 03.11.2021 07:34, Oleksandr Andrushchenko wrote:
>> Hi, Roger
>>
>> On 26.10.21 14:33, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> Assign an SBDF with bus 0 to the PCI devices being passed through.
>>>> The resulting topology is one where PCIe devices reside on bus 0 of the
>>>> root complex itself (embedded endpoints).
>>>> This implementation is limited to the 32 devices allowed on
>>>> a single PCI bus.
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> ---
>>>> Since v2:
>>>>    - remove casts that are (a) malformed and (b) unnecessary
>>>>    - add new line for better readability
>>>>    - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>>>       functions are now completely gated with this config
>>>>    - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> New in v2
>>>> ---
>>>>    xen/common/domain.c           |  3 ++
>>>>    xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>>>    xen/drivers/vpci/vpci.c       | 14 +++++++-
>>>>    xen/include/xen/pci.h         | 22 +++++++++++++
>>>>    xen/include/xen/sched.h       |  8 +++++
>>>>    5 files changed, 106 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>>> index 40d67ec34232..e0170087612d 100644
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>>>    
>>>>    #ifdef CONFIG_HAS_PCI
>>>>        INIT_LIST_HEAD(&d->pdev_list);
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +    INIT_LIST_HEAD(&d->vdev_list);
>>>> +#endif
>>>>    #endif
>>>>    
>>>>        /* All error paths can depend on the above setup. */
>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>>> index 805ab86ed555..5b963d75d1ba 100644
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>>        return ret;
>>>>    }
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>>> +                                                const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>> +        if ( vdev->pdev == pdev )
>>>> +            return vdev;
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>>> +
>>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>>> +    if ( d->vpci_dev_next > 31 )
>>>> +        return -ENOSPC;
>>>> +
>>>> +    vdev = xzalloc(struct vpci_dev);
>>>> +    if ( !vdev )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>>>> +    vdev->seg = 0;
>>>> +
>>>> +    /*
>>>> +     * The bus number is set to 0, so virtual devices are seen
>>>> +     * as embedded endpoints behind the root complex.
>>>> +     */
>>>> +    vdev->bus = 0;
>>>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>>> This would likely be better as a bitmap where you set the bits of
>>> in-use slots. Then you can use find_first_bit or similar to get a free
>>> slot.
>>>
>>> Long term you might want to allow the caller to provide a pre-selected
>>> slot, as it's possible for users to request the device to appear at a
>>> specific slot on the emulated bus.
>>>
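A minimal sketch of the bitmap approach suggested above, dropped into
pci_add_virtual_device() and assuming a hypothetical d->vpci_dev_assigned_map
field that would replace d->vpci_dev_next (name and placement are illustrative
only, not part of this series), could look like:

    /* Sketch only: allocate the first free slot on the emulated bus 0. */
    unsigned long slot = find_first_zero_bit(&d->vpci_dev_assigned_map, 32);

    /* Each PCI bus supports at most 32 devices/slots. */
    if ( slot >= 32 )
        return -ENOSPC;

    __set_bit(slot, &d->vpci_dev_assigned_map);
    vdev->devfn = PCI_DEVFN(slot, 0);

    /* A caller-provided slot could later be honoured by checking it with
       test_bit() before setting it. */
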
>>>> +
>>>> +    vdev->pdev = pdev;
>>>> +    vdev->domain = d;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    list_add_tail(&vdev->list, &d->vdev_list);
>>>> +    pcidevs_unlock();
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    vdev = pci_find_virtual_device(d, pdev);
>>>> +    if ( vdev )
>>>> +        list_del(&vdev->list);
>>>> +    pcidevs_unlock();
>>>> +    xfree(vdev);
>>>> +    return 0;
>>>> +}
>>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> +
>>>>    /* Caller should hold the pcidevs_lock */
>>>>    static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>>>                               uint8_t devfn)
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 702f7b5d5dda..d787f13e679e 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>>    /* Notify vPCI that device is assigned to guest. */
>>>>    int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>>    {
>>>> +    int rc;
>>>> +
>>>>        /* It only makes sense to assign for hwdom or guest domain. */
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> -    return vpci_bar_add_handlers(d, dev);
>>>> +    rc = vpci_bar_add_handlers(d, dev);
>>>> +    if ( rc )
>>>> +        return rc;
>>>> +
>>>> +    return pci_add_virtual_device(d, dev);
>>>>    }
>>>>    
>>>>    /* Notify vPCI that device is de-assigned from guest. */
>>>>    int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>>    {
>>>> +    int rc;
>>>> +
>>>>        /* It only makes sense to de-assign from hwdom or guest domain. */
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> +    rc = pci_remove_virtual_device(d, dev);
>>>> +    if ( rc )
>>>> +        return rc;
>>>> +
>>>>        return vpci_bar_remove_handlers(d, dev);
>>>>    }
>>>>    #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>>> index 43b8a0817076..33033a3a8f8d 100644
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>>        struct vpci *vpci;
>>>>    };
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +struct vpci_dev {
>>>> +    struct list_head list;
>>>> +    /* Physical PCI device this virtual device is connected to. */
>>>> +    const struct pci_dev *pdev;
>>>> +    /* Virtual SBDF of the device. */
>>>> +    union {
>>>> +        struct {
>>>> +            uint8_t devfn;
>>>> +            uint8_t bus;
>>>> +            uint16_t seg;
>>>> +        };
>>>> +        pci_sbdf_t sbdf;
>>>> +    };
>>>> +    struct domain *domain;
>>>> +};
>>>> +#endif
>>> I wonder whether this is strictly needed. Won't it be enough to store
>>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>>
>>> It would avoid the overhead of the translation you do from pdev ->
>>> vdev, and there doesn't seem to be anything relevant stored in
>>> vpci_dev apart from the virtual sbdf.
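For illustration, a minimal sketch of that idea, assuming a hypothetical
guest_sbdf field added to the existing struct vpci (not part of this series),
could look like:

    /* Sketch only: resolve a guest SBDF without a separate vpci_dev list. */
    static struct pci_dev *vpci_find_by_guest_sbdf(struct domain *d,
                                                   pci_sbdf_t guest_sbdf)
    {
        struct pci_dev *pdev;

        list_for_each_entry ( pdev, &d->pdev_list, domain_list )
            if ( pdev->vpci && pdev->vpci->guest_sbdf.sbdf == guest_sbdf.sbdf )
                return pdev;

        return NULL;
    }

Such a lookup would still need some lock around d->pdev_list, which is what
the discussion below is about.
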
>> TL;DR It seems it might be needed from a performance POV. If it is not
>> implemented, then for every MMIO trap we use a global PCI lock, i.e.
>> pcidevs_{lock|unlock}.
>> Note: the pcidevs lock is a recursive lock.
>>
>> There are two sources of access to virtual devices:
>> 1. During initialization, when we add, assign or de-assign a PCI device
>> 2. At run time, when we trap configuration space accesses and need to
>> translate a virtual SBDF into a physical SBDF
>> Note that at least de-assignment can run concurrently with MMIO handlers.
>>
>> Now let's see which locks are in use while doing that.
>>
>> 1. No struct vpci_dev is used.
>> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
>> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
>> 1.3. Locking happens at the system level
>>
>> 2. struct vpci_dev is used
>> 2.1. We have a per-domain lock, vdev_lock
>> 2.2. Locking happens at the per-domain level
>>
>> To compare the two:
>>
>> 1. Without vpci_dev
>> pros: much simpler code
>> pros/cons: a global lock is used during MMIO handling, but it is a recursive lock
> Could you point out to me in which way the recursive nature of the lock
> is relevant here? Afaict that aspect is of no interest when considering
> the performance effects of using a global lock vs one with more narrow
> scope.
I was just trying to find some excuses to defend the pcidevs global lock,
so even the lock's recursion could be an argument here. Weak.
Besides that, I do agree that this is still a global lock.
>
> Jan
>
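
For completeness, a minimal sketch of the per-domain locking described in
2.1/2.2 above, assuming a hypothetical d->vdev_lock spinlock protecting
d->vdev_list (not part of this series):

    /* Sketch only: MMIO traps take the per-domain lock, not pcidevs_lock(). */
    static struct vpci_dev *vpci_find_virtual_device(struct domain *d,
                                                     pci_sbdf_t sbdf)
    {
        struct vpci_dev *vdev, *found = NULL;

        spin_lock(&d->vdev_lock);
        list_for_each_entry ( vdev, &d->vdev_list, list )
            if ( vdev->sbdf.sbdf == sbdf.sbdf )
            {
                found = vdev;
                break;
            }
        spin_unlock(&d->vdev_lock);

        return found;
    }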
