Re: [PATCH v10 03/17] vpci: use per-domain PCI lock to protect vpci structure
Hi Roger
Thank you for the review.
Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
> On Thu, Oct 12, 2023 at 10:09:15PM +0000, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> Use a previously introduced per-domain read/write lock to check
>> whether vpci is present, so we are sure there are no accesses to the
>> contents of the vpci struct if not. This lock can be used (and in a
>> few cases is used right away) so that vpci removal can be performed
>> while holding the lock in write mode. Previously such removal could
>> race with vpci_read for example.
>>
>> When taking both d->pci_lock and pdev->vpci->lock, they should be
>> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
>> possible deadlock situations.
>>
>> 1. Per-domain's pci_rwlock is used to protect pdev->vpci structure
>> from being removed.
>>
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if
>> done under the read lock, requires vpci->lock to be acquired on both
>> devices being compared, which may produce a deadlock. It is not
>> possible to upgrade read lock to write lock in such a case. So, in
>> order to prevent the deadlock, use d->pci_lock instead. To prevent
>> deadlock while locking both hwdom->pci_lock and dom_xen->pci_lock,
>> always lock hwdom first.
>>
>> All other code, which doesn't lead to pdev->vpci destruction and does
>> not access multiple pdevs at the same time, can still use a
>> combination of the read lock and pdev->vpci->lock.
>>
>> 3. Drop const qualifier where the new rwlock is used and this is
>> appropriate.
>>
>> 4. Do not call process_pending_softirqs with any locks held. For that,
>> unlock prior to the call and re-acquire the locks afterwards. After
>> re-acquiring the lock there is no need to check if pdev->vpci exists:
>> - in apply_map because of the context it is called (no race condition
>> possible)
>> - for MSI/MSI-X debug code because it is called at the end of
>> pdev->vpci access and no further access to pdev->vpci is made
>>
>> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev_by_domain
>> while accessing pdevs in vpci code.
>>
>> 6. We are removing multiple ASSERT(pcidevs_locked()) instances because
>> they are too strict now: they should be corrected to
>> ASSERT(pcidevs_locked() || rw_is_locked(&d->pci_lock)), but the problem is
>> that the mentioned instances do not have access to the domain
>> pointer, and it is not feasible to pass a domain pointer to a function
>> just for debugging purposes.
>>
>> There is a possible lock inversion in MSI code, as some parts of it
>> acquire pcidevs_lock() while already holding d->pci_lock.
>
> Is this going to be addressed in a further patch?
>
It is actually addressed in this patch, in v10; I just forgot to remove
this sentence from the patch description. My bad.
It was fixed by adding an extra parameter to allocate_and_map_msi_pirq(),
as it is called from two places: from vpci_msi_enable(), while we are
already holding d->pci_lock, and from physdev_map_pirq(), where no locks
are taken.
[...]
>> @@ -2908,7 +2908,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int
>> index, int *pirq_p,
>>
>> msi->irq = irq;
>>
>> - pcidevs_lock();
>> + if ( use_pci_lock )
>> + pcidevs_lock();
>
> Instead of passing the flag it might be better if the caller can take
> the lock, as to avoid having to pass an extra parameter.
>
> Then we should also assert that either the pcidev_lock or the
> per-domain pci lock is taken?
>
This is a good idea. I'll take the lock in physdev_map_pirq(), as that
is the only place where allocate_and_map_msi_pirq() is called without
any lock held.
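Roughly what I have in mind (a toy single-threaded model, not the real
Xen code: the booleans stand in for pcidevs_lock and for d->pci_lock
held in write mode, and all the _sketch function bodies are
illustrative only):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two locks involved. */
static bool pcidevs_lock_held;
static bool d_pci_lock_held;

static void pcidevs_lock(void)   { assert(!pcidevs_lock_held); pcidevs_lock_held = true; }
static void pcidevs_unlock(void) { assert(pcidevs_lock_held);  pcidevs_lock_held = false; }

/* The callee no longer takes pcidevs_lock itself; instead it asserts
 * that the caller already holds one of the two locks. */
static int allocate_and_map_msi_pirq_sketch(void)
{
    assert(pcidevs_lock_held || d_pci_lock_held);
    return 0; /* IRQ allocation and mapping elided */
}

/* physdev_map_pirq() path: nothing is held yet, so take pcidevs_lock
 * around the call -- the one call site that needs it. */
static int physdev_map_pirq_sketch(void)
{
    int rc;

    pcidevs_lock();
    rc = allocate_and_map_msi_pirq_sketch();
    pcidevs_unlock();

    return rc;
}

/* vpci_msi_enable() path: d->pci_lock is already held in write mode
 * further up the vPCI call chain, so no extra locking is needed. */
static int vpci_msi_enable_sketch(void)
{
    int rc;

    d_pci_lock_held = true; /* taken by the vPCI caller */
    rc = allocate_and_map_msi_pirq_sketch();
    d_pci_lock_held = false;

    return rc;
}
```

This way the use_pci_lock parameter goes away and the lock requirement
is enforced by the assert instead.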
>> /* Verify or get pirq. */
>> write_lock(&d->event_lock);
>> pirq = allocate_pirq(d, index, *pirq_p, irq, type, &msi->entry_nr);
>> @@ -2924,7 +2925,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int
>> index, int *pirq_p,
>>
>> done:
>> write_unlock(&d->event_lock);
>> - pcidevs_unlock();
>> + if ( use_pci_lock )
>> + pcidevs_unlock();
>> if ( ret )
>> {
>> switch ( type )
>> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
>> index 20275260b3..466725d8ca 100644
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -602,7 +602,7 @@ static int msi_capability_init(struct pci_dev *dev,
>> unsigned int i, mpos;
>> uint16_t control;
>>
>> - ASSERT(pcidevs_locked());
>> + ASSERT(pcidevs_locked() || rw_is_locked(&dev->domain->pci_lock));
>> pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
>> if ( !pos )
>> return -ENODEV;
>> @@ -771,7 +771,7 @@ static int msix_capability_init(struct pci_dev *dev,
>> if ( !pos )
>> return -ENODEV;
>>
>> - ASSERT(pcidevs_locked());
>> + ASSERT(pcidevs_locked() || rw_is_locked(&dev->domain->pci_lock));
>>
>> control = pci_conf_read16(dev->sbdf, msix_control_reg(pos));
>> /*
>> @@ -988,8 +988,6 @@ static int __pci_enable_msi(struct msi_info *msi, struct
>> msi_desc **desc,
>> {
>> struct msi_desc *old_desc;
>>
>> - ASSERT(pcidevs_locked());
>> -
>> if ( !pdev )
>> return -ENODEV;
>>
>> @@ -1043,8 +1041,6 @@ static int __pci_enable_msix(struct msi_info *msi,
>> struct msi_desc **desc,
>> {
>> struct msi_desc *old_desc;
>>
>> - ASSERT(pcidevs_locked());
>> -
>> if ( !pdev || !pdev->msix )
>> return -ENODEV;
>>
>> @@ -1154,8 +1150,6 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool
>> off)
>> int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
>> struct pci_dev *pdev)
>> {
>> - ASSERT(pcidevs_locked());
>> -
>
> If you have the pdev in all the above function, you could expand the
> assert to test for pdev->domain->pci_lock?
>
Yes, you are right. This is a leftover from the times when
pci_enable_msi() acquired the pdev pointer by itself. I'll extend the
assert to ASSERT(pcidevs_locked() || rw_is_locked(&pdev->domain->pci_lock))
in these functions as well.
[...]
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 767c1ba718..a52e52db96 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -172,6 +172,7 @@ bool vpci_process_pending(struct vcpu *v)
>> if ( rc == -ERESTART )
>> return true;
>>
>> + write_lock(&v->domain->pci_lock);
>> spin_lock(&v->vpci.pdev->vpci->lock);
>> /* Disable memory decoding unconditionally on failure. */
>> modify_decoding(v->vpci.pdev,
>> @@ -190,6 +191,7 @@ bool vpci_process_pending(struct vcpu *v)
>> * failure.
>> */
>> vpci_remove_device(v->vpci.pdev);
>> + write_unlock(&v->domain->pci_lock);
>> }
>>
>> return false;
>> @@ -201,8 +203,20 @@ static int __init apply_map(struct domain *d, const
>> struct pci_dev *pdev,
>> struct map_data data = { .d = d, .map = true };
>> int rc;
>>
>> + ASSERT(rw_is_write_locked(&d->pci_lock));
>> +
>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) ==
>> -ERESTART )
>> + {
>> + /*
>> + * It's safe to drop and reacquire the lock in this context
>> + * without risking pdev disappearing because devices cannot be
>> + * removed until the initial domain has been started.
>> + */
>> + read_unlock(&d->pci_lock);
>> process_pending_softirqs();
>> + read_lock(&d->pci_lock);
>
> You are asserting the lock is taken in write mode just above the usage
> of read_{un,}lock(). Either the assert is wrong, or the usage of
> read_{un,}lock() is wrong.
Oops, it looks like this is a rebasing artifact. The final version of
the code uses write locks, of course. The patch "vpci/header: handle p2m
range sets per BAR" changes this piece of code as well and replaces the
read lock with the write lock. I'll move that change into this patch.
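So the loop will drop and retake the lock in the same (write) mode
around process_pending_softirqs(). A toy single-threaded model of the
corrected pattern (pci_locked stands in for d->pci_lock held in write
mode, -ERESTART for Xen's restart code; all bodies are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

#define ERESTART 85

static bool pci_locked;
static int softirqs_run;

static void write_lock(void)   { assert(!pci_locked); pci_locked = true; }
static void write_unlock(void) { assert(pci_locked);  pci_locked = false; }

/* The lock must NOT be held here, or softirq processing could stall
 * anyone else trying to take d->pci_lock. */
static void process_pending_softirqs(void)
{
    assert(!pci_locked);
    softirqs_run++;
}

/* Mimics rangeset_consume_ranges(): handle one chunk per call and ask
 * to be restarted until nothing is left. */
static int consume_one_chunk(int *chunks_left)
{
    assert(pci_locked); /* the rangeset is only touched under the lock */
    return --*chunks_left > 0 ? -ERESTART : 0;
}

static int apply_map_sketch(int chunks)
{
    int rc;

    write_lock(); /* write mode, matching the assert at the top */
    while ( (rc = consume_one_chunk(&chunks)) == -ERESTART )
    {
        /* Drop and retake the lock in the SAME mode; safe here because
         * devices cannot be removed before the initial domain starts. */
        write_unlock();
        process_pending_softirqs();
        write_lock();
    }
    write_unlock();

    return rc;
}
```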
[...]
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 3bec9a4153..112de56fb3 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -38,6 +38,8 @@ extern vpci_register_init_t *const __end_vpci_array[];
>>
>> void vpci_remove_device(struct pci_dev *pdev)
>> {
>> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> if ( !has_vpci(pdev->domain) || !pdev->vpci )
>> return;
>>
>> @@ -73,6 +75,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
>> const unsigned long *ro_map;
>> int rc = 0;
>>
>> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> if ( !has_vpci(pdev->domain) )
>> return 0;
>>
>> @@ -326,11 +330,12 @@ static uint32_t merge_result(uint32_t data, uint32_t
>> new, unsigned int size,
>>
>> uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>> {
>> - const struct domain *d = current->domain;
>> + struct domain *d = current->domain;
>> const struct pci_dev *pdev;
>> const struct vpci_register *r;
>> unsigned int data_offset = 0;
>> uint32_t data = ~(uint32_t)0;
>> + rwlock_t *lock;
>>
>> if ( !size )
>> {
>> @@ -342,11 +347,21 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
>> unsigned int size)
>> * Find the PCI dev matching the address, which for hwdom also requires
>> * consulting DomXEN. Passthrough everything that's not trapped.
>> */
>> + lock = &d->pci_lock;
>> + read_lock(lock);
>> pdev = pci_get_pdev(d, sbdf);
>> if ( !pdev && is_hardware_domain(d) )
>> + {
>> + read_unlock(lock);
>> + lock = &dom_xen->pci_lock;
>> + read_lock(lock);
>> pdev = pci_get_pdev(dom_xen, sbdf);
>
> I'm unsure whether devices assigned to dom_xen can change ownership
> after boot, so maybe there's no need for all this lock dance, as the
> device cannot disappear?
>
> Maybe just taking the hardware domain lock is enough to prevent
> concurrent accesses in that case, as the hardware domain is the only
> allowed to access devices owned by dom_xen.
This sounds correct. If there are no objections, I will remove the
extra locking.
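I.e. keep only the caller's own lock across both lookups, along these
lines (a toy model, assuming dom_xen assignments really are fixed after
boot and only hwdom may touch dom_xen-owned devices; the structures and
the single-list pci_get_pdev() are illustrative stand-ins):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct domain { bool is_hwdom; };
struct pci_dev { struct domain *owner; };

static struct domain hwdom = { .is_hwdom = true };
static struct domain domU;
static struct domain dom_xen;
static struct pci_dev xen_dev = { .owner = &dom_xen };

static bool d_pci_lock_held;

/* Pretend the device is only visible in dom_xen's device list. */
static struct pci_dev *pci_get_pdev(struct domain *d)
{
    assert(d_pci_lock_held); /* the lookup must be serialized */
    return d == &dom_xen ? &xen_dev : NULL;
}

static struct pci_dev *vpci_lookup_sketch(struct domain *d)
{
    struct pci_dev *pdev;

    d_pci_lock_held = true;            /* read_lock(&d->pci_lock) */
    pdev = pci_get_pdev(d);
    if ( !pdev && d->is_hwdom )
        pdev = pci_get_pdev(&dom_xen); /* no dom_xen lock dance */
    d_pci_lock_held = false;           /* read_unlock(&d->pci_lock) */

    return pdev;
}
```

Since hwdom is the only domain allowed to reach dom_xen's devices,
holding hwdom's pci_lock alone already serializes those accesses.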
--
WBR, Volodymyr