|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v13 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/13/24 03:35, Roger Pau Monné wrote:
> On Fri, Feb 02, 2024 at 04:33:05PM -0500, Stewart Hildebrand wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> Use the per-domain PCI read/write lock to protect the presence of the
>> pci device vpci field. This lock can be used (and in a few cases is used
>> right away) so that vpci removal can be performed while holding the lock
>> in write mode. Previously such removal could race with vpci_read for
>> example.
>>
>> When taking both d->pci_lock and pdev->vpci->lock, they should be
>> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
>> possible deadlock situations.
>>
>> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
>> from being removed.
>>
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if
>> done under the read lock, requires vpci->lock to be acquired on both
>> devices being compared, which may produce a deadlock. It is not
>> possible to upgrade read lock to write lock in such a case. So, in
>> order to prevent the deadlock, use d->pci_lock in write mode instead.
>>
>> All other code, which doesn't lead to pdev->vpci destruction and does
>> not access multiple pdevs at the same time, can still use a
>> combination of the read lock and pdev->vpci->lock.
>>
>> 3. Drop const qualifier where the new rwlock is used and this is
>> appropriate.
>>
>> 4. Do not call process_pending_softirqs with any locks held. For that
>> unlock prior the call and re-acquire the locks after. After
>> re-acquiring the lock there is no need to check if pdev->vpci exists:
>> - in apply_map because of the context it is called (no race condition
>> possible)
>> - for MSI/MSI-X debug code because it is called at the end of
>> pdev->vpci access and no further access to pdev->vpci is made
>>
>> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
>> while accessing pdevs in vpci code.
>>
>> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
>> do not go away. The vPCI functions call several MSI-related functions
>> which already have existing non-vPCI callers. Change those MSI-related
>> functions to allow using either pcidevs_lock() or d->pci_lock for
>> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
>> sufficient. Note that this pdev protection mechanism does not protect
>> other state or critical sections. These MSI-related functions already
>> have other race condition and state protection mechanims (e.g.
>> d->event_lock and msixtbl RCU), so we deduce that the use of the global
>> pcidevs_lock() is to ensure that pdevs do not go away. Existing non-vPCI
>> callers of these MSI-related functions will remain (ab)using the global
>> pcidevs_lock() to ensure pdevs do not go away so as to minimize changes
>> to existing non-vPCI call paths.
>>
>> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
>> that pdevs do not go away. The purpose of this wrapper is to aid
>> readability and document the intent of the pdev protection mechanism.
>
> I would add that when possible, the existing callers haven't been
> switched to use the newly introduced per-domain pci_lock, and will
> continue to use the global pcidevs lock. This is done to reduce the
> risk of the new locking scheme introducing regressions. Those users
> will be adjusted in due time.
I'll use this wording, thanks
>
> IIRC Jan had concerns about why some existing use-cases are not
> switched straight to use the new per-domain pci_lock in this patch.
I hope the clarified commit description addresses this
>
>>
>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>> ---
>> Changes in v13:
>> - hold off adding Roger's R-b tag even though it was provided on v12.2
>> - use a wrapper construct to ease readability of odd-looking ASSERTs
>> - new placement of ASSERT in __pci_enable_msix(), __pci_enable_msi(),
>> and pci_enable_msi(). Rearrange/add pdev NULL check.
>> - expand commit description with details about using either
>> pcidevs_lock() or d->pci_lock
>>
>> Changes in v12.2:
>> - drop Roger's R-b
>> - drop both locks on error paths in vpci_msix_arch_print()
>> - add another ASSERT in vpci_msix_arch_print(), to enforce the
>> expectation both locks are held before calling vpci_msix_arch_print()
>> - move pdev_done label in vpci_dump_msi()
>> - update comments in vpci_dump_msi() to say locks (plural)
>>
>> Changes in v12.1:
>> - use read_trylock() in vpci_msix_arch_print()
>> - fixup in-code comments (revert double space, use DomXEN) in
>> vpci_{read,write}()
>> - minor updates in commit message
>> - add Roger's R-b
>>
>> Changes in v12:
>> - s/pci_rwlock/pci_lock/ in commit message
>> - expand comment about scope of pci_lock in sched.h
>> - in vpci_{read,write}, if hwdom is trying to access a device assigned
>> to dom_xen, holding hwdom->pci_lock is sufficient (no need to hold
>> dom_xen->pci_lock)
>> - reintroduce ASSERT in vmx_pi_update_irte()
>> - reintroduce ASSERT in __pci_enable_msi{x}()
>> - delete note 6. in commit message about removing ASSERTs since we have
>> reintroduced them
>>
>> Changes in v11:
>> - Fixed commit message regarding possible spinlocks
>> - Removed parameter from allocate_and_map_msi_pirq(), which was added
>> in the prev version. Now we are taking pcidevs_lock in
>> physdev_map_pirq()
>> - Returned ASSERT to pci_enable_msi
>> - Fixed case when we took read lock instead of write one
>> - Fixed label indentation
>>
>> Changes in v10:
>> - Moved printk pas locked area
>> - Returned back ASSERTs
>> - Added new parameter to allocate_and_map_msi_pirq() so it knows if
>> it should take the global pci lock
>> - Added comment about possible improvement in vpci_write
>> - Changed ASSERT(rw_is_locked()) to rw_is_write_locked() in
>> appropriate places
>> - Renamed release_domain_locks() to release_domain_write_locks()
>> - moved domain_done label in vpci_dump_msi() to correct place
>> Changes in v9:
>> - extended locked region to protect vpci_remove_device and
>> vpci_add_handlers() calls
>> - vpci_write() takes lock in the write mode to protect
>> potential call to modify_bars()
>> - renamed lock releasing function
>> - removed ASSERT()s from msi code
>> - added trylock in vpci_dump_msi
>>
>> Changes in v8:
>> - changed d->vpci_lock to d->pci_lock
>> - introducing d->pci_lock in a separate patch
>> - extended locked region in vpci_process_pending
>> - removed pcidevs_lockis vpci_dump_msi()
>> - removed some changes as they are not needed with
>> the new locking scheme
>> - added handling for hwdom && dom_xen case
>> ---
>> xen/arch/x86/hvm/vmsi.c | 31 +++++++++++++--------
>> xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>> xen/arch/x86/irq.c | 8 +++---
>> xen/arch/x86/msi.c | 20 +++++++++-----
>> xen/arch/x86/physdev.c | 2 ++
>> xen/drivers/passthrough/pci.c | 9 +++---
>> xen/drivers/vpci/header.c | 18 ++++++++++++
>> xen/drivers/vpci/msi.c | 30 +++++++++++++++++---
>> xen/drivers/vpci/msix.c | 52 ++++++++++++++++++++++++++++++-----
>> xen/drivers/vpci/vpci.c | 24 ++++++++++++++--
>> xen/include/xen/sched.h | 15 +++++++++-
>> 11 files changed, 170 insertions(+), 41 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index 128f23636279..f29089178a59 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq
>> *pirq, uint64_t gtable)
>> struct msixtbl_entry *entry, *new_entry;
>> int r = -EINVAL;
>>
>> - ASSERT(pcidevs_locked());
>> + ASSERT(pdev_list_is_read_locked(d));
>> ASSERT(rw_is_write_locked(&d->event_lock));
>>
>> if ( !msixtbl_initialised(d) )
>> @@ -538,7 +538,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq
>> *pirq)
>> struct pci_dev *pdev;
>> struct msixtbl_entry *entry;
>>
>> - ASSERT(pcidevs_locked());
>> + ASSERT(pdev_list_is_read_locked(d));
>> ASSERT(rw_is_write_locked(&d->event_lock));
>>
>> if ( !msixtbl_initialised(d) )
>> @@ -684,7 +684,7 @@ static int vpci_msi_update(const struct pci_dev *pdev,
>> uint32_t data,
>> {
>> unsigned int i;
>>
>> - ASSERT(pcidevs_locked());
>> + ASSERT(rw_is_locked(&pdev->domain->pci_lock));
>
> Any reason to not use the newly introduced helper here? I know the
> pcidevs will never be locked here given the new lock usage, but still
> it would be less confusing if the new helper was used consistently.
Nope. Agreed, it would help readability, I'll switch to the helper. In
addition to vpci_msi_update(), I assume this comment also applies to the
remaining occurrences:
xen/arch/x86/hvm/vmsi.c:vpci_msi_arch_update()
xen/arch/x86/hvm/vmsi.c:vpci_msi_arch_enable()
xen/arch/x86/hvm/vmsi.c:vpci_msi_disable()
xen/arch/x86/hvm/vmsi.c:vpci_msix_arch_enable_entry()
xen/drivers/vpci/msix.c:msix_find()
But not:
xen/arch/x86/hvm/vmsi.c:vpci_msix_arch_print()
I'll add a suitable comment to this one exception.
>
> Otherwise we need a comment here as to why the helper can't be used,
> in order to avoid confusion in the future.
>
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 9da91e0e6244..c3adec1aca3c 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -462,7 +462,8 @@ struct domain
>> #ifdef CONFIG_HAS_PCI
>> struct list_head pdev_list;
>> /*
>> - * pci_lock protects access to pdev_list.
>> + * pci_lock protects access to pdev_list. pci_lock also protects
>> pdev->vpci
>> + * structure from being removed.
>> *
>> * Any user *reading* from pdev_list, or from devices stored in
>> pdev_list,
>> * should hold either pcidevs_lock() or pci_lock in read mode.
>> Optionally,
>> @@ -628,6 +629,18 @@ struct domain
>> unsigned int cdf;
>> };
>>
>> +/*
>> + * Check for use in ASSERTs to ensure that:
>> + * 1. we can *read* d->pdev_list
>> + * 2. pdevs (belonging to this domain) do not go away
>> + * 3. pdevs (belonging to this domain) do not get assigned to other
>> domains
>
> I think you can just state that this check ensures there will be no
> changes to the entries in d->pdev_list, but not the contents of each
> entry. No changes to d->pdev_list already ensures not devices can be
> deassigned or removed from the system, and obviously makes the list
> safe to iterate against.
OK, I'll simplify the comment
>
> I would also drop the explicitly mention this is intended for ASSERT
> usage: there's nothing specific in the code that prevents it from
> being used in other places (albeit I think that's unlikely).
>
>> + * This check is not suitable for protecting other state or critical
>> regions.
>> + */
>> +#define pdev_list_is_read_locked(d) ({ \
>
> I would be tempted to drop at least the '_read_' part from the name,
> the name is getting a bit too long for my taste.
>
>> + struct domain *d_ = (d); \
>
> Why do you need this local domain variable? Can't you use the d
> parameter directly?
>
> Such assign will prevent using a const 'd' parameter, and 'd_' itself
> should be const IMO (iff we really need this).
>
> Also sched.h is not the best place, can't you just place it in
> pci.h?
Yes, I'll move it to xen/include/xen/pci.h
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |