
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 14:48:15 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 14 Feb 2022 13:48:28 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.02.2022 14:27, Oleksandr Andrushchenko wrote:
> 
> 
> On 14.02.22 15:22, Jan Beulich wrote:
>> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 14:57, Jan Beulich wrote:
>>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko 
>>>>>>>> wrote:
>>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko 
>>>>>>>>>> wrote:
>>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>>>> +
>>>>>>>>>>>>>>>             for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>                 const struct vpci_msix_entry *entry = &msix->entries[i];
>>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>>>> locked, it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>>>> while holding such a lock (check below).
>>>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>>>
>>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>>>
>>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>>>
>>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>>>> must, at the least, not be done with d->vpci_rwlock held.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And we also need to check that pdev->vpci was not removed in the
>>>>>>>>>>>>> meantime, or *re-created*.
>>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned to
>>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>>>> domain.
>>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> read_unlock
>>>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>>>> read_lock
>>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>>>> <continue processing>
>>>>>>>>>>>> Something along those lines. You likely need to continue iterating
>>>>>>>>>>>> using for_each_pdev.
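(For context: for_each_pdev is the existing Xen helper for walking a domain's
device list, so the shape Roger suggests would look roughly like the sketch
below. How exactly to re-validate pdev->vpci after re-acquiring the lock is
still the open question at this point in the thread:

    for_each_pdev ( d, pdev )
    {
        if ( !pdev->vpci )
            /* vPCI was torn down while the lock was dropped: skip. */
            continue;
        /* ... dump this device's MSI/MSI-X state ... */
    }
)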
>>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>>>> this question up before [1], and I was about to use some ID for that
>>>>>>>>>>> purpose: pdev->vpci->id = d->vpci_id++, and then we use
>>>>>>>>>>> pdev->vpci->id for the checks.
>>>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>>>>>> and that the MSI entry to resume at is not past the current limit.
>>>>>>>>>> Otherwise just print a message and move on to the next device.
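(A sketch of the minimal re-validation Roger suggests here, for illustration
only. The loop index i, the surrounding MSI-X dump loop and the use of the
%pp format specifier are assumptions, not part of the actual patch:

    /* Drop the per-domain vPCI lock around softirq processing. */
    read_unlock(&d->vpci_rwlock);
    process_pending_softirqs();
    read_lock(&d->vpci_rwlock);

    /* Minimal checks only: just enough to not dereference freed state. */
    if ( !pdev->vpci || !pdev->vpci->msix ||
         i >= pdev->vpci->msix->max_entries )
    {
        printk("unable to dump all MSI-X entries for %pp\n", &pdev->sbdf);
        break; /* move on to the next device */
    }
)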
>>>>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>>>>
>>>>>>>>> How about this one:
>>>>>>>>>
>>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>>>>>>>>                              struct rangeset *mem, uint16_t cmd)
>>>>>>>>>  {
>>>>>>>>>      struct map_data data = { .d = d, .map = true };
>>>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>>>      int rc;
>>>>>>>>>
>>>>>>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>>>> +
>>>>>>>>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>>>> +    {
>>>>>>>>> +        /*
>>>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending which
>>>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>>>> +         */
>>>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>>          process_pending_softirqs();
>>>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>> +
>>>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or re-created. */
>>>>>>>>> +        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev )
>>>>>>>>> +            if ( vpci is NOT the same )
>>>>>>>>> +            {
>>>>>>>>> +                rc = 0;
>>>>>>>>> +                break;
>>>>>>>>> +            }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>      rangeset_destroy(mem);
>>>>>>>>>      if ( !rc )
>>>>>>>>>          modify_decoding(pdev, cmd, false);
>>>>>>>>>
>>>>>>>>> This one also wants process_pending_softirqs to run, so it *might*
>>>>>>>>> want the pdev and vpci checks as well. But at the same time apply_map
>>>>>>>>> runs at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>>>> running yet, and thus no vpci_process_pending is possible yet (in the
>>>>>>>>> sense that it would have anything to do). So, I think we just need:
>>>>>>>>>
>>>>>>>>>              write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>>              process_pending_softirqs();
>>>>>>>>>              write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>>
>>>>>>>>> and this should be enough
>>>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>>>> there's no need to check that the pdev still exists, or whether vpci
>>>>>>>> has been recreated, as neither is possible. Just add a comment to
>>>>>>>> explicitly note that the context of the function is special, and thus
>>>>>>>> there's no possibility of either the device or vpci going away.
>>>>>>> Does it really need write_unlock/write_lock given the context?...
>>>>>> I think it's bad practice to call process_pending_softirqs while
>>>>>> holding any locks. This is a very specific context so it's likely fine
>>>>>> to not drop the lock, but it would still seem incorrect to me.
>>>>> Ok
>>>>>>> I think it doesn't, as there is no chance defer_map is called, and thus
>>>>>>> no process_pending_softirqs -> vpci_process_pending -> read_lock chain.
>>>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>>>> will never try to do a scheduling operation that would result in our
>>>>>> context being scheduled out.
>>>>>        while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>        {
>>>>>            /*
>>>>>             * FIXME: Given the context apply_map is called from (dom0 specific
>>>>>             * init code at system_state < SYS_STATE_active) it is not strictly
>>>>>             * required that pdev->domain->vpci_rwlock is unlocked before calling
>>>>>             * process_pending_softirqs, as there is no contention possible between
>>>>>             * this code and vpci_process_pending trying to acquire the lock in
>>>>>             * read mode. But running process_pending_softirqs with any lock held
>>>>>             * doesn't seem to be good practice, so drop the lock and re-acquire
>>>>>             * it right away.
>>>>>             */
>>>>>            write_unlock(&pdev->domain->vpci_rwlock);
>>>>>            process_pending_softirqs();
>>>>>            write_lock(&pdev->domain->vpci_rwlock);
>>>>>        }
>>>> I'm afraid that's misleading at best. apply_map() is merely a specific
>>>> example where you know the lock is going to be taken. But really any
>>>> softirq handler could be acquiring any lock, so requesting to process
>>>> softirqs cannot ever be done with any lock held.
>>>>
>>>> What you instead want to explain is why, after re-acquiring the lock,
>>>> no further checking is needed for potentially changed state.
>>> How about:
>>>
>>> /*
>>>    * FIXME: Given the context apply_map is called from (dom0 specific
>>>    * init code at system_state < SYS_STATE_active) there is no contention
>>>    * possible between this code and vpci_process_pending trying to acquire
>>>    * the lock in read mode and destroy pdev->vpci in its error path.
>>>    * Neither pdev may be disposed yet, so it is not required to check if the
>>>    * relevant pdev still exists after re-acquiring the lock.
>>>    */
>> I'm not sure I follow the first sentence; I guess a comma or two may help,
>> and/or using "as well as" in place of one of the two "and". I also don't
>> think you mean contention, but rather a race between the named entities?
>   /*
>    * FIXME: Given the context from which apply_map is called (dom0 specific
>    * init code at system_state < SYS_STATE_active) there is no race condition
>    * possible between this code and vpci_process_pending which may try to
>    * acquire the lock in read mode and also try to destroy pdev->vpci in
>    * its error path.
>    * Neither pdev may be disposed yet, so it is not required to check if the
>    * relevant pdev still exists after re-acquiring the lock.
>    */

I'm still struggling with the language, sorry. You look to only have replaced
"contention"? Reading it again I'd also like to mention that to me (not a
native speaker) "Neither pdev may be ..." expresses "None of the pdev-s may
be ...", when I think you mean "Nor may pdev be ..."

Jan
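(For reference, folding both of Jan's wording points back in, the comment
under discussion might end up reading roughly as follows. Illustrative
wording only, not something agreed in this thread:

    /*
     * FIXME: apply_map is called from dom0 specific init code when
     * system_state < SYS_STATE_active, so there is no race possible
     * between this code and vpci_process_pending, which might otherwise
     * try to acquire the lock in read mode, as well as destroy
     * pdev->vpci in its error path. Nor may pdev be disposed of yet,
     * so there is no need to check that it still exists after
     * re-acquiring the lock.
     */
)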
