
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 13:27:56 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Mon, 14 Feb 2022 13:28:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH] vpci: introduce per-domain lock to protect vpci structure


On 14.02.22 15:22, Jan Beulich wrote:
> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 14:57, Jan Beulich wrote:
>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>>> +
>>>>>>>>>>>>>>             for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>                 const struct vpci_msix_entry *entry = &msix->entries[i];
>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>>
>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>>
>>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>>
>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>>
>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>>
>>>>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>>>>> in between or *re-created*
>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned to
>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>>> domain.
>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>>
>>>>>>>>>>>> read_unlock
>>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>>> read_lock
>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>>> <continue processing>
>>>>>>>>>>> Something along those lines. You likely need to continue iterating
>>>>>>>>>>> using for_each_pdev.
>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>>> this question before [1] and I was about to use some ID for that purpose:
>>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for checks
>>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>>>>> and that the resume MSI entry is not past the current limit. Otherwise
>>>>>>>>> just print a message and move on to the next device.
>>>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>>>
>>>>>>>> How about this one:
>>>>>>>>
>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>>>>>>>                                   struct rangeset *mem, uint16_t cmd)
>>>>>>>>       {
>>>>>>>>           struct map_data data = { .d = d, .map = true };
>>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>>           int rc;
>>>>>>>>
>>>>>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>>> +
>>>>>>>>           while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>>> +    {
>>>>>>>> +
>>>>>>>> +        /*
>>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending which
>>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>>> +         */
>>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>               process_pending_softirqs();
>>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>> +
>>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or re-created. */
>>>>>>>> +        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev )
>>>>>>>> +            if ( vpci is NOT the same )
>>>>>>>> +            {
>>>>>>>> +                rc = 0;
>>>>>>>> +                break;
>>>>>>>> +            }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>           rangeset_destroy(mem);
>>>>>>>>           if ( !rc )
>>>>>>>>               modify_decoding(pdev, cmd, false);
>>>>>>>>
>>>>>>>> This one also wants process_pending_softirqs to run, so it *might*
>>>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>>> running yet, thus vpci_process_pending cannot yet have anything
>>>>>>>> to do. So, I think we just need:
>>>>>>>>
>>>>>>>>              write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>>              process_pending_softirqs();
>>>>>>>>              write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>
>>>>>>>> and this should be enough
>>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>>> there's no need to check for the pdev to still exist, or whether vpci
>>>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>>>> explicitly note that the context of the function is special, and thus
>>>>>>> there's no possibility of either the device or vpci going away.
>>>>>> Does it really need write_unlock/write_lock given the context?...
>>>>> I think it's bad practice to call process_pending_softirqs while
>>>>> holding any locks. This is a very specific context so it's likely fine
>>>>> to not drop the lock, but would still seem incorrect to me.
>>>> Ok
>>>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>>> will never try to do a scheduling operation that would result in our
>>>>> context being scheduled out.
>>>>        while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>        {
>>>>            /*
>>>>             * FIXME: Given the context apply_map is called from (dom0 specific
>>>>             * init code at system_state < SYS_STATE_active) it is not strictly
>>>>             * required that pdev->domain->vpci_rwlock is unlocked before calling
>>>>             * process_pending_softirqs as there is no contention possible between
>>>>             * this code and vpci_process_pending trying to acquire the lock in
>>>>             * read mode. But running process_pending_softirqs with any lock held
>>>>             * doesn't seem to be a good practice, so drop the lock and re-acquire
>>>>             * it right away.
>>>>             */
>>>>            write_unlock(&pdev->domain->vpci_rwlock);
>>>>            process_pending_softirqs();
>>>>            write_lock(&pdev->domain->vpci_rwlock);
>>>>        }
>>> I'm afraid that's misleading at best. apply_map() is merely a specific
>>> example where you know the lock is going to be taken. But really any
>>> softirq handler could be acquiring any lock, so requesting to process
>>> softirqs cannot ever be done with any lock held.
>>>
>>> What you instead want to explain is why, after re-acquiring the lock,
>>> no further checking is needed for potentially changed state.
>> How about:
>>
>> /*
>>    * FIXME: Given the context apply_map is called from (dom0 specific
>>    * init code at system_state < SYS_STATE_active) there is no contention
>>    * possible between this code and vpci_process_pending trying to acquire
>>    * the lock in read mode and destroy pdev->vpci in its error path.
>>    * Neither pdev may be disposed yet, so it is not required to check if the
>>    * relevant pdev still exists after re-acquiring the lock.
>>    */
> I'm not sure I follow the first sentence; I guess a comma or two may help,
> and or using "as well as" in place of one of the two "and". I also don't
> think you mean contention, but rather a race between the named entities?
  /*
   * FIXME: Given the context from which apply_map is called (dom0 specific
   * init code at system_state < SYS_STATE_active) there is no race condition
   * possible between this code and vpci_process_pending, which may try to
   * acquire the lock in read mode and also try to destroy pdev->vpci in its
   * error path. Nor can the pdev be disposed of yet, so it is not required
   * to check if the relevant pdev still exists after re-acquiring the lock.
   */
>
> Jan
>
Thank you,
Oleksandr
