[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 14:22:02 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=8zydRb0P0vZ2SfBvuiDPx/mbmRGQUZ1KeivECWm8RR8=; b=WbTr6F/auIDgNdSFlgka/kkPN8uEnAnbpn6rsXNPXrCX8A0HMZ42XUIvfuYCZJppUY/8WomYdIsounjWIAqz9F+H3itDwh5QAjsSriKo7qxmzQvOgXUfkIiyMMojimYbduHyT/BPWIFGAvTbTX+OCflZ/t2mtSbFAb1NCuXyLSyT9C6RMo4UECgfnHdGivJTiHVWVacMkqK9c6h4cKTa8EwRFeExLXZqdVvwFofzeCmg/M38hDhsrQo9DB0z9apQtLh8fKnmvnX1XgPkfZ03m4EutXM98qUGFkIO4ysh0vK30KqImVIievdpn43MF2gYHCHCI6hVgMVUInqxAPwDMQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=K6vuYvZw2IodrHbt09+4OS+lpEsRiw0FV2WgKuAuQuJuLbIv+Cj0fgl40A/XH+nyHYLC2R7DUOLI3Rng3XIwg1Dvdc3oAit7towg9wZxLMHhSSh9t4VFTxZUvFNFPqfBBvppGPcg1AH7pP02EHYR7mFlL0gDtWS0QBTPaZb4r1RUiGVtqfQ82XUGFaGm8esfDIgWFLLIrhedth7jWdRXxBZaMY62XHGkiYo4DoIGXDI/Iw4W1Z2UD2pUjxy+aYYrnlIA7hWslcBC56sy3eqOXgX279nYtQCJ3VhVEYmL0bPya75QuZyPoXo485xpZDgl6U5sfPRMLyNisq0pqc6ZBA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 14 Feb 2022 13:22:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
> 
> 
> On 14.02.22 14:57, Jan Beulich wrote:
>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko 
>>>>>>>> wrote:
>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>> +
>>>>>>>>>>>>>            for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>>            {
>>>>>>>>>>>>>                const struct vpci_msix_entry *entry = 
>>>>>>>>>>>>> &msix->entries[i];
>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>
>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>
>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>
>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>
>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>
>>>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>>>> in between or *re-created*
>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned 
>>>>>>>>>>>> to
>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>> domain.
>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>
>>>>>>>>>>> read_unlock
>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>> read_lock
>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>> <continue processing>
>>>>>>>>>> Something along those lines. You likely need to continue iterate 
>>>>>>>>>> using
>>>>>>>>>> for_each_pdev.
>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>> this question before [1] and I was about to use some ID for that 
>>>>>>>>> purpose:
>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for 
>>>>>>>>> checks
>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>>>> and that the resume MSI entry is not past the current limit. Otherwise
>>>>>>>> just print a message and move on to the next device.
>>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>>
>>>>>>> How about this one:
>>>>>>>
>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, 
>>>>>>> const struct pci_dev *pdev,
>>>>>>>                                  struct rangeset *mem, uint16_t cmd)
>>>>>>>      {
>>>>>>>          struct map_data data = { .d = d, .map = true };
>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>          int rc;
>>>>>>>
>>>>>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>> +
>>>>>>>          while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) 
>>>>>>> == -ERESTART )
>>>>>>> +    {
>>>>>>> +
>>>>>>> +        /*
>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending 
>>>>>>> which
>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>> +         */
>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>              process_pending_softirqs();
>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>> +
>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or 
>>>>>>> re-created. */
>>>>>>> +        if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) 
>>>>>>> != pdev)
>>>>>>> +            if ( vpci is NOT the same )
>>>>>>> +            {
>>>>>>> +                rc = 0;
>>>>>>> +                break;
>>>>>>> +            }
>>>>>>> +    }
>>>>>>> +
>>>>>>>          rangeset_destroy(mem);
>>>>>>>          if ( !rc )
>>>>>>>              modify_decoding(pdev, cmd, false);
>>>>>>>
>>>>>>> This one also wants process_pending_softirqs to run so it *might*
>>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>> running yet, thus no vpci_process_pending is possible yet (in terms
>>>>>>> it has something to do yet). So, I think we just need:
>>>>>>>
>>>>>>>             write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>             process_pending_softirqs();
>>>>>>>             write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>
>>>>>>> and this should be enough
>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>> there's no need to check for the pdev to still exits, or whether vpci
>>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>>> explicitly note that the context of the function is special, and thus
>>>>>> there's no possibility of either the device or vpci going away.
>>>>> Does it really need write_unlock/write_lock given the context?...
>>>> I think it's bad practice to call process_pending_softirqs while
>>>> holding any locks. This is a very specific context so it's likely fine
>>>> to not drop the lock, but would still seem incorrect to me.
>>> Ok
>>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>> will never try to do a scheduling operation that would result in our
>>>> context being scheduled out.
>>>       while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == 
>>> -ERESTART )
>>>       {
>>>           /*
>>>            * FIXME: Given the context apply_map is called from (dom0 
>>> specific
>>>            * init code at system_state < SYS_STATE_active) it is not 
>>> strictly
>>>            * required that pdev->domain->vpci_rwlock is unlocked before 
>>> calling
>>>            * process_pending_softirqs as there is no contention possible 
>>> between
>>>            * this code and vpci_process_pending trying to acquire the lock 
>>> in
>>>            * read mode. But running process_pending_softirqs with any lock 
>>> held
>>>            * doesn't seem to be a good practice, so drop the lock and 
>>> re-acquire
>>>            * it right again.
>>>            */
>>>           write_unlock(&pdev->domain->vpci_rwlock);
>>>           process_pending_softirqs();
>>>           write_lock(&pdev->domain->vpci_rwlock);
>>>       }
>> I'm afraid that's misleading at best. apply_map() is merely a specific
>> example where you know the lock is going to be taken. But really any
>> softirq handler could be acquiring any lock, so requesting to process
>> softirqs cannot ever be done with any lock held.
>>
>> What you instead want to explain is why, after re-acquiring the lock,
>> no further checking is needed for potentially changed state.
> How about:
> 
> /*
>   * FIXME: Given the context apply_map is called from (dom0 specific
>   * init code at system_state < SYS_STATE_active) there is no contention
>   * possible between this code and vpci_process_pending trying to acquire
>   * the lock in read mode and destroy pdev->vpci in its error path.
>   * Neither pdev may be disposed yet, so it is not required to check if the
>   * relevant pdev still exists after re-acquiring the lock.
>   */

I'm not sure I follow the first sentence; I guess a comma or two may help,
and or using "as well as" in place of one of the two "and". I also don't
think you mean contention, but rather a race between the named entities?

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.