[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 13:57:13 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5yPHLHgmeOQcF1O/ZYWrADpmncFcloF/qTgxcvIwOCM=; b=MXjCfKYTNWyxXqB2jtOtpEZ2VEZ9HCiVYdWb76sJzT50s9hQsqV4SgCqoR2xGmLxN+Kne+sj3yVw+hOLbbDPR2tkSUkM/FrE9ZS+tzBI2jjXXedeDDwn30wwfZ2SUzOfh9WDjkPPjoq/5zcLfKJ0wchkM5btXTSKVq3MLUUxeoapfyqO98QUPXYl/UvPXaZ//C20eyvRvQYte+Dfwt5W6wQh4F3gisAaJphVxd3lzX7YRT9piqqphSlGKTSTOML4OV3OdS3WPra07+7EtfrzTr6bClN50/PP5tT09XM0N3VAAUaufaKrm3ia7dW0zQpfB++JfzQX15x79z6wBoFtbg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TCzn+Mvln+ELbUwR400W/fGkwfyJjziEWdTikok9J6SBH+ctjJLurXhnDFxLrAv6ZWAgaUY5wJRWkYRU3+x+cKgpxATrP26hpskluWozaIL87k9mzaG710fgWAr5FIy+f2aLMYzmiBNtdCISvuQCxV2LjZ+HQtwiFifWiOdXYG6mjGsi3JOGJvg/0rFq3QC9rtJ+Kg52Hrdjrrm32d/f49HJVzXAduUDLnoWJYOBMgAwghZLSnGA9EzhWRqGWYHoDTcBTJXp6U2bTeE++sapUm1C11mka8RRaIxSE8YGJ8t5yp6+dVLJJ058ch2bZal+AiIJMZCP+6uWF+PEvzEH/A==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 14 Feb 2022 12:57:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
> 
> 
> On 14.02.22 13:25, Roger Pau Monné wrote:
>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>> +
>>>>>>>>>>>           for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>           {
>>>>>>>>>>>               const struct vpci_msix_entry *entry = 
>>>>>>>>>>> &msix->entries[i];
>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>> while holding such lock (check below).
>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>
>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>
>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>
>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>
>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>
>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>> in between or *re-created*
>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned to
>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>> domain.
>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>
>>>>>>>>> read_unlock
>>>>>>>>> process_pending_softirqs
>>>>>>>>> read_lock
>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>> <continue processing>
>>>>>>>> Something along those lines. You likely need to continue iterate using
>>>>>>>> for_each_pdev.
>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>> this question before [1] and I was about to use some ID for that 
>>>>>>> purpose:
>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for checks
>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>> and that the resume MSI entry is not past the current limit. Otherwise
>>>>>> just print a message and move on to the next device.
>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>
>>>>> How about this one:
>>>>>
>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>> --- a/xen/drivers/vpci/header.c
>>>>> +++ b/xen/drivers/vpci/header.c
>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const 
>>>>> struct pci_dev *pdev,
>>>>>                                 struct rangeset *mem, uint16_t cmd)
>>>>>     {
>>>>>         struct map_data data = { .d = d, .map = true };
>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>         int rc;
>>>>>
>>>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>> +
>>>>>         while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == 
>>>>> -ERESTART )
>>>>> +    {
>>>>> +
>>>>> +        /*
>>>>> +         * process_pending_softirqs may trigger vpci_process_pending 
>>>>> which
>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>> +         */
>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>             process_pending_softirqs();
>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>> +
>>>>> +        /* Check if pdev still exists and vPCI was not removed or 
>>>>> re-created. */
>>>>> +        if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != 
>>>>> pdev)
>>>>> +            if ( vpci is NOT the same )
>>>>> +            {
>>>>> +                rc = 0;
>>>>> +                break;
>>>>> +            }
>>>>> +    }
>>>>> +
>>>>>         rangeset_destroy(mem);
>>>>>         if ( !rc )
>>>>>             modify_decoding(pdev, cmd, false);
>>>>>
>>>>> This one also wants process_pending_softirqs to run so it *might*
>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>> running yet, thus no vpci_process_pending is possible yet (in terms
>>>>> it has something to do yet). So, I think we just need:
>>>>>
>>>>>            write_unlock(&pdev->domain->vpci_rwlock);
>>>>>            process_pending_softirqs();
>>>>>            write_lock(&pdev->domain->vpci_rwlock);
>>>>>
>>>>> and this should be enough
>>>> Given the context apply_map is called from (dom0 specific init code),
>>>> there's no need to check for the pdev to still exits, or whether vpci
>>>> has been recreated, as it's not possible. Just add a comment to
>>>> explicitly note that the context of the function is special, and thus
>>>> there's no possibility of either the device or vpci going away.
>>> Does it really need write_unlock/write_lock given the context?...
>> I think it's bad practice to call process_pending_softirqs while
>> holding any locks. This is a very specific context so it's likely fine
>> to not drop the lock, but would still seem incorrect to me.
> Ok
>>
>>> I think it doesn't as there is no chance defer_map is called, thus
>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>> Indeed, there's no chance of that because process_pending_softirqs
>> will never try to do a scheduling operation that would result in our
>> context being scheduled out.
> 
>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == 
> -ERESTART )
>      {
>          /*
>           * FIXME: Given the context apply_map is called from (dom0 specific
>           * init code at system_state < SYS_STATE_active) it is not strictly
>           * required that pdev->domain->vpci_rwlock is unlocked before calling
>           * process_pending_softirqs as there is no contention possible 
> between
>           * this code and vpci_process_pending trying to acquire the lock in
>           * read mode. But running process_pending_softirqs with any lock held
>           * doesn't seem to be a good practice, so drop the lock and 
> re-acquire
>           * it right again.
>           */
>          write_unlock(&pdev->domain->vpci_rwlock);
>          process_pending_softirqs();
>          write_lock(&pdev->domain->vpci_rwlock);
>      }

I'm afraid that's misleading at best. apply_map() is merely a specific
example where you know the lock is going to be taken. But really any
softirq handler could be acquiring any lock, so requesting to process
softirqs cannot ever be done with any lock held.

What you instead want to explain is why, after re-acquiring the lock,
no further checking is needed for potentially changed state.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.