
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

On 04.02.22 15:06, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 12:53:20PM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 14:47, Jan Beulich wrote:
>>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>>>> On 04.02.22 13:37, Jan Beulich wrote:
>>>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>>>>>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>>>>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>>>>>>> On 04.02.22 11:15, Jan Beulich wrote:
>>>>>>>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>>>>>>>>> On 04.02.22 09:52, Jan Beulich wrote:
>>>>>>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>>>>>>>>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>>>>>>>>                       continue;
>>>>>>>>>>>>               }
>>>>>>>>>>>>       
>>>>>>>>>>>> +        spin_lock(&tmp->vpci_lock);
>>>>>>>>>>>> +        if ( !tmp->vpci )
>>>>>>>>>>>> +        {
>>>>>>>>>>>> +            spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>>> +            continue;
>>>>>>>>>>>> +        }
>>>>>>>>>>>>               for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>>>>>>>>>>               {
>>>>>>>>>>>>                   const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>>>>>>>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>>>>>>>>                   rc = rangeset_remove_range(mem, start, end);
>>>>>>>>>>>>                   if ( rc )
>>>>>>>>>>>>                   {
>>>>>>>>>>>> +                spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>>>                       printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>>>>>>>>>>                              start, end, rc);
>>>>>>>>>>>>                       rangeset_destroy(mem);
>>>>>>>>>>>>                       return rc;
>>>>>>>>>>>>                   }
>>>>>>>>>>>>               }
>>>>>>>>>>>> +        spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>>>           }
>>>>>>>>>>> At the first glance this simply looks like another unjustified (in the
>>>>>>>>>>> description) change, as you're not converting anything here but you
>>>>>>>>>>> actually add locking (and I realize this was there before, so I'm sorry
>>>>>>>>>>> for not pointing this out earlier).
>>>>>>>>>> Well, I thought that the description already has "...the lock can be
>>>>>>>>>> used (and in a few cases is used right away) to check whether vpci
>>>>>>>>>> is present" and this is enough for such uses as here.
>>>>>>>>>>>       But then I wonder whether you
>>>>>>>>>>> actually tested this, since I can't help getting the impression that
>>>>>>>>>>> you're introducing a live-lock: The function is called from cmd_write()
>>>>>>>>>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>>>>>>>>>> function already holds the lock, and the lock is not (currently)
>>>>>>>>>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>>>>>>>>>> the locking looks to be entirely unnecessary.)
>>>>>>>>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>>>>>>>>> the lock. But if tmp == pdev and rom_only == true
>>>>>>>>>> then we'll deadlock.
>>>>>>>>>>
>>>>>>>>>> It seems we need to have the locking conditional, e.g. only lock
>>>>>>>>>> if tmp != pdev
>>>>>>>>> Which will address the live-lock, but introduce ABBA deadlock potential
>>>>>>>>> between the two locks.
>>>>>>>> I am not sure I can suggest a better solution here
>>>>>>>> @Roger, @Jan, could you please help here?
>>>>>>> Well, first of all I'd like to mention that while it may have been okay to
>>>>>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>>>>>>> with DomU-s' lists of PCI devices. The requirement really applies to the
>>>>>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>>>>>>> there it probably wants to be a try-lock.
>>>>>>>
>>>>>>> Next I'd like to point out that here we have the still pending issue of
>>>>>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>>>>>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>>>>>>> here, I think it wants to at least account for the extra need there.
>>>>>> Yes, sorry, I should take care of that.
>>>>>>
>>>>>>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>>>>>>> the deadlock, as it's imo not an option at all to acquire that lock
>>>>>>> everywhere else you access ->vpci (or else the vpci lock itself would be
>>>>>>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>>>>>>> would acquire it in read mode, and here you'd acquire it in write mode (in
>>>>>>> the former case around the vpci lock, while in the latter case there may
>>>>>>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>>>>>>> I haven't fully thought through all implications (and hence whether this is
>>>>>>> viable in the first place); I expect you will, documenting what you've
>>>>>>> found in the resulting patch description. Of course the double lock
>>>>>>> acquire/release would then likely want hiding in helper functions.
>>>>>> I've been also thinking about this, and whether it's really worth to
>>>>>> have a per-device lock rather than a per-domain one that protects all
>>>>>> vpci regions of the devices assigned to the domain.
>>>>>>
>>>>>> The OS is likely to serialize accesses to the PCI config space anyway,
>>>>>> and the only place I could see a benefit of having per-device locks is
>>>>>> in the handling of MSI-X tables, as the handling of the mask bit is
>>>>>> likely very performance sensitive, so adding a per-domain lock there
>>>>>> could be a bottleneck.
>>>>> Hmm, with method 1 accesses serializing globally is basically
>>>>> unavoidable, but with MMCFG I see no reason why OSes may not (move
>>>>> to) permit(ting) parallel accesses, with serialization perhaps done
>>>>> only at device level. See our own pci_config_lock, which applies to
>>>>> only method 1 accesses; we don't look to be serializing MMCFG
>>>>> accesses at all.
>>>>>
>>>>>> We could alternatively do a per-domain rwlock for vpci and special case
>>>>>> the MSI-X area to also have a per-device specific lock. At which point
>>>>>> it becomes fairly similar to what you propose.
>>>> @Jan, @Roger
>>>>
>>>> 1. d->vpci_lock - rwlock <- this protects vpci
>>>> 2. pdev->vpci->msix_tbl_lock - rwlock <- this protects MSI-X tables
>>>> or should it better be pdev->msix_tbl_lock as MSI-X tables don't
>>>> really depend on vPCI?
>>> If so, perhaps indeed better the latter. But as said in reply to Roger,
>>> I'm not convinced (yet) that doing away with the per-device lock is a
>>> good move. As said there - we're ourselves doing fully parallel MMCFG
>>> accesses, so OSes ought to be fine to do so, too.
>> But with pdev->vpci_lock we face ABBA...
> I think it would be easier to start with a per-domain rwlock that
> guarantees pdev->vpci cannot be removed under our feet. This would be
> taken in read mode in vpci_{read,write} and in write mode when
> removing a device from a domain.
>
> Then there are also other issues regarding vPCI locking that need to
> be fixed, but that lock would likely be a start.
Or let's look at the problem from a different angle: this is the only
place that breaks the use of pdev->vpci_lock, because no other code
path tries to acquire the locks of two devices at a time.
So, what if we re-work the offending piece of code instead?
That way we keep parallel access intact and retain the per-device
lock, which might also be a plus.

By re-work I mean that, instead of reading the already mapped regions
from tmp, we could employ a d->pci_mapped_regions range set which
holds all the already mapped ranges. Whenever that range set needs to
be accessed we take pcidevs_lock, which should be rare.
modify_bars() would then rely on pdev->vpci_lock + pcidevs_lock, and
ABBA deadlock won't be possible at all.

>
> Thanks, Roger.
