[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Tue, 8 Feb 2022 07:35:34 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Pn9nzzHmCPMZQ330jOvkVVzAhCffJTjMHNM9mARK4fM=; b=XFyUYvZVTxzc3Q0Hog8iRYpG+QaA4p9ba3uYsPvqxNCWU1AGka9MCDCqB6ajv5m23u9BCsD/6NS4+TUHtHmGL9pgZ0vbJUX2sodW+lHqocmYV9upKxb71WvSfDu2FPpHBU4TyKeBimLdIXbDbSi2tWtDb9h5gTV17I+fSN3MMTuS5Z2RTxR6SNIqpWDYNkAZxTrULlLY4fOB96P4LzhXlVnZB0gV9WA3lqsg3tuC02T6gyHIEqagACmFwvxWWsnpoIjN2oVBIKMX5UoNlP8v7rCeEKPlc6OnT+Z58TJ6mgg84+elBCVf/DW5U3tR70o0umFFYi0fWhaYOBWLRYrBvA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Hvnnp+L4K/qFR7pncIsp8AUH9T+wNVdBAbwwdgB7KLGQN8zGvtx1yuV4JvYjYj7mhs/zaEXDxn0wzbKRR/LYHz34SWjRZ2TjuNnsKNO7WOqPmPPcL+HvSuWhbG3I//93JA+cmPEF9AiJidvUgNrKkYQK8kT9rFVlOGWA+LMy93x5KVadTMzzrPAJq8CK/7Wsiov5WNNmFM3SZkmPeDJdPx5XxsQVrB4/nm2cSEotDY0KdgYoCuNCner0d+31sbZtJQsq8mZgxMwxP9i1pyx9+2SBjWMWfy2Nw1BtpIiFkuckUicyOHXO+RKt9V0cFnOjmoXXy3yMuomJh5SWOcAJ2Q==
  • Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 08 Feb 2022 07:36:02 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYGZFc/MnzQOjwVEeBBUHLSW0md6yDBUkAgAASSACAAATYAIAAD/WAgAAKNgCAAAbfgIAABnuAgAAQvgCAAAMCAIAAAY4AgAADxICAABrnAIAABAgAgAR3CoCAABt5gIAAEpuAgAAE5ICAAASKAIAAAiiAgAAKNYCAAARNAIAAC1wAgAACRYCAAAGVgIAABJiAgAAB5wCAAPjsgA==
  • Thread-topic: [PATCH v6 03/13] vpci: move lock outside of struct vpci


On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
>
> On 07.02.22 18:37, Jan Beulich wrote:
>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 18:15, Jan Beulich wrote:
>>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>>>>> On 07.02.22 17:26, Jan Beulich wrote:
>>>>>> 1b. Make vpci_write use write lock for writes to command register and 
>>>>>> BARs
>>>>>> only; keep using the read lock for all other writes.
>>>>> I am not quite sure how to do that. Do you mean something like:
>>>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>>>                     uint32_t data)
>>>>> [snip]
>>>>>         list_for_each_entry ( r, &pdev->vpci->handlers, node )
>>>>> {
>>>>> [snip]
>>>>>         if ( r->needs_write_lock)
>>>>>             write_lock(d->vpci_lock)
>>>>>         else
>>>>>             read_lock(d->vpci_lock)
>>>>> ....
>>>>>
>>>>> And provide rw as an argument to:
>>>>>
>>>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>>>>                           vpci_write_t *write_handler, unsigned int 
>>>>> offset,
>>>>>                           unsigned int size, void *data, --->>> bool 
>>>>> write_path <<<-----)
>>>>>
>>>>> Is this what you mean?
>>>> This sounds overly complicated. You can derive locally in vpci_write(),
>>>> from just its "reg" and "size" parameters, whether the lock needs taking
>>>> in write mode.
>>> Yes, I started writing a reply with that. So, the summary (ROM
>>> position depends on header type):
>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>> {
>>>        read PCI_COMMAND and see if memory or IO decoding are enabled.
>>>        if ( enabled )
>>>            write_lock(d->vpci_lock)
>>>        else
>>>            read_lock(d->vpci_lock)
>>> }
>> Hmm, yes, you can actually get away without using "size", since both
>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>> accesses get split in vpci_ecam_write().
> But, OS may want reading a single byte of ROM BAR, so I think
> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> ranges
>> For the command register the memory- / IO-decoding-enabled check may
>> end up a little more complicated, as the value to be written also
>> matters. Maybe read the command register only for the ROM BAR write,
>> using the write lock uniformly for all command register writes?
> Sounds good for the start.
> Another concern is that if we go with a read_lock and then in the
> underlying code we disable memory decoding and try doing
> something and calling cmd_write handler for any reason then....
>
> I mean that the check in the vpci_write is somewhat we can tolerate,
> but then it is must be considered that no code in the read path
> is allowed to perform write path functions. Which brings a pretty
> valid use-case: say in read mode we detect an unrecoverable error
> and need to remove the device:
> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>
> What do we do then? It is all going to be fragile...
I have tried to summarize the options we have wrt locking
and would love to hear from @Roger and @Jan.

In every variant there is a task of dealing with the overlap
detection in modify_bars, so this is the only place as of now
which needs special treatment.

Existing limitations: there is no way to upgrade a read lock to a write
lock, so paths which may require write lock protection need to use
write lock from the very beginning. Workarounds can be applied.

1. Per-domain rw lock, aka d->vpci_lock
==============================================================
Note: with per-domain rw lock it is possible to do without introducing
per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
should be required.

This is only going to work in case if vpci_write always takes the write lock
and vpci_read takes a read lock and no path in vpci_read is allowed to
perform write path operations.
vpci_process_pending uses write lock as it have vpci_remove_device in its
error path.

Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars

Cons:
- all writes are serialized
- need to carefully select read paths, so they are guaranteed not to lead
   to lock upgrade use-cases

1.1. Semi read lock upgrade in modify bars
--------------------------------------------------------------
In this case both vpci_read and vpci_write take a read lock and when it comes
to modify_bars:

1. read_unlock(d->vpci_lock)
2. write_lock(d->vpci_lock)
3. Check that pdev->vpci is still available and is the same object:
if (pdev->vpci && (pdev->vpci == old_vpci) )
{
     /* vpci structure is valid and can be used. */
}
else
{
     /* vpci has gone, return an error. */
}

Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars
- readers and writers are NOT serialized
- NO need to carefully select read paths, so they are guaranteed not to lead
   to lock upgrade use-cases

Cons:
- ???

2. per-device lock (pdev->vpci_lock) + d->overlap_chk_lock
==============================================================
In order to solve overlap ABBA, we introduce a per-domain helper
lock to protect the overlapping code in modify_bars:

     old_vpci = pdev->vpci;
     spin_unlock(pdev->vpci_lock);
     spin_lock(pdev->domain->overlap_chk_lock);
     spin_lock(pdev->vpci_lock);
     if ( pdev->vpci && (pdev->vpci == old_vpci) )
         for_each_pdev ( pdev->domain, tmp )
         {
             if ( tmp != pdev )
             {
                 spin_lock(tmp->vpci_lock);
                 if ( tmp->vpci )
                     ...
             }
         }

Pros:
- all accesses are independent, only the same device access is serialized
- no need to care about readers and writers wrt read lock upgrade issues

Cons:
- helper spin lock

3. Move overlap detection into process pending
==============================================================
There is a Roger's patch [1] which adds a possibility for vpci_process_pending
to perform different tasks rather than just map/unmap. With this patch extended
in a way that it can hold a request queue it is possible to delay execution
of the overlap code until no pdev->vpci_lock is held, but before returning to
a guest after vpci_{read|write} or similar.

Pros:
- no need to emulate read lock upgrade
- fully parallel read/write
- queue in the vpci_process_pending will later on be used by SR-IOV,
   so this is going to help the future code
Cons:
- ???

4. Re-write overlap detection code
==============================================================
It is possible to re-write overlap detection code, so the information about the
mapped/unmapped regions is not read from vpci->header->bars[i] of each device,
but instead there is a per-domain structure which holds the regions and
implements reference counting.

Pros:
- solves ABBA

Cons:
- very complex code is expected

5. You name it
==============================================================

 From all the above I would recommend we go with option 2 which seems to 
reliably
solve ABBA and does not bring cons of the other approaches.

Thank you in advance,
Oleksandr

[1] 
https://lore.kernel.org/all/5BABA6EF02000078001EC452@xxxxxxxxxxxxxxxxxxxxxxxx/T/#m231fb0586007725bfd8538bb97ff1777a36842cf

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.