
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 11 Feb 2022 12:40:00 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>
  • Delivery-date: Fri, 11 Feb 2022 11:40:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Feb 11, 2022 at 07:27:39AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 10.02.22 18:16, Roger Pau Monné wrote:
> > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >>
> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> so we are sure there are no accesses to the contents of the vpci struct
> >> if not. This lock can be used (and in a few cases is used right away)
> >> so that vpci removal can be performed while holding the lock in write
> >> mode. Previously such removal could race with vpci_read for example.
> > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> > pci_remove_device, and likely when vPCI gets also used in
> > {de}assign_device I think.
> Yes, this is indeed an issue, but I was not trying to solve it in the
> context of vPCI locking yet. I think we should discuss how we approach
> pdev locking, so I can create a patch for that. That being said, I
> would prefer not to solve pdev locking in this patch yet.
> 
> ...I do understand we want to avoid that, but at the moment the only
> reliable way of making sure a pdev is alive seems to be pcidevs_lock....

I think we will need to make pcidevs_lock a rwlock and take it in read
mode for pci_get_pdev_by_domain.

We didn't have this scenario before where PCI emulation is done in the
hypervisor, and hence the locking around those data structures has not
been designed for those use-cases.
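
As a rough, untested sketch (not part of the posted patch): if pcidevs_lock
were turned into an rwlock, a lookup that only needs the pdev to stay alive
could take it in read mode. The pcidevs_rwlock name and the helper below are
made up for illustration only:

static DEFINE_RWLOCK(pcidevs_rwlock);

/* Illustrative caller: pdev stays valid for as long as the read lock is held. */
static void dump_pdev(const struct domain *d, unsigned int seg,
                      unsigned int bus, unsigned int devfn)
{
    const struct pci_dev *pdev;

    read_lock(&pcidevs_rwlock);
    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
    if ( pdev )
        printk("%pp assigned to %pd\n", &pdev->sbdf, pdev->domain);
    read_unlock(&pcidevs_rwlock);
}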

> >
> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> >> from being removed.
> >>
> >> 2. Writing the command register and ROM BAR register may trigger
> >> modify_bars to run, which in turn may access multiple pdevs while
> >> checking for the existing BAR's overlap. The overlapping check, if done
> >> under the read lock, requires vpci->lock to be acquired on both devices
> >> being compared, which may produce a deadlock. It is not possible to
> >> upgrade read lock to write lock in such a case. So, in order to prevent
> >> the deadlock, check which registers are going to be written and acquire
> >> the lock in the appropriate mode from the beginning.
> >>
> >> All other code, which doesn't lead to pdev->vpci destruction and does not
> >> access multiple pdevs at the same time, can still use a combination of the
> >> read lock and pdev->vpci->lock.
> >>
> >> 3. Optimize the detection of whether a ROM BAR write requires the write
> >> lock by caching the offset of the ROM BAR register in
> >> vpci->header.rom_reg, which depends on the header's type.
> >>
> >> 4. Reduce locked region in vpci_remove_device as it is now possible
> >> to set pdev->vpci to NULL early right after the write lock is acquired.
> >>
> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
> >> initialize many more fields of the struct vpci before assigning it to
> >> pdev->vpci.
> >>
> >> 6. vpci_{add|remove}_register are required to be called with the write lock
> >> held, but it is not feasible to add an assert there as it requires
> >> struct domain to be passed for that. So, add a comment about this
> >> requirement to these and other functions with the equivalent constraints.
> >>
> >> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
> >>
> >> 8. This is based on the discussion at [1].
> >>
> >> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/
> >>
> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >>
> >> ---
> >> This was checked on x86: with and without PVH Dom0.
> >> ---
> >>   xen/arch/x86/hvm/vmsi.c   |   2 +
> >>   xen/common/domain.c       |   3 +
> >>   xen/drivers/vpci/header.c |   8 +++
> >>   xen/drivers/vpci/msi.c    |   8 ++-
> >>   xen/drivers/vpci/msix.c   |  40 +++++++++++--
> >>   xen/drivers/vpci/vpci.c   | 114 ++++++++++++++++++++++++++++----------
> >>   xen/include/xen/sched.h   |   3 +
> >>   xen/include/xen/vpci.h    |   2 +
> >>   8 files changed, 146 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> >> index 13e2a190b439..351cb968a423 100644
> >> --- a/xen/arch/x86/hvm/vmsi.c
> >> +++ b/xen/arch/x86/hvm/vmsi.c
> >> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
> >>   {
> >>       unsigned int i;
> >>   
> >> +    ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));
> >                ^ no need for the double negation.
> Ok, will update all asserts which use !!
> >
> > Also this asserts that the lock is taken, but could be by a different
> > pCPU.  I guess it's better than nothing.
> Fair enough. Do you still want the asserts or should I remove them?

Likely fine to leave them.
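
For reference, the assert above without the double negation would simply be:

    ASSERT(rw_is_locked(&msix->pdev->domain->vpci_rwlock));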

> >
> >> +
> >>       for ( i = 0; i < msix->max_entries; i++ )
> >>       {
> >>           const struct vpci_msix_entry *entry = &msix->entries[i];
> > Since this function is now called with the per-domain rwlock read
> > locked, it's likely not appropriate to call process_pending_softirqs
> > while holding such a lock (check below).
> You are right, as it is possible that:
> 
> process_pending_softirqs -> vpci_process_pending -> read_lock
> 
> Even more, vpci_process_pending may also
> 
> read_unlock -> vpci_remove_device -> write_lock
> 
> in its error path. So, any invocation of process_pending_softirqs
> must be done without holding d->vpci_rwlock.
> 
> We also need to check that pdev->vpci was not removed
> in between or *re-created*.
> >
> > We will likely need to re-iterate over the list of pdevs assigned to
> > the domain and assert that the pdev is still assigned to the same
> > domain.
> So, do you mean a pattern like the below should be used at all
> places where we need to call process_pending_softirqs?
> 
> read_unlock
> process_pending_softirqs
> read_lock
> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
> <continue processing>

Something along those lines. You likely need to continue iterating using
for_each_pdev.
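
As an untested sketch only, reusing the names from this thread (d->vpci_rwlock
from the patch, the sbdf/vpci of the device being processed), the
drop-lock/re-validate step could be factored along these lines:

static bool relock_and_revalidate(struct domain *d, pci_sbdf_t sbdf,
                                  const struct vpci *vpci)
{
    const struct pci_dev *pdev;

    read_unlock(&d->vpci_rwlock);
    process_pending_softirqs();
    read_lock(&d->vpci_rwlock);

    /* pdev and/or its vpci may have been removed or re-created meanwhile. */
    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);

    return pdev && pdev->vpci == vpci;
}

with the for_each_pdev iteration only continuing when this returns true.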

> >> +{
> >> +    /*
> >> +     * Writing the command register and ROM BAR register may trigger
> >> +     * modify_bars to run which in turn may access multiple pdevs while
> >> +     * checking for the existing BAR's overlap. The overlapping check, if done
> >> +     * under the read lock, requires vpci->lock to be acquired on both devices
> >> +     * being compared, which may produce a deadlock. It is not possible to
> >> +     * upgrade read lock to write lock in such a case. So, in order to prevent
> >> +     * the deadlock, check which registers are going to be written and acquire
> >> +     * the lock in the appropriate mode from the beginning.
> >> +     */
> >> +    if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
> >> +        return true;
> >> +
> >> +    if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
> > No need for the comparison if rom_reg is unset. Also you can OR both
> > conditions into a single if.
> If we open-code vpci_offset_cmp with a single if, all this is going
> to be a bit clumsy:
> 
>      if ( r1_offset < r2_offset + r2_size &&
>           r2_offset < r1_offset + r1_size )
>          return 0;
> This is a single check.
> Now we need to check two registers with the code above and
> also check that pdev->vpci->header.rom_reg != 0
> 
> I think it would be more readable if we have a tiny helper function
> 
> static bool vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size,
>                             unsigned int r2_offset, unsigned int r2_size)
> {
>      /* Return false if the registers overlap. */
>      if ( r1_offset < r2_offset + r2_size &&
>           r2_offset < r1_offset + r1_size )
>          return false;
>      return true;
> }
> 
> So, then we can have something like
> 
> static bool vpci_header_write_lock(const struct pci_dev *pdev,
>                                     unsigned int start, unsigned int size)
> {
>      if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) ||
>           (pdev->vpci->header.rom_reg &&
>            !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4)) )
>          return true;
> 
>      return false;
> }

Just create an 'overlaps' static function in header.c.
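
In case it helps, an untested sketch of that suggestion (vpci_header_write_lock
being the helper proposed above, overlaps() the static function suggested here):

static bool overlaps(unsigned int r1_offset, unsigned int r1_size,
                     unsigned int r2_offset, unsigned int r2_size)
{
    /* True if the two register ranges overlap. */
    return r1_offset < r2_offset + r2_size &&
           r2_offset < r1_offset + r1_size;
}

static bool vpci_header_write_lock(const struct pci_dev *pdev,
                                   unsigned int start, unsigned int size)
{
    /* Command register and ROM BAR writes may trigger modify_bars(). */
    return overlaps(start, size, PCI_COMMAND, 2) ||
           (pdev->vpci->header.rom_reg &&
            overlaps(start, size, pdev->vpci->header.rom_reg, 4));
}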

Thanks, Roger.
> 



 

