[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Date: Mon, 24 Apr 2023 14:15:27 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Bopo97wd/BIYYnVeTcenWCOubx9SKDmqoQghR5ldbdk=; b=LYNOarAtzLjMX8mHjKz/K4sGn5Aa17x3rQzGKuzRi98rzspMhc6R3fsmUzXU3KRxovQaJs2djIyvxcRHCJfbR+sHIA6GhFdR94BkYtid2PreARlUg4QaVTA/fFVOJJR7SWoctX88XONdDjaEQxaxnrnPNEukqEnjt0Vw/0q8i2CkYyMc0h15kNCnbRmqezVtQl52M64bpA6Kk29JHzodWH4NJzKJu+VSy4Zty6lmFy5HTHLh1Mpd9Lp1l20/gD5q7jLRLeAI5+XHXxHRr68JazT3ZeKWnJKhAjdjW2GoOwakr8cS6kXAkw7FlnEWi8ffo1IyK2sHen6VcOVfkJQuqQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Kya0Lf8UgkTl7eggJ80mcaQwnFyTlpqFk9HK9cigxJGuf2FPII6ZDEkYAqQRJ12LWYkBHTehE3qSMnBEQ0nMkDSU8qBHcHQFIAlBDAvDSgFJ7GkEt9jbVSp1ucvJVwpRImaa9G1gtiCayS3RrbKY9vC5sMiPDwe3f3N054N6J5ZHWLqQajTNHSDZzJ1iFhIEfDq8wsMRUbqMGLPlHKus2+KCIEai/gUW/5sltpZPuYZObKvzyLrWgcm7pZ7Dx1XYn+1S/lOvlfW5UWO5I0aEa9IXsTc0qg2OszIC0hZraBs9IRcHzRuY1qftsa0KbdrafuZcVO/jEqBH0G0q7et2XA==
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 24 Apr 2023 14:15:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHZVrdzsdckomMx4kauxHkZQ597Iq79mAMAgClI2ACAAK/igIAAVGGAgAGe14CAAJvOAIAFXoKAgAAEr4CAAAS6AIAGQ32AgAAsn4CAAAL9gIAEWYgAgABm+AA=
  • Thread-topic: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev

Hi Jan,

Jan Beulich <jbeulich@xxxxxxxx> writes:

> On 21.04.2023 16:13, Volodymyr Babchuk wrote:
>> 
>> Hi Roger,
>> 
>> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
>> 
>>> On Fri, Apr 21, 2023 at 11:00:23AM +0000, Volodymyr Babchuk wrote:
>>>>
>>>> Hello Roger,
>>>>
>>>> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
>>>>
>>>>> On Mon, Apr 17, 2023 at 12:34:31PM +0200, Jan Beulich wrote:
>>>>>> On 17.04.2023 12:17, Roger Pau Monné wrote:
>>>>>>> On Fri, Apr 14, 2023 at 01:30:39AM +0000, Volodymyr Babchuk wrote:
>>>>>>>> Above I have proposed another view on this. I hope, it will work for
>>>>>>>> you. Just to reiterate, idea is to allow "harmless" refcounts to be 
>>>>>>>> left
>>>>>>>> after returning from pci_remove_device(). By "harmless" I mean that
>>>>>>>> owners of those refcounts will not try to access the physical PCI
>>>>>>>> device if pci_remove_device() is already finished.
>>>>>>>
>>>>>>> I'm not strictly a maintainer of this piece code, albeit I have an
>>>>>>> opinion.  I will like to also hear Jans opinion, since he is the
>>>>>>> maintainer.
>>>>>>
>>>>>> I'm afraid I can't really appreciate the term "harmless refcounts". 
>>>>>> Whoever
>>>>>> holds a ref is entitled to access the device. As stated before, I see 
>>>>>> only
>>>>>> two ways of getting things consistent: Either pci_remove_device() is
>>>>>> invoked upon dropping of the last ref,
>>>>>
>>>>> With this approach, what would be the implementation of
>>>>> PHYSDEVOP_manage_pci_remove?  Would it just check whether the pdev
>>>>> exist and either return 0 or -EBUSY?
>>>>>
>>>>
>>>> Okay, I am preparing patches with the behavior you proposed. To test it,
>>>> I artificially set refcount to 2 and indeed PHYSDEVOP_manage_pci_remove
>>>> returned -EBUSY, which propagated to the linux driver. Problem is that
>>>> Linux driver can't do anything with this. It just displayed an error
>>>> message and removed device anyways. This is because Linux sends
>>>> PHYSDEVOP_manage_pci_remove in device_remove() call path and there is no
>>>> way to prevent the device removal. So, admin is not capable to try this
>>>> again.
>>>
>>> Ideally Linux won't remove the device, and then the admin would get to
>>> retry.  Maybe the way the Linux hook is placed is not the best one?
>>> The hypervisor should be authoritative on whether a device can be
>>> removed or not, and hence PHYSDEVOP_manage_pci_remove returning an
>>> error (EBUSY or otherwise) shouldn't allow the device unplug in Linux
>>> to continue.
>> 
>> Yes, it would be ideally, but Linux driver/device model is written in a
>> such way, that PCI subsystem tracks all the PCI device usage, so it can
>> be certain that it can remove the device. Thus, functions in the device
>> removal path either return void or 0. Of course, kernel does not know that
>> hypervisor has additional uses for the device, so there is no mechanisms
>> to prevent removal.
>
> Could pciback obtain a reference on behalf of the hypervisor, dropping it
> when device removal is requested (i.e. much closer to the start of that
> operation), and only if it finds that no guests use the device anymore?

Yes, it can, it this indeed will hold a reference to a pci device for a
time, but there are some consideration that made this approach not
feasible.

Basically, when an user writes to /sys/bus/pci/SBDF/remove, the
following happens:

1. /sys/bus/pci/SBFD/remove entry is removed - we can't retry the
operation anymore

[unimportant things]

N. pci_stop_dev() function is called. This function unloads a device
driver. Any good behaving driver should drop all additional references
to a device at this point.

[more unimportant things]

M. PCI subsystem drops own reference to a generic device object

So, as you can see, admin can't restart a "failed" attempt to remove a
PCI device.

On other hand, remove() function can sleep. This allows us to pause
removal process a bit and check if hypervisor had finished removing a
PCI device on its side. But, as you pointed out, this can take weeks...

-- 
WBR, Volodymyr

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.