[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev
Hi Jan, Jan Beulich <jbeulich@xxxxxxxx> writes: > On 21.04.2023 16:13, Volodymyr Babchuk wrote: >> >> Hi Roger, >> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: >> >>> On Fri, Apr 21, 2023 at 11:00:23AM +0000, Volodymyr Babchuk wrote: >>>> >>>> Hello Roger, >>>> >>>> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: >>>> >>>>> On Mon, Apr 17, 2023 at 12:34:31PM +0200, Jan Beulich wrote: >>>>>> On 17.04.2023 12:17, Roger Pau Monné wrote: >>>>>>> On Fri, Apr 14, 2023 at 01:30:39AM +0000, Volodymyr Babchuk wrote: >>>>>>>> Above I have proposed another view on this. I hope, it will work for >>>>>>>> you. Just to reiterate, idea is to allow "harmless" refcounts to be >>>>>>>> left >>>>>>>> after returning from pci_remove_device(). By "harmless" I mean that >>>>>>>> owners of those refcounts will not try to access the physical PCI >>>>>>>> device if pci_remove_device() is already finished. >>>>>>> >>>>>>> I'm not strictly a maintainer of this piece code, albeit I have an >>>>>>> opinion. I will like to also hear Jans opinion, since he is the >>>>>>> maintainer. >>>>>> >>>>>> I'm afraid I can't really appreciate the term "harmless refcounts". >>>>>> Whoever >>>>>> holds a ref is entitled to access the device. As stated before, I see >>>>>> only >>>>>> two ways of getting things consistent: Either pci_remove_device() is >>>>>> invoked upon dropping of the last ref, >>>>> >>>>> With this approach, what would be the implementation of >>>>> PHYSDEVOP_manage_pci_remove? Would it just check whether the pdev >>>>> exist and either return 0 or -EBUSY? >>>>> >>>> >>>> Okay, I am preparing patches with the behavior you proposed. To test it, >>>> I artificially set refcount to 2 and indeed PHYSDEVOP_manage_pci_remove >>>> returned -EBUSY, which propagated to the linux driver. Problem is that >>>> Linux driver can't do anything with this. It just displayed an error >>>> message and removed device anyways. This is because Linux sends >>>> PHYSDEVOP_manage_pci_remove in device_remove() call path and there is no >>>> way to prevent the device removal. So, admin is not capable to try this >>>> again. >>> >>> Ideally Linux won't remove the device, and then the admin would get to >>> retry. Maybe the way the Linux hook is placed is not the best one? >>> The hypervisor should be authoritative on whether a device can be >>> removed or not, and hence PHYSDEVOP_manage_pci_remove returning an >>> error (EBUSY or otherwise) shouldn't allow the device unplug in Linux >>> to continue. >> >> Yes, it would be ideally, but Linux driver/device model is written in a >> such way, that PCI subsystem tracks all the PCI device usage, so it can >> be certain that it can remove the device. Thus, functions in the device >> removal path either return void or 0. Of course, kernel does not know that >> hypervisor has additional uses for the device, so there is no mechanisms >> to prevent removal. > > Could pciback obtain a reference on behalf of the hypervisor, dropping it > when device removal is requested (i.e. much closer to the start of that > operation), and only if it finds that no guests use the device anymore? Yes, it can, it this indeed will hold a reference to a pci device for a time, but there are some consideration that made this approach not feasible. Basically, when an user writes to /sys/bus/pci/SBDF/remove, the following happens: 1. /sys/bus/pci/SBFD/remove entry is removed - we can't retry the operation anymore [unimportant things] N. pci_stop_dev() function is called. This function unloads a device driver. Any good behaving driver should drop all additional references to a device at this point. [more unimportant things] M. PCI subsystem drops own reference to a generic device object So, as you can see, admin can't restart a "failed" attempt to remove a PCI device. On other hand, remove() function can sleep. This allows us to pause removal process a bit and check if hypervisor had finished removing a PCI device on its side. But, as you pointed out, this can take weeks... -- WBR, Volodymyr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |