[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device



On 06/01/11 14:56, Jan Beulich wrote:
On 01.06.11 at 12:05, Laszlo Ersek<lersek@xxxxxxxxxx>  wrote:

domU: igbvf_shutdown()
         igbvf_suspend()
           pci_disable_device()
             pcibios_disable_device()
               pcibios_disable_resources()
                 pci_write_config_word(
                     dev, PCI_COMMAND,
                     orig&  ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
                 )
                   ...
                     pcifront_bus_write()
                       do_pci_op(XEN_PCI_OP_conf_write)
dom0:                   pciback_do_op()
                           pciback_config_write()
                             conf_space_write()
                               command_write() [via PCI_COMMAND funcptr]
                                 pci_disable_device()
                                   disable_msi_mode()
                                     dev->msix_enabled = 0;

The final assignment above precludes c/s 1070 from doing the job.

That seems to be a problem in the PCI subsystem, in that
disable_msi_mode() is being used from pci_disable_device() (instead
of pci_disable_msi{,x}()), and does not seem to be done in a similarly
bad way in newer kernels.

But does linux-2.6.18-xen avoid the problem? I think its pci_disable_device() still calls disable_msi_mode(), and the latter also only read/writes config words.

So I wonder whether you shouldn't fix
pci_disable_device() instead, or alternatively move the vector
de-allocation (and then for MSI and MSI-X) into disable_msi_mode().

Yes, I did think of that, however IMO that would introduce exactly the problem that you describe below. If either pci_disable_device() or disable_msi_mode() frees the vector, then re-enabling the device can face yet another stumbling block.

I liken this a bit to the UNIX(R) signals -- the allocation/mapping of the MSI-X vectors is the "signal disposition" (= signal action, signal delivery), while the PCI dev's configuration -- whether it's allowed to generate such interrupts -- is almost like the "signal mask". You can have the vectors allocated / mapped and the device can still be told not to generate those interrupts.

In that sense dev->msix_enabled has a split personality (mixed responsibilities).


While the approach you take covers the guest shutdown/restart
case, what if the guest called its pci_disable_device() at runtime,
expecting to be able to call pci_enable_{device,msi,msix}() on it
again later on? Your newly added function would also be called here,

May I ask how? The function is only called from pciback_reset_device(), which in turn is called from drivers/xen/pciback/pci_stub.c,

  pcistub_device_release
  pcistub_put_pci_dev         EXTERN
  pcistub_init_device
  pcistub_device_put
  pcistub_init_devices_late
  pcistub_seize
  pcistub_device_get_pci_dev  EXTERN
  pcistub_remove              DRIVEROP
  permissive_add              EXTERN DRIVER_ATTR(permissive)
  pciback_init                MODULE_INIT
  pcistub_probe               DRIVEROP
  pcistub_get_pci_dev_by_slot EXTERN
  pcistub_get_pci_dev         EXTERN

I did a cursory search for '\<pcistub_', and it seems to me all these functions are only called from functions like

  pciback_release_pci_dev (passthrough.c, slot.c and vpci.c)
  pciback_release_devices (passthrough.c, slot.c and vpci.c)

In my understanding we have no problems reported about the insertion/removal of the pciback module, or device (de)assignment to / from it.

but in this situation you would have to call into the hypervisor (as
the domain is still alive), i.e. you could as well call
msi_remove_pci_irq_vectors().

I considered taking c/s 1070 and simply removing the ifs around the pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by msi_unmap_pirq(), could find no owner domain for the device and return DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then msi_unmap_pirq() would try to unmap a PIRQ for dom0.


Additionally, while covering the MSI-X case, I don't see how the
similar already-mapped case would be cleanly handled for MSI.

The target is "cleanup after shutdown in such a way that it doesn't hurt in other cases either", so:

- it is assumed the hypervisor takes care of the mappings when the domain goes away,

- dom0 has no filtering list for MSI. msi_capability_init() "simply" asks the hypervisor for a vector, while msix_capability_init() "gets in the way".

The sole purpose of the patch is to trim the MSI-X "filtering" list (msi_dev_head) after the domain is gone. I'm not bold enough to yank out the filtering altogether, even though I think it only does damage wrt. reboots -- it must have been originally meant to catch the case when the *same* guest tries to ask for a set of MSI-X entries for a device, in such a way that the requested set is not distinct from entries requested previously for the same device.

The fact that the domid of the device-owning domain is not saved anywhere in the data structure (... at least in RHEL-5) seems to confirm this -- the current implementation has no way to detect that the owning domain has changed; I think it's even oblivious to this possiblity.

I'm looking for the best (... least wrong) location to place the cleanup at; c/s 1070 suggested pciback_reset_device().

Thanks!
lacos

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.