[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device

>>> On 01.06.11 at 16:12, Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
> On 06/01/11 14:56, Jan Beulich wrote:
>>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@xxxxxxxxxx>  wrote:
>>> domU: igbvf_shutdown()
>>>          igbvf_suspend()
>>>            pci_disable_device()
>>>              pcibios_disable_device()
>>>                pcibios_disable_resources()
>>>                  pci_write_config_word(
>>>                      dev, PCI_COMMAND,
>>>                      orig&  ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
>>>                  )
>>>                    ...
>>>                      pcifront_bus_write()
>>>                        do_pci_op(XEN_PCI_OP_conf_write)
>>> dom0:                   pciback_do_op()
>>>                            pciback_config_write()
>>>                              conf_space_write()
>>>                                command_write() [via PCI_COMMAND funcptr]
>>>                                  pci_disable_device()
>>>                                    disable_msi_mode()
>>>                                      dev->msix_enabled = 0;
>>> The final assignment above precludes c/s 1070 from doing the job.
>> That seems to be a problem in the PCI subsystem, in that
>> disable_msi_mode() is being used from pci_disable_device() (instead
>> of pci_disable_msi{,x}()), and does not seem to be done in a similarly
>> bad way in newer kernels.
> But does linux-2.6.18-xen avoid the problem? I think its 
> pci_disable_device() still calls disable_msi_mode(), and the latter also 
> only read/writes config words.

Yes, and that's certainly broken (in the native base version, not
particularly in the Xen pieces).

>> So I wonder whether you shouldn't fix
>> pci_disable_device() instead, or alternatively move the vector
>> de-allocation (and then for MSI and MSI-X) into disable_msi_mode().
> Yes, I did think of that, however IMO that would introduce exactly the 
> problem that you describe below. If either pci_disable_device() or 
> disable_msi_mode() frees the vector, then re-enabling the device can 
> face yet another stumbling block.

Why - pci_enable_msi{,x}() call msi{,x}_capability_init(), which
obtains the vectors.

The problem is that calling disable_msi_mode() alone is always a
mistake, this should have been a static function just like its
counterpart, enable_msi_mode().

>> While the approach you take covers the guest shutdown/restart
>> case, what if the guest called its pci_disable_device() at runtime,
>> expecting to be able to call pci_enable_{device,msi,msix}() on it
>> again later on? Your newly added function would also be called here,
> May I ask how? The function is only called from pciback_reset_device(), 

I was just looking at the call tree you provided (and which is still
quoted at the top). Oh - you're referring to Dom0 calling the
function, whereas I wrote about what happens when guest calls
its version of it.

>> but in this situation you would have to call into the hypervisor (as
>> the domain is still alive), i.e. you could as well call
>> msi_remove_pci_irq_vectors().
> I considered taking c/s 1070 and simply removing the ifs around the 
> pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by 
> msi_unmap_pirq(), could find no owner domain for the device and return 
> DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then 
> msi_unmap_pirq() would try to unmap a PIRQ for dom0.

Yeah, when the domain is gone you have this problem. But if you
properly removed the vectors while the domain is still there, you

>> Additionally, while covering the MSI-X case, I don't see how the
>> similar already-mapped case would be cleanly handled for MSI.
> The target is "cleanup after shutdown in such a way that it doesn't hurt 
> in other cases either", so:
> - it is assumed the hypervisor takes care of the mappings when the 
> domain goes away,
> - dom0 has no filtering list for MSI. msi_capability_init() "simply" 
> asks the hypervisor for a vector, while msix_capability_init() "gets in 
> the way".

That's all true, but by considering only your one case you may end
up making already crappy code worse.

> I'm looking for the best (... least wrong) location to place the cleanup 
> at; c/s 1070 suggested pciback_reset_device().

For that situation this was the natural place, but you're dealing
with an alive domain not being cleaned up after properly during
certain operations, and deferring the cleanup until the domain is
gone is only going to call for further problems (as outlined earlier,
and who knows what other cases we didn't consider).

Hence while I realize that from a pragmatic point of view your
approach may seem acceptable, it only fixing a particular aspect
of a more general problem is not really something I like.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.