[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal




On 18.11.21 15:25, Jan Beulich wrote:
> On 18.11.2021 10:32, Oleksandr Andrushchenko wrote:
>>
>> On 18.11.21 11:15, Jan Beulich wrote:
>>> On 18.11.2021 09:54, Oleksandr Andrushchenko wrote:
>>>> On 18.11.21 10:36, Jan Beulich wrote:
>>>>> On 18.11.2021 08:49, Oleksandr Andrushchenko wrote:
>>>>>> On 17.11.21 10:28, Jan Beulich wrote:
>>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>
>>>>>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>>>>>> scheduled a delayed work for map/unmap operations for that device.
>>>>>>>> For example, the following scenario can illustrate the problem:
>>>>>>>>
>>>>>>>> pci_physdev_op
>>>>>>>>        pci_add_device
>>>>>>>>            init_bars -> modify_bars -> defer_map -> 
>>>>>>>> raise_softirq(SCHEDULE_SOFTIRQ)
>>>>>>>>        iommu_add_device <- FAILS
>>>>>>>>        vpci_remove_device -> xfree(pdev->vpci)
>>>>>>>>
>>>>>>>> leave_hypervisor_to_guest
>>>>>>>>        vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci 
>>>>>>>> == NULL
>>>>>>>>
>>>>>>>> For the hardware domain we continue execution as the worse that
>>>>>>>> could happen is that MMIO mappings are left in place when the
>>>>>>>> device has been deassigned
>>>>>>>>
>>>>>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>>>>>> {un}map operation we need to destroy them, as we don't know in which
>>>>>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>>>>>> DomUs as they won't be allowed to call pci_add_device.
>>>>>>>>
>>>>>>>> Signed-off-by: Oleksandr Andrushchenko 
>>>>>>>> <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>> Thinking about it some more, I'm not convinced any of this is really
>>>>>>> needed in the presented form.
>>>>>> The intention of this patch was to handle error conditions which are
>>>>>> abnormal, e.g. when iommu_add_device fails and we are in the middle
>>>>>> of initialization. So, I am trying to cancel all pending work which might
>>>>>> already be there and not to crash.
>>>>> Only Dom0 may be able to prematurely access the device during "add".
>>>>> Yet unlike for DomU-s we generally expect Dom0 to be well-behaved.
>>>>> Hence I'm not sure I see the need for dealing with these.
>>>> Probably I don't follow you here. The issue I am facing is Dom0
>>>> related, e.g. Xen was not able to initialize during "add" and thus
>>>> wanted to clean up the leftovers. As the result the already
>>>> scheduled work crashes as it was not neither canceled nor interrupted
>>>> in some safe manner. So, this sounds like something we need to take
>>>> care of, thus this patch.
>>> But my point was the question of why there would be any pending work
>>> in the first place in this case, when we expect Dom0 to be well-behaved.
>> I am not saying Dom0 misbehaves here. This is my real use-case
>> (as in the commit message):
>>
>> pci_physdev_op
>>        pci_add_device
>>            init_bars -> modify_bars -> defer_map -> 
>> raise_softirq(SCHEDULE_SOFTIRQ)
>>        iommu_add_device <- FAILS
>>        vpci_remove_device -> xfree(pdev->vpci)
>>
>> leave_hypervisor_to_guest
>>        vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
> First of all I'm sorry for having lost track of that particular case in
> the course of the discussion.
No problem, I see on the ML how much you review every day...
>
> I wonder though whether that's something we really need to take care of.
> At boot (on x86) modify_bars() wouldn't call defer_map() anyway, but
> use apply_map() instead. I wonder whether this wouldn't be appropriate
> generally in the context of init_bars() when used for Dom0 (not sure
> whether init_bars() would find some form of use for DomU-s as well).
> This is even more so as it would better be the exception that devices
> discovered post-boot start out with memory decoding enabled (which is a
> prereq for modify_bars() to be called in the first place).
Well, first of all init_bars is going to be used for DomUs as well:
that was discussed previously and is reflected in this series.

But the real use-case for the deferred mapping would be the one
from PCI_COMMAND register write emulation:

void vpci_cmd_write(const struct pci_dev *pdev, unsigned int reg,
                     uint32_t cmd, void *data)
{
[snip]
         modify_bars(pdev, cmd, false);

which in turn calls defer_map.

I believe Roger did that for a good reason not to stall Xen while doing
map/unmap (which may take quite some time) and moved that to
vpci_process_pending and SOFTIRQ context.
>
> Jan
>
Thank you,
Oleksandr

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.