[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign



On 10.02.2022 09:21, Oleksandr Andrushchenko wrote:
> 
> 
> On 08.02.22 13:25, Oleksandr Andrushchenko wrote:
>>
>> On 08.02.22 13:00, Jan Beulich wrote:
>>> On 08.02.2022 11:52, Oleksandr Andrushchenko wrote:
>>>> This smells like we first need to fix the existing code, so
>>>> pdev->domain is not assigned by specific IOMMU implementations,
>>>> but instead controlled by the code which relies on that, assign_device.
>>> Feel free to come up with proposals how to cleanly do so. Moving the
>>> assignment to pdev->domain may even be possible now, but if you go
>>> back you may find that the code was quite different earlier on.
>> I do understand that as the code evolves new use cases bring
>> new issues.
>>>> I can have something like:
>>>>
>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>>> index 88836aab6baf..cc7790709a50 100644
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
>>>>     static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, 
>>>> u32 flag)
>>>>     {
>>>>         const struct domain_iommu *hd = dom_iommu(d);
>>>> +    struct domain *old_owner;
>>>>         struct pci_dev *pdev;
>>>>         int rc = 0;
>>>>
>>>> @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, 
>>>> u8 bus, u8 devfn, u32 flag)
>>>>         ASSERT(pdev && (pdev->domain == hardware_domain ||
>>>>                         pdev->domain == dom_io));
>>>>
>>>> +    /* We need to restore the old owner in case of an error. */
>>>> +    old_owner = pdev->domain;
>>>> +
>>>>         vpci_deassign_device(pdev->domain, pdev);
>>>>
>>>>         rc = pdev_msix_assign(d, pdev);
>>>> @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, 
>>>> u8 bus, u8 devfn, u32 flag)
>>>>
>>>>      done:
>>>>         if ( rc )
>>>> +    {
>>>>             printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
>>>>                    d, &PCI_SBDF3(seg, bus, devfn), rc);
>>>> +        /* We failed to assign, so restore the previous owner. */
>>>> +        pdev->domain = old_owner;
>>>> +    }
>>>>         /* The device is assigned to dom_io so mark it as quarantined */
>>>>         else if ( d == dom_io )
>>>>             pdev->quarantine = true;
>>>>
>>>> But I do not think this belongs to this patch
>>> Indeed. Plus I'm sure you understand that it's not that simple. Assigning
>>> to pdev->domain is only the last step of assignment. Restoring the original
>>> owner would entail putting in place the original IOMMU table entries as
>>> well, which in turn can fail. Hence why you'll find a number of uses of
>>> domain_crash() in places where rolling back is far from easy.
>> So, why don't we just rely on the toolstack to do the roll back then?
>> This way we won't add new domain_crash() calls.
>> I do understand though that we may live Xen in a wrong state though.
>> So, do you think it is possible if we just call deassign_device from
>> assign_device on the error path? This is just like I do in 
>> vpci_assign_device:
>> I call vpci_deassign_device if the former fails.
> With the following addition:
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index c4ae22aeefcd..d6c00449193c 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1511,6 +1511,12 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>       }
> 
>       rc = vpci_assign_device(pdev);
> +    if ( rc )
> +        /*
> +         * Ignore the return code as we want to preserve the one from the
> +         * failed assign operation.
> +         */
> +        deassign_device(d, seg, bus, devfn);
> 
>    done:
>       if ( rc )
> 
> I see the following logs (PV Dom0):
> 
> (XEN) assign_device seg 0 bus 3 devfn 0
> (XEN) [VT-D]d[IO]:PCIe: unmap 0000:03:00.0
> (XEN) [VT-D]d4:PCIe: map 0000:03:00.0
> (XEN) assign_device vpci_assign rc -22 from d[IO] to d4
> (XEN) deassign_device current d4 to d[IO]
> (XEN) [VT-D]d4:PCIe: unmap 0000:03:00.0
> (XEN) [VT-D]d[IO]:PCIe: map 0000:03:00.0
> (XEN) deassign_device ret 0
> (XEN) d4: assign (0000:03:00.0) failed (-22)
> libxl: error: libxl_pci.c:1498:pci_add_dm_done: Domain 4:xc_assign_device 
> failed: Invalid argument
> libxl: error: libxl_pci.c:1781:device_pci_add_done: Domain 
> 4:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
> libxl: error: libxl_create.c:1895:domcreate_attach_devices: Domain 4:unable 
> to add pci devices
> libxl: error: libxl_domain.c:1183:libxl__destroy_domid: Domain 4:Non-existant 
> domain
> libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 4:Unable to 
> destroy guest
> libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 4:Destruction of 
> domain failed
> 
> So, it seems to properly solve the issue with pdev->domain left
> set to the domain we couldn't create.
> 
> @Jan, will this address your concern?

Partly: For one I'd have to think through what further implications there
are from going this route. And then completely ignoring the return value
is unlikely to be correct: You certainly want to retain the original
error code for returning to the caller, but you can't leave the error
unhandled. That's likely another case where the "best" choice is to crash
the guest.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.