[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot



On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
> I find some pass-thru devices don't work any more across guest
> reboot. Assigning it to another domain also meets the same issue. And
> the only way to make it work again is un-binding and binding it to
> pciback. Someone reported this issue one year ago [1].
> 
> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> killed/crashed before the domain shutdown, this domain's pirq won't be
> unmapped. Then xen takes over this work, unmapping all pirq-s, when
> destroying guest. But as pciback has already disabled meory decoding before
> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> this process is:
> 
> ->arch_domain_destroy
>     ->free_domain_pirqs
>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>             ->pirq_guest_force_unbind
>                 ->__pirq_guest_unbind
>                     ->mask_msi_irq(=desc->handler->disable())
>                         ->the warning in msi_set_mask_bit()
> 
> The host_maskall bit will prevent guests from clearing the maskall bit
> even the device is assigned to another guest later. Then guests cannot
> receive MSIs from this device.
> 
> To fix this issue, a pirq is unmapped before memory decoding is disabled by
> pciback. Specifically, when a device is detached from a guest, all established
> mappings between pirq and msi are destroying before changing the ownership.
> 
> [1]: 
> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Thanks, I think the approach is fine, just a couple of comments.

> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> ---
> Changes in v5:
>  - fix the potential infinite loop
>  - assert that unmap_domain_pirq() won't fail
>  - assert msi_list is empty after the loop in pci_unmap_msi
>  - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
>    compilation error when building PVShim
> 
> Changes in v4:
>  - split out change to 'msix->warned' field
>  - handle multiple msi cases
>  - use list_first_entry_or_null to traverse 'pdev->msi_list'
> ---
>  xen/drivers/passthrough/io.c  | 57 ++++++++++++++++++++++++++------------
>  xen/drivers/passthrough/pci.c | 64 
> +++++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/iommu.h       |  4 +++
>  3 files changed, 107 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index a6eb8a4..56ee1ef 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>      return 0;
>  }
>  
> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> +         list_empty(&pirq_dpci->digl_list) )
> +    {
> +        pirq_guest_unbind(d, pirq);
> +        msixtbl_pt_unregister(d, pirq);
> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
> +            kill_timer(&pirq_dpci->timer);
> +        pirq_dpci->flags = 0;
> +        /*
> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> +         * call to pt_pirq_softirq_reset.
> +         */
> +        pt_pirq_softirq_reset(pirq_dpci);
> +
> +        pirq_cleanup_check(pirq, d);
> +    }
> +}
> +
> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);

const

> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
> +        pi_update_irte(NULL, pirq, 0);
> +
> +    pt_irq_destroy_bind_common(d, pirq);
> +}
> +
>  int pt_irq_destroy_bind(
>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>  {
> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>          }
>          else
>              what = "bogus";
> -    }
> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
> -        pi_update_irte(NULL, pirq, 0);
> -
> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> -         list_empty(&pirq_dpci->digl_list) )
> -    {
> -        pirq_guest_unbind(d, pirq);
> -        msixtbl_pt_unregister(d, pirq);
> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
> -            kill_timer(&pirq_dpci->timer);
> -        pirq_dpci->flags = 0;
> -        /*
> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> -         * call to pt_pirq_softirq_reset.
> -         */
> -        pt_pirq_softirq_reset(pirq_dpci);
>  
> -        pirq_cleanup_check(pirq, d);
> +        pt_irq_destroy_bind_common(d, pirq);
>      }
> +    else
> +        pt_irq_destroy_bind_msi(d, pirq);
>  
>      spin_unlock(&d->event_lock);
>  
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 93c20b9..4f2be02 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>      return rc;
>  }
>  
> +/*
> + * Unmap established mappings between domain's pirq and device's MSI.
> + * These mappings were set up by qemu/guest and are expected to be
> + * destroyed when changing the device's ownership.
> + */
> +static void pci_unmap_msi(struct pci_dev *pdev)
> +{
> +    struct msi_desc *entry, *tmp;
> +    struct domain *d = pdev->domain;
> +
> +    ASSERT(pcidevs_locked());
> +    ASSERT(d);
> +
> +    spin_lock(&d->event_lock);
> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
> +    {
> +        struct pirq *info;
> +        int ret, pirq = 0;
> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> +                          ? entry->msi.nvec : 1;

I think you should mask the entry, like it's done in
pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
consistent state between bind and unbind.

> +
> +        while ( nr-- )
> +        {
> +            struct hvm_pirq_dpci *pirq_dpci;

Nit: you could reduce the scope of info by declaring it here AFAICT.

> +
> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
> +            WARN_ON(pirq < 0);
> +            if ( pirq <= 0 )
> +                continue;
> +
> +            info = pirq_info(d, pirq);
> +            if ( !info )
> +                continue;
> +
> +            pirq_dpci = pirq_dpci(info);
> +            if ( pirq_dpci &&
> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
> +                pt_irq_destroy_bind_msi(d, info);
> +        }
> +
> +        if ( pirq > 0 )
> +        {
> +            /*
> +             * The pirq is derived from an entry in msi_list rather than an
> +             * arbitrary value passed down. There should be a irq (msi) 
> mapped
> +             * to this pirq. In this case, unmapping this pirq should 
> succeed.
> +             * Otherwise, something goes wrong.
> +             */
> +            ret = unmap_domain_pirq(d, pirq);
> +            ASSERT(!ret);

unmap_domain_pirq can fail, why not make pci_unmap_msi return an int
and propagate the error to the caller?

deassign_device returning an error should also be fine.

> +        }
> +    }
> +    /*
> +     * All pirq-s should have been unmapped and corresponding msi_desc
> +     * entries should have been removed in the above loop.
> +     */
> +    ASSERT(list_empty(&pdev->msi_list));
> +
> +    spin_unlock(&d->event_lock);
> +}
> +
>  /* caller should hold the pcidevs_lock */
>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>  {
> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, 
> u8 devfn)
>      if ( !pdev )
>          return -ENODEV;
>  
> +    pci_unmap_msi(pdev);

Just want to make sure, since deassign_device will be called for both
PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
device is assigned to a PV guest, but would like your confirmation.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.