
Re: [Xen-devel] [PATCH v12 6/6] vt-d: fix vt-d Device-TLB flush timeout issue



On June 27, 2016 11:21 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>> On 27.06.16 at 14:56, <quan.xu@xxxxxxxxx> wrote:
> > On June 27, 2016 4:24 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >> >>> On 24.06.16 at 07:51, <quan.xu@xxxxxxxxx> wrote:
> >> > @@ -199,24 +199,73 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
> >> >      return -EOPNOTSUPP;
> >> >  }
> >> >
> >> > -static int __must_check invalidate_sync(struct iommu *iommu,
> >> > -                                        bool_t flush_dev_iotlb)
> >> > +static int __must_check invalidate_sync(struct iommu *iommu)
> >> >  {
> >> >      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> >> >
> >> >      ASSERT(qi_ctrl->qinval_maddr);
> >> >
> >> > -    return queue_invalidate_wait(iommu, 0, 1, 1, flush_dev_iotlb);
> >> > +    return queue_invalidate_wait(iommu, 0, 1, 1, 0);
> >> > +}
> >> > +
> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> >> > +                                         struct pci_dev *pdev)
> >> > +{
> >> > +    struct domain *d = NULL;
> >> > +
> >> > +    if ( test_bit(did, iommu->domid_bitmap) )
> >> > +        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> > +
> >> > +    /*
> >> > +     * In case the domain has been freed or the IOMMU domid bitmap is
> >> > +     * not valid, the device no longer belongs to this domain.
> >> > +     */
> >> > +    if ( d == NULL )
> >> > +        return;
> >> > +
> >> > +    pcidevs_lock();
> >> > +    ASSERT(pdev->domain);
> >> > +    list_del(&pdev->domain_list);
> >> > +    pdev->domain = NULL;
> >> > +    pci_hide_existing_device(pdev);
> >> > +    pcidevs_unlock();
> >> > +
> >> > +    if ( !d->is_shutting_down && printk_ratelimit() )
> >> > +        printk(XENLOG_WARNING VTDPREFIX
> >> > +               " dom%d: ATS device %04x:%02x:%02x.%u flush failed\n",
> >> > +               d->domain_id, pdev->seg, pdev->bus,
> >> > +               PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> >> > +
> >> > +    if ( !is_hardware_domain(d) )
> >> > +        domain_crash(d);
> >> > +
> >> > +    rcu_unlock_domain(d);
> >> > +}
> >>
> >> So in an earlier patch in this series you (supposedly) moved similar
> >> logic up to the vendor independent layer. I think this then would
> >> better get moved up too, if at all possible.
> >>
> >
> > To be honest, I don't have a strong reason for leaving the domain crash
> > here, and I was aware of this problem. But calling domain_crash() here is
> > not harmful: 'd->is_shutting_down' gets set when the domain is crashed,
> > and once 'd->is_shutting_down' is set, domain_shutdown() returns early.
> > Crashing the domain directly may also help us narrow the 'window' during
> > which the domain is still running.
> >
> > To me, moving the logic up is acceptable.
> >
> > For the next version, could I drop only:
> >
> > +    if ( !is_hardware_domain(d) )
> > +        domain_crash(d);
> >
> > from this patch, and leave the rest as is?
> 
> Not really - the entire function looks like it could move out of vtd/, as I
> can't see anything VT-d specific in it.
> 

Yes, it could be moved out of vtd/, which would also let the ARM/AMD IOMMU
code benefit from hiding ATS devices.

But 'did' and 'iommu->domid_bitmap' really are VT-d specific. Both of them are
only used to look up the domain* structure, so that is not a big deal; I can
pass a domain_id instead.

IMO, the domain* structure is a must here. As mentioned, not all call trees of
the device IOTLB flush are under pcidevs_lock
(e.g. ...--iommu_iotlb_flush()--xenmem_add_to_physmap()...).
In extreme cases the domain may have been freed, or the device may have been
detached or even attached to another domain (so I also need to add an
'if ( pdev->domain == d )' check before hiding the device, see the sketch
below). The domain* structure lets us check for these cases.
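
Roughly, I have in mind something like the following sketch. The helper's
name, its placement (e.g. drivers/passthrough/pci.c) and the exact signature
are only illustrative, not final code; the VT-d-specific domid lookup would
stay in the caller:

#include <xen/lib.h>
#include <xen/sched.h>
#include <xen/pci.h>

/*
 * Sketch of a vendor-independent helper.  The caller resolves the domain
 * from its own (VT-d/AMD) domid bookkeeping, holds a reference to it
 * (e.g. via rcu_lock_domain_by_id()) and drops that reference afterwards,
 * so nothing here depends on VT-d internals.
 */
void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
{
    pcidevs_lock();

    /*
     * The device may have been detached, or even re-assigned to another
     * domain, since the flush was issued; only hide it if it still
     * belongs to this domain.
     */
    if ( pdev->domain == d )
    {
        list_del(&pdev->domain_list);
        pdev->domain = NULL;
        pci_hide_existing_device(pdev);
    }

    pcidevs_unlock();

    if ( !d->is_shutting_down && printk_ratelimit() )
        printk(XENLOG_WARNING
               "dom%d: ATS device %04x:%02x:%02x.%u flush failed\n",
               d->domain_id, pdev->seg, pdev->bus,
               PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));

    if ( !is_hardware_domain(d) )
        domain_crash(d);
}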

Quan
