|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [Patch v11 3/3] vt-d: fix vt-d Device-TLB flush timeout issue
On June 17, 2016 3:01 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>> On 17.06.16 at 08:08, <quan.xu@xxxxxxxxx> wrote:
>
> > On June 16, 2016 5:05 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >> >>> On 16.06.16 at 10:42, <quan.xu@xxxxxxxxx> wrote:
> >> > On June 02, 2016 7:07 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >> >> >>> On 01.06.16 at 11:05, <quan.xu@xxxxxxxxx> wrote:
> >> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16
> did,
> >> >> > + struct pci_ats_dev
> >> >> > +*ats_dev) {
> >> >> > + struct domain *d = NULL;
> >> >> > + struct pci_dev *pdev;
> >> >> > +
> >> >> > + if ( test_bit(did, iommu->domid_bitmap) )
> >> >> > + d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> >> > +
> >> >> > + /*
> >> >> > + * In case the domain has been freed or the IOMMU domid bitmap
> is
> >> >> > + * not valid, the device no longer belongs to this domain.
> >> >> > + */
> >> >> > + if ( d == NULL )
> >> >> > + return;
> >> >> > +
> >> >> > + pcidevs_lock();
> >> >> > +
> >> >> > + for_each_pdev(d, pdev)
> >> >> > + {
> >> >> > + if ( (pdev->seg == ats_dev->seg) &&
> >> >> > + (pdev->bus == ats_dev->bus) &&
> >> >> > + (pdev->devfn == ats_dev->devfn) )
> >> >> > + {
> >> >> > + ASSERT(pdev->domain);
> >> >> > + list_del(&pdev->domain_list);
> >> >> > + pdev->domain = NULL;
> >> >> > + pci_hide_existing_device(pdev);
> >> >> > + break;
> >> >> > + }
> >> >> > + }
> >> >> > +
> >> >> > + pcidevs_unlock();
> >> >>
> >> >> ... this loop (and locking). (Of course such a change may better
> >> >> be done in another preparatory patch.)
> >> >>
> >> >
> >> > To eliminate the locking? I am afraid the locking is still a must
> >> > here even without the loop, also referring to device_assigned()..
> >>
> >> If the entire loop gets eliminated, what would be left is
> >>
> >> pcidevs_lock();
> >> pcidevs_unlock();
> >>
> >> which I don't think does any good.
> >
> > Why? I can't follow it..
>
> I don't understand your question. Can you explain what use above code
> sequence is, in your opinion? Or else - what does the "why"
> refer to?
>
Ah, there may be a gap between us. without this loop, these pdev operation
should be still there, such as,
+ ASSERT(pdev->domain);
+ list_del(&pdev->domain_list);
+ pdev->domain = NULL;
+ pci_hide_existing_device(pdev);
So, the left is not only:
pcidevs_lock();
pcidevs_unlock();
> >> >> > +static int __must_check dev_invalidate_sync(struct iommu
> >> >> > +*iommu,
> >> >> > +u16
> >> >> did,
> >> >> > + struct pci_ats_dev
> >> >> > +*ats_dev) {
> >> >> > + struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> >> >> > + int rc = 0;
> >> >> > +
> >> >> > + if ( qi_ctrl->qinval_maddr )
> >> >> > + {
> >> >> > + rc = queue_invalidate_wait(iommu, 0, 1, 1, 1);
> >> >> > +
> >> >> > + if ( rc == -ETIMEDOUT )
> >> >> > + dev_invalidate_iotlb_timeout(iommu, did, ats_dev);
> >> >> > + }
> >> >> > +
> >> >> > + return rc;
> >> >> > +}
> >> >>
> >> >> I've never really understood why invalidate_sync() returns success
> >> >> when it didn't do anything. Now that you copy this same behavior
> >> >> here, I really need to ask you to explain that.
> >> >>
> >> >
> >> > It is acceptable to me, returning success when it didn't do
> >> > anything
> >> > -- this is worth reflection and criticism:(..
> >> > It is better:
> >> > + if ( qi_ctrl->qinval_maddr )
> >> > + ...
> >> > + else
> >> > + rc = -ENOENT;
> >>
> >> Right. And perhaps a separate patch to make invalidate_sync() do the
> same.
> >
> > Agreed.
> >
> >> Question is whether this really ought to be a conditional, or whether
> > instead
> >> this code is unreachable when qinval is not in use, in which case
> >> these conditionals would imo better be converted to ASSERT()s.
> >>
> >
> > IMO, this should be a conditional.
> > As mentioned below, strictly speaking, this is a bug. We can't
> > ASSERT() based on a bug..
> > A coming patch may fix it..
>
> And again I don't understand: ASSERT()s are to verify assumed state. If static
> code analysis resulted in understanding a function is unreachable when
> qi_ctrl->qinval_maddr is zero (because qinval ought to have got disabled if
> any
> of the table setup failed), then adding ASSERT() would (a) document that and
> (b) allow to know quickly if something broke that assumption.
>
You are right. A separate patch does this.
> But then again I may simply misunderstand your wording: "We can't ASSERT()
> based on a bug" is really pretty unclear to me.
>
I supposed the variable in ASSERT() is always true, but disable_qinval() needs
to make
qi_ctrl->qinval_maddr zero, but today it doesn't do this -- a bug.
With your explanation, I got it now. Thanks.
Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |