[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch v11 3/3] vt-d: fix vt-d Device-TLB flush timeout issue



>>> On 17.06.16 at 08:08, <quan.xu@xxxxxxxxx> wrote:

> On June 16, 2016 5:05 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>> >>> On 16.06.16 at 10:42, <quan.xu@xxxxxxxxx> wrote:
>> > On June 02, 2016 7:07 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>> >> >>> On 01.06.16 at 11:05, <quan.xu@xxxxxxxxx> wrote:
>> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
>> >> > +                                         struct pci_ats_dev
>> >> > +*ats_dev) {
>> >> > +    struct domain *d = NULL;
>> >> > +    struct pci_dev *pdev;
>> >> > +
>> >> > +    if ( test_bit(did, iommu->domid_bitmap) )
>> >> > +        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
>> >> > +
>> >> > +    /*
>> >> > +     * In case the domain has been freed or the IOMMU domid bitmap is
>> >> > +     * not valid, the device no longer belongs to this domain.
>> >> > +     */
>> >> > +    if ( d == NULL )
>> >> > +        return;
>> >> > +
>> >> > +    pcidevs_lock();
>> >> > +
>> >> > +    for_each_pdev(d, pdev)
>> >> > +    {
>> >> > +        if ( (pdev->seg == ats_dev->seg) &&
>> >> > +             (pdev->bus == ats_dev->bus) &&
>> >> > +             (pdev->devfn == ats_dev->devfn) )
>> >> > +        {
>> >> > +            ASSERT(pdev->domain);
>> >> > +            list_del(&pdev->domain_list);
>> >> > +            pdev->domain = NULL;
>> >> > +            pci_hide_existing_device(pdev);
>> >> > +            break;
>> >> > +        }
>> >> > +    }
>> >> > +
>> >> > +    pcidevs_unlock();
>> >>
>> >> ... this loop (and locking). (Of course such a change may better be
>> >> done in another preparatory patch.)
>> >>
>> >
>> > To eliminate the locking?  I am afraid the locking is still a must
>> > here even without the loop, also referring  to device_assigned()..
>> 
>> If the entire loop gets eliminated, what would be left is
>> 
>>     pcidevs_lock();
>>     pcidevs_unlock();
>> 
>> which I don't think does any good.
> 
> Why? I can't follow it..

I don't understand your question. Can you explain what use above
code sequence is, in your opinion? Or else - what does the "why"
refer to?

>> >> > +static int __must_check dev_invalidate_sync(struct iommu *iommu,
>> >> > +u16
>> >> did,
>> >> > +                                            struct pci_ats_dev
>> >> > +*ats_dev) {
>> >> > +    struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
>> >> > +    int rc = 0;
>> >> > +
>> >> > +    if ( qi_ctrl->qinval_maddr )
>> >> > +    {
>> >> > +        rc = queue_invalidate_wait(iommu, 0, 1, 1, 1);
>> >> > +
>> >> > +        if ( rc == -ETIMEDOUT )
>> >> > +            dev_invalidate_iotlb_timeout(iommu, did, ats_dev);
>> >> > +    }
>> >> > +
>> >> > +    return rc;
>> >> > +}
>> >>
>> >> I've never really understood why invalidate_sync() returns success
>> >> when it didn't do anything. Now that you copy this same behavior
>> >> here, I really need to ask you to explain that.
>> >>
>> >
>> > It is acceptable to me, returning success when it didn't do anything
>> > -- this is worth reflection and criticism:(..
>> > It is better:
>> > +    if ( qi_ctrl->qinval_maddr )
>> > +        ...
>> > +    else
>> > +        rc = -ENOENT;
>> 
>> Right. And perhaps a separate patch to make invalidate_sync() do the same.
> 
> Agreed.
> 
>> Question is whether this really ought to be a conditional, or whether 
> instead
>> this code is unreachable when qinval is not in use, in which case these
>> conditionals would imo better be converted to ASSERT()s.
>> 
> 
> IMO, this should be a conditional.
> As mentioned below, strictly speaking, this is a bug. We can't  ASSERT() 
> based on a bug..
> A coming patch may fix it..

And again I don't understand: ASSERT()s are to verify assumed
state. If static code analysis resulted in understanding a function
is unreachable when qi_ctrl->qinval_maddr is zero (because
qinval ought to have got disabled if any of the table setup failed),
then adding ASSERT() would (a) document that and (b) allow to
know quickly if something broke that assumption.

But then again I may simply misunderstand your wording: "We
can't ASSERT() based on a bug" is really pretty unclear to me.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.