|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> On December 25 2015, 11:06 AM, <Tian, Kevin> wrote:
> > From: Xu, Quan
> > Sent: Wednesday, December 23, 2015 4:26 PM
> >
> > Now if IOTLB/Context/IETC flush is timeout, panic hypervisor.
> > The coming patch set will fix it.
>
> again, please adjust comment to reflect what this patch is doing, i.e. only
> about
> improvement on Device-TLB flush.
>
Agreed.
> >
> > If Device-TLB flush is timeout, we'll hide the target ATS device and
> > crash the domain owning this ATS device.
> >
> > If impacted domain is hardware domain, just throw out a warning.
> >
> > The hided Device will be disallowed to be further assigned to any
> > domain.
>
> hided Device -> hidden device
>
Agreed.
[...]
> > diff --git a/xen/drivers/passthrough/vtd/qinval.c
> > b/xen/drivers/passthrough/vtd/qinval.c
> > index b227e4e..7330c5d 100644
> > --- a/xen/drivers/passthrough/vtd/qinval.c
> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu
> > *iommu, static int invalidate_sync(struct iommu *iommu) {
> > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > + int rc;
> >
> > if ( qi_ctrl->qinval_maddr )
> > - return queue_invalidate_wait(iommu, 0, 1, 1);
> > + {
> > + rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > + if ( rc )
> > + {
> > + if ( rc == -ETIMEDOUT )
> > + panic("Queue invalidate wait descriptor was
> timeout.\n");
> > + return rc;
> > + }
> > + }
> > +
>
> why do you still do panic here? w/ PATCH 1/3 you should return rc to caller
> directly, right?
>
This panic is original in queue_invalidate_wait(). Patch 1/3 is just for
Device-TLB flush error ( Actually it covers iotlb flush error).
If I fix VT-d iotlb/context/iec flush error to propagated callers, then I can
remove this panic and return rc to propagated caller directly.
> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > + u16 seg, u8 bus, u8 devfn)
> {
> > + struct domain *d;
> > + struct pci_dev *pdev;
> > +
> > + d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > + ASSERT(d);
> > + for_each_pdev(d, pdev)
> > + {
> > + if ( (pdev->seg == seg) &&
> > + (pdev->bus == bus) &&
> > + (pdev->devfn == devfn) )
> > + {
> > + if ( pdev->domain )
> > + {
> > + list_del(&pdev->domain_list);
> > + pdev->domain = NULL;
> > + }
> > +
> > + if ( pci_hide_device(bus, devfn) )
> > + {
> > + printk(XENLOG_ERR
> > + "IOMMU hide device %04x:%02x:%02x error.",
> > + seg, bus, devfn);
> > + break;
> > + }
>
> I'm not sure whether using pci_hide_device is enough or not break other code
> path here? For example, now you have errors propagated to callers.
More information, after call pci_hide_device(),
..
pdev->domain = dom_xen;
list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
..
Hidden PCI devices are associated with 'dom_xen'.
Now I found it may not clear rmrr mapping / context mapping. .i.e
iommu_remove_device() -- [hd->platform_ops is NULL for dom_xen].
But I think it is acceptable, IMO dom_xen is a part hypervisor, and there
mappings would not a security issue.
Or I can remove these mappings before to hide this device.
So I think it will not break other code break other code.
> What's the
> final policy against flush error?
>
If Device-TLB flush is timeout, we'll hide the target ATS device and crash the
domain owning this ATS device.
If impacted domain is hardware domain, just throw out a warning, instead of
crash the hardware domain.
The hided Device will be disallowed to be further assigned to any domain.
>If they will finally go to cleanup some more state
> relying on pdev->domain, then that code path might be broken. If it's fine to
> do
> cleanup at this point, then just hidden is not enough. If you look at
> pci_remove_device, there's quite some state to be further cleaned up...
>
As the pdev->domain is 'dom_xen';
So for this case, it is still working. Correct me if it is not correct.
BTW, a quick question, which case to call pci_remove_device()?
> I didn't think about the full logic thoroughly now. But it would always be
> good
> to not hide device now until a point where all related states have been
> cleaned
> up in error handling path chained up.
>
Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |