Re: [Xen-devel] [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.



On March 18, 2016 6:20pm, <JBeulich@xxxxxxxx> wrote:
> >>> On 17.03.16 at 07:54, <quan.xu@xxxxxxxxx> wrote:
> > --- a/xen/drivers/passthrough/amd/iommu_init.c
> > +++ b/xen/drivers/passthrough/amd/iommu_init.c
> > @@ -1339,12 +1339,14 @@ static void invalidate_all_devices(void)
> >      iterate_ivrs_mappings(_invalidate_all_devices);
> >  }
> >
> > -void amd_iommu_suspend(void)
> > +int amd_iommu_suspend(void)
> >  {
> >      struct amd_iommu *iommu;
> >
> >      for_each_amd_iommu ( iommu )
> >          disable_iommu(iommu);
> > +
> > +    return 0;
> >  }
> >
> >  void amd_iommu_resume(void)
> > @@ -1368,3 +1370,11 @@ void amd_iommu_resume(void)
> >          invalidate_all_domain_pages();
> >      }
> >  }
> > +
> > +void amd_iommu_crash_shutdown(void)
> > +{
> > +    struct amd_iommu *iommu;
> > +
> > +    for_each_amd_iommu ( iommu )
> > +        disable_iommu(iommu);
> > +}
> 
> One of the two should clearly call the other - no need to have the same code
> twice.
> 

Good idea.
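
A minimal sketch of that deduplication, assuming stand-in definitions for struct amd_iommu, disable_iommu() and the IOMMU list (the array below is hypothetical and replaces for_each_amd_iommu so the fragment compiles on its own). It is one possible shape, not necessarily what the next version of the patch will do:

```c
/* Stand-ins for the Xen types and helpers, so the sketch compiles alone. */
struct amd_iommu { int enabled; };

static struct amd_iommu iommus[2] = { { 1 }, { 1 } };

static void disable_iommu(struct amd_iommu *iommu)
{
    iommu->enabled = 0;
}

/* Suspend keeps the loop and reports success, as in the quoted hunk. */
int amd_iommu_suspend(void)
{
    unsigned int i;

    for ( i = 0; i < sizeof(iommus) / sizeof(iommus[0]); i++ )
        disable_iommu(&iommus[i]);

    return 0;
}

/* Crash shutdown reuses that loop; the return value is deliberately unused. */
void amd_iommu_crash_shutdown(void)
{
    (void)amd_iommu_suspend();
}
```

Calling in this direction keeps the crash path void, since nothing useful can be done with an error at that point.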

> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -182,7 +182,11 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
> >                   ((page->u.inuse.type_info & PGT_type_mask)
> >                    == PGT_writable_page) )
> >                  mapping |= IOMMUF_writable;
> > -            hd->platform_ops->map_page(d, gfn, mfn, mapping);
> > +            if ( hd->platform_ops->map_page(d, gfn, mfn, mapping) )
> > +                printk(XENLOG_G_ERR
> > +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> > +                       gfn, mfn);
> > +
> 
> Printing one message here is certainly necessary, but what if the failure
> repeats for very many pages?

Yes, to me, it is ok, but I am open to your suggestion.
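
One way to address the repeated-failure concern, sketched under assumptions: map_page_stub() is a hypothetical stand-in for the platform map_page hook, printf() stands in for printk(), and the summary message wording is invented for illustration. The idea is to log the first failure in full and only count the rest:

```c
#include <stdio.h>

/* Hypothetical stand-in for the platform map_page hook: fails on odd gfns. */
static int map_page_stub(unsigned long gfn, unsigned long mfn)
{
    (void)mfn;
    return (gfn & 1) ? -1 : 0;
}

/*
 * Map [0, nr) and log only the first failure in full; later failures are
 * counted and summarised once. Returns the number of pages that failed.
 */
unsigned long map_range_logged(unsigned long nr)
{
    unsigned long gfn, failed = 0;

    for ( gfn = 0; gfn < nr; gfn++ )
    {
        if ( !map_page_stub(gfn, gfn) )
            continue;

        if ( !failed++ )
            printf("IOMMU: Map page gfn: %#lx (mfn: %#lx) failed.\n",
                   gfn, gfn);
    }

    if ( failed > 1 )
        printf("IOMMU: %lu further mapping failures not shown.\n",
               failed - 1);

    return failed;
}
```

Whether a summary line is wanted at all is a separate question; the point is only that the log volume stays bounded however many pages fail.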

> Also %#lx instead of 0x%lx please, and a blank before the
> opening parenthesis.
> 
OK, just to check:

..
"IOMMU: Map page gfn: %#lx (mfn: %#lx) failed.\n"
..

Right?
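
For reference, a small standalone check of what the revised format string produces, using snprintf() in place of printk() (format_frame() is a hypothetical helper). Note that with the "#" flag a zero value is printed as plain "0", without the "0x" prefix:

```c
#include <stdio.h>

/* Format the message into a caller-supplied buffer; returns its length. */
int format_frame(char *buf, size_t len, unsigned long gfn, unsigned long mfn)
{
    return snprintf(buf, len, "IOMMU: Map page gfn: %#lx (mfn: %#lx) failed.",
                    gfn, mfn);
}
```

With gfn 0x1a2b and mfn 0 the buffer holds "IOMMU: Map page gfn: 0x1a2b (mfn: 0) failed.".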


> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
> >          iommu = drhd->iommu;
> >          iommu_flush_context_global(iommu, 0);
> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> > +        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> > +
> > +        if ( rc > 0 )
> > +        {
> > +            iommu_flush_write_buffer(iommu);
> 
> Why is this needed all of the sudden?

Because there may be multiple IOMMUs. E.g., there are 2 IOMMUs in my machine,
and I can find the following log messages:
"""
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
"""
IIUC, iommu_flush_write_buffer() is per IOMMU, so it should be called to
flush every IOMMU.



> (Note that if you did a more fine grained
> split, it might also be easier for you to note/explain all the not directly
> related changes in the respective commit messages. Unless of course they fix
> actual bugs, in which case they should be split out anyway; such individual
> fixes would also likely have a much faster route to commit, relieving you
> earlier from the burden of at least some of the changes you have to carry and
> re-base.)
> 
> > +            rc = 0;
> > +        }
> > +        else if ( rc < 0 )
> > +        {
> > +            printk(XENLOG_G_ERR "IOMMU: IOMMU flush all failed.\n");
> > +            break;
> > +        }
> 
> Is a log message really advisable here?
> 

To me, it looks tricky too. I was struggling to make a decision. For scheme B,
I would try to do as below:

if ( iommu_flush_all() )
    printk("... nnn ...");

but there are 4 function calls, so repeating this at each of them looks
redundant to me.

Or could I drop the printout for an iommu_flush_all() failure?
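
A sketch of the silent variant, assuming (as the quoted hunk does) that a positive return from the global IOTLB flush asks for a write-buffer flush and a negative one is an error. struct iommu_stub and the *_stub helpers are hypothetical stand-ins for the VT-d structures and functions:

```c
/* Hypothetical stand-in for one VT-d unit; flush_rc fakes the flush result. */
struct iommu_stub {
    int flush_rc;      /* > 0: write-buffer flush needed, < 0: error */
    int wb_flushed;    /* set once the write buffer has been flushed */
};

static int flush_iotlb_global_stub(struct iommu_stub *iommu)
{
    return iommu->flush_rc;
}

static void flush_write_buffer_stub(struct iommu_stub *iommu)
{
    iommu->wb_flushed = 1;
}

/* Walk every unit; stop at the first error and hand it to the caller. */
int flush_all_sketch(struct iommu_stub *units, unsigned int nr)
{
    unsigned int i;
    int rc = 0;

    for ( i = 0; i < nr; i++ )
    {
        rc = flush_iotlb_global_stub(&units[i]);

        if ( rc > 0 )
        {
            /* Per-unit fallback, so it has to happen inside the loop. */
            flush_write_buffer_stub(&units[i]);
            rc = 0;
        }
        else if ( rc < 0 )
            break;
    }

    return rc;
}
```

With no printk() inside, each call site can decide for itself whether a failure deserves a message, a crash, or just error propagation.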



> > -static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
> > +static int __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
> 
> While I'm not VT-d maintainer, I think changes like this would be a good
> opportunity to also drop the stray double underscores: You need to touch all
> callers anyway.
> 

I think this is optional.


> > @@ -584,37 +599,40 @@ static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
> >              continue;
> >
> >          if ( page_count != 1 || gfn == INVALID_GFN )
> > -        {
> > -            if ( iommu_flush_iotlb_dsi(iommu, iommu_domid,
> > -                        0, flush_dev_iotlb) )
> > -                iommu_flush_write_buffer(iommu);
> > -        }
> > +            rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
> > +                                       0, flush_dev_iotlb);
> >          else
> > +            rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
> > +                                       (paddr_t)gfn << PAGE_SHIFT_4K, 0,
> > +                                       !dma_old_pte_present,
> > +                                       flush_dev_iotlb);
> > +        if ( rc > 0 )
> >          {
> > -            if ( iommu_flush_iotlb_psi(iommu, iommu_domid,
> > -                        (paddr_t)gfn << PAGE_SHIFT_4K, PAGE_ORDER_4K,
> 
> Note how this used PAGE_ORDER_4K so far?

Sorry, this is a rebasing mistake.

> 
> > -                        !dma_old_pte_present, flush_dev_iotlb) )
> > -                iommu_flush_write_buffer(iommu);
> > +            iommu_flush_write_buffer(iommu);
> 
> Same question again: Why is this all of the sudden needed on both paths?
> 

Same answer as for the question above. Let's put this on hold for now.


> > @@ -622,7 +640,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
> >      if ( pg_maddr == 0 )
> >      {
> >          spin_unlock(&hd->arch.mapping_lock);
> > -        return;
> > +        return -ENOMEM;
> >      }
> 
> addr_to_dma_page_maddr() gets called with "alloc" being false, so there can't
> be any memory allocation failure here. There simply is nothing to do in this
> case.
> 

I copied it from iommu_map_page().

Good. Then the only error iommu_unmap_page() can see comes from the flush (the
crash is at least obvious), so the error handling can be lighter weight:
we may return an error, but don't roll back the failed operation.
Right?
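
A sketch of that lighter-weight handling, under assumptions: clear_one_sketch() and struct pte_stub are hypothetical, a NULL pte models addr_to_dma_page_maddr() returning 0 with "alloc" false, and flush_rc models the result of the flush that follows the clear:

```c
#include <stddef.h>

/* Hypothetical model of a page table entry; illustrative only. */
struct pte_stub { int present; };

/*
 * pte == NULL: no page table exists, so there is nothing to clear and
 * nothing has failed. Otherwise clear the entry and report a flush error
 * to the caller without undoing the clear.
 */
int clear_one_sketch(struct pte_stub *pte, int flush_rc)
{
    if ( !pte )
        return 0;

    pte->present = 0;    /* stays cleared even if the flush fails */

    return flush_rc < 0 ? flush_rc : 0;
}
```

The missing-table case returns 0 rather than -ENOMEM, matching the review comment that no allocation is attempted on this path.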

> > -void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
> > +int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
> >  {
> >      u32 id;
> > +    int rc = 0;
> >
> >      id = pci_conf_read32(0, 0, 0, 0, 0);
> >      if ( IS_CTG(id) )
> >      {
> >          /* quit if ME does not exist */
> >          if ( pci_conf_read32(0, 0, 3, 0, 0) == 0xffffffff )
> > -            return;
> > +            return -ENOENT;
> 
> Is this really an error? IOW, do all systems which satisfy IS_CTG() have such
> a device?
> 
To be honest, I don't know much about me_wifi_quirk().
For now, IMO, I don't need to deal with me_wifi_quirk().

Quan
