[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 03/10] IOMMU/MMU: enhance the call trees of IOMMU unmapping and mapping



On May 04, 2016 9:49 PM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
> On Fri, Apr 29, 2016 at 10:25 AM, Quan Xu <quan.xu@xxxxxxxxx> wrote:
> > When IOMMU mapping is failed, we issue a best effort rollback,
> > stopping IOMMU mapping, unmapping the previous IOMMU maps and
> then
> > reporting the error up to the call trees. When rollback is not
> > feasible (in early initialization phase or trade-off of complexity)
> > for the hardware domain, we do things on a best effort basis, only throwing
> out an error message.
> >
> > IOMMU unmapping should perhaps continue despite an error, in an
> > attempt to do best effort cleanup.
> >
> > Signed-off-by: Quan Xu <quan.xu@xxxxxxxxx>
> >
> > CC: Keir Fraser <keir@xxxxxxx>
> > CC: Jan Beulich <jbeulich@xxxxxxxx>
> > CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > CC: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> > CC: Kevin Tian <kevin.tian@xxxxxxxxx>
> > CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> > ---
> >  xen/arch/x86/mm.c               | 13 ++++++++-----
> >  xen/arch/x86/mm/p2m-ept.c       | 27 +++++++++++++++++++++++++--
> >  xen/arch/x86/mm/p2m-pt.c        | 24 ++++++++++++++++++++----
> >  xen/arch/x86/mm/p2m.c           | 11 +++++++++--
> >  xen/drivers/passthrough/iommu.c | 14 +++++++++++++-
> >  5 files changed, 75 insertions(+), 14 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> > a42097f..427097d 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >                             int preemptible)  {
> >      unsigned long nx, x, y = page->u.inuse.type_info;
> > -    int rc = 0;
> > +    int rc = 0, ret = 0;
> >
> >      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >
> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >          {
> >              if ( (x & PGT_type_mask) == PGT_writable_page )
> > -                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> > +                ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >              else if ( type == PGT_writable_page )
> > -                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> > -                               page_to_mfn(page),
> > -                               IOMMUF_readable|IOMMUF_writable);
> > +                ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> > +                                     page_to_mfn(page),
> > +                                     IOMMUF_readable|IOMMUF_writable);
> >          }
> >      }
> >
> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >      if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >          put_page(page);
> >
> > +    if ( !rc )
> > +        rc = ret;
> > +
> >      return rc;
> >  }
> >
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index 1ed5b47..df87944 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -821,6 +821,8 @@ out:
> >      if ( needs_sync )
> >          ept_sync_domain(p2m);
> >
> > +    ret = 0;
> > +
> >      /* For host p2m, may need to change VT-d page table.*/
> >      if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
> >           need_modify_vtd_table )
> > @@ -831,11 +833,29 @@ out:
> >          {
> >              if ( iommu_flags )
> >                  for ( i = 0; i < (1 << order); i++ )
> > -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
> > iommu_flags);
> > +                {
> > +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
> > iommu_flags);
> > +
> > +                    if ( !ret && unlikely(rc) )
> > +                    {
> > +                        while ( i-- )
> > +                            iommu_unmap_page(d, gfn + i);
> > +
> > +                        ret = rc;
> > +                        break;
> > +                    }
> > +                }
> >              else
> >                  for ( i = 0; i < (1 << order); i++ )
> > -                    iommu_unmap_page(d, gfn + i);
> > +                {
> > +                    rc = iommu_unmap_page(d, gfn + i);
> > +
> > +                    if ( !ret && unlikely(rc) )
> > +                        ret = rc;
> > +                }
> >          }
> > +
> > +        rc = 0;
> >      }
> 
> So the reason for doing this thing with setting ret=0, then using rc,
> then setting rc to zero, is to make sure that the change is propagated
> to the altp2m if necessary?
> 
> I'm not a fan of this sort of "implied signalling".  The check at the
> altp2m conditional is meant to say, "everything went fine, go ahead
> and propagate the change".  But with your changes, that's not what we
> want anymore -- we want, "If the change actually made it into the
> hostp2m, propagate it to the altp2m."
> 
> As such, I think it would be a lot clearer to just make a new boolean
> variable, maybe "entry_written", which we initialize to false and then
> set to true when the entry is actually written; and then change the
> condition re the altp2m to "if ( entry_written && p2m_is_hostp2m(..)
> )".
> 
> Then I think you could make the ret / rc use mirror what you do
> elsewhere, where ret is used to store the return value of the function
> call, and it's propagated to rc only if rc is not already set.
> 

George,

Thanks for your detailed explanation. This seems an another matter of 
preference.
Of course, I'd follow your suggestion in p2m  field, while I need to hear the 
other maintainers' options as well
(especially when it comes from Kevin/Jan, as they do spend a lot of time for 
me).

A little bit of difference here, IMO, an int 'entry_written' is much better, as 
we also need  to propagate the 'entry_written' up to callers,
i.e. the errors returned from atomic_write_ept_entry() are '-EINVAL', '-ESRCH' 
and '-EBUSY'. Then, the follow is my modification (feel free to correct me):



--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -666,7 +666,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,
     ept_entry_t *table, *ept_entry = NULL;
     unsigned long gfn_remainder = gfn;
     unsigned int i, target = order / EPT_TABLE_ORDER;
-    int ret, rc = 0;
+    int ret, rc = 0, entry_written = 0;
     bool_t direct_mmio = (p2mt == p2m_mmio_direct);
     uint8_t ipat = 0;
     bool_t need_modify_vtd_table = 1;
@@ -763,8 +763,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,

         /* now install the newly split ept sub-tree */
         /* NB: please make sure domian is paused and no in-fly VT-d DMA. */
-        rc = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
-        ASSERT(rc == 0);
+        entry_written = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
+        ASSERT(entry_written == 0);

         /* then move to the level we want to make real changes */
         for ( ; i > target; i-- )
@@ -809,9 +809,12 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;

-    rc = atomic_write_ept_entry(ept_entry, new_entry, target);
-    if ( unlikely(rc) )
+    entry_written = atomic_write_ept_entry(ept_entry, new_entry, target);
+    if ( unlikely(entry_written) )
+    {
         old_entry.epte = 0;
+        rc = entry_written;
+    }
     else if ( p2mt != p2m_invalid &&
               (gfn + (1UL << order) - 1 > p2m->max_mapped_pfn) )
         /* Track the highest gfn for which we have ever had a valid mapping */
@@ -822,7 +825,7 @@ out:
         ept_sync_domain(p2m);

     /* For host p2m, may need to change VT-d page table.*/
-    if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
+    if ( entry_written == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
          need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
@@ -831,10 +834,28 @@ out:
         {
             if ( iommu_flags )
                 for ( i = 0; i < (1 << order); i++ )
-                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+                {
+                    ret = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
iommu_flags);
+
+                    if ( unlikely(ret) )
+                    {
+                        while ( i-- )
+                            iommu_unmap_page(p2m->domain, gfn + i);
+
+                        if ( !rc )
+                            rc = ret;
+
+                        break;
+                    }
+                }
             else
                 for ( i = 0; i < (1 << order); i++ )
-                    iommu_unmap_page(d, gfn + i);
+                {
+                    ret = iommu_unmap_page(d, gfn + i);
+
+                    if ( !rc && unlikely(ret) )
+                        rc = ret;
+                }
         }
     }





> > @@ -680,11 +680,27 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> unsigned long gfn, mfn_t mfn,
> >          }
> >          else if ( iommu_pte_flags )
> >              for ( i = 0; i < (1UL << page_order); i++ )
> > -                iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> > -                               iommu_pte_flags);
> > +            {
> > +                ret = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> > +                                    iommu_pte_flags);
> > +
> > +                if ( !rc && unlikely(ret) )
> > +                {
> > +                    while ( i-- )
> > +                        iommu_unmap_page(p2m->domain, gfn + i);
> > +
> > +                    rc = ret;
> > +                    break;
> > +                }
> 
> Hmm, is this conditional here right?  What the code actually says to
> do is to always map pages, but to only unmap them on an error if there
> have been no other errors in the function so far.
> 
> As it happens, at the moment rc cannot be non-zero here since any time
> it's non-zero the code jumps down to the 'out' label, skipping this.
> But if that ever changed, this would fail to unmap when it probably
> should.
> 
> It seems like this be:
> 
> if ( unlikely(ret) )
> {
>   while ( i-- )
>     iommu_unmap_page(p2m->domain, gfn + i);
>   if ( !rc )
>     rc = ret;
>   break;
> }
> 

Looks good. Thanks.

> Or if you want to assume that rc will never be zero, then put an
> ASSERT() just before the for loop here.
> 
> Everything else looks good to me.  Thanks again for your work on this.
> 

Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.