[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 03/10] IOMMU/MMU: enhance the call trees of IOMMU unmapping and mapping



On 05/05/16 07:53, Xu, Quan wrote:
> On May 04, 2016 9:49 PM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> On Fri, Apr 29, 2016 at 10:25 AM, Quan Xu <quan.xu@xxxxxxxxx> wrote:
>>> When IOMMU mapping is failed, we issue a best effort rollback,
>>> stopping IOMMU mapping, unmapping the previous IOMMU maps and
>> then
>>> reporting the error up to the call trees. When rollback is not
>>> feasible (in early initialization phase or trade-off of complexity)
>>> for the hardware domain, we do things on a best effort basis, only throwing
>> out an error message.
>>>
>>> IOMMU unmapping should perhaps continue despite an error, in an
>>> attempt to do best effort cleanup.
>>>
>>> Signed-off-by: Quan Xu <quan.xu@xxxxxxxxx>
>>>
>>> CC: Keir Fraser <keir@xxxxxxx>
>>> CC: Jan Beulich <jbeulich@xxxxxxxx>
>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> CC: Jun Nakajima <jun.nakajima@xxxxxxxxx>
>>> CC: Kevin Tian <kevin.tian@xxxxxxxxx>
>>> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>>> ---
>>>  xen/arch/x86/mm.c               | 13 ++++++++-----
>>>  xen/arch/x86/mm/p2m-ept.c       | 27 +++++++++++++++++++++++++--
>>>  xen/arch/x86/mm/p2m-pt.c        | 24 ++++++++++++++++++++----
>>>  xen/arch/x86/mm/p2m.c           | 11 +++++++++--
>>>  xen/drivers/passthrough/iommu.c | 14 +++++++++++++-
>>>  5 files changed, 75 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
>>> a42097f..427097d 100644
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info
>> *page, unsigned long type,
>>>                             int preemptible)  {
>>>      unsigned long nx, x, y = page->u.inuse.type_info;
>>> -    int rc = 0;
>>> +    int rc = 0, ret = 0;
>>>
>>>      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
>>>
>>> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
>> *page, unsigned long type,
>>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>>          {
>>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>>> -                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>>> +                ret = iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>>>              else if ( type == PGT_writable_page )
>>> -                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>>> -                               page_to_mfn(page),
>>> -                               IOMMUF_readable|IOMMUF_writable);
>>> +                ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>>> +                                     page_to_mfn(page),
>>> +                                     IOMMUF_readable|IOMMUF_writable);
>>>          }
>>>      }
>>>
>>> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info
>> *page, unsigned long type,
>>>      if ( (x & PGT_partial) && !(nx & PGT_partial) )
>>>          put_page(page);
>>>
>>> +    if ( !rc )
>>> +        rc = ret;
>>> +
>>>      return rc;
>>>  }
>>>
>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>> index 1ed5b47..df87944 100644
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -821,6 +821,8 @@ out:
>>>      if ( needs_sync )
>>>          ept_sync_domain(p2m);
>>>
>>> +    ret = 0;
>>> +
>>>      /* For host p2m, may need to change VT-d page table.*/
>>>      if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
>>>           need_modify_vtd_table )
>>> @@ -831,11 +833,29 @@ out:
>>>          {
>>>              if ( iommu_flags )
>>>                  for ( i = 0; i < (1 << order); i++ )
>>> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
>>> iommu_flags);
>>> +                {
>>> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
>>> iommu_flags);
>>> +
>>> +                    if ( !ret && unlikely(rc) )
>>> +                    {
>>> +                        while ( i-- )
>>> +                            iommu_unmap_page(d, gfn + i);
>>> +
>>> +                        ret = rc;
>>> +                        break;
>>> +                    }
>>> +                }
>>>              else
>>>                  for ( i = 0; i < (1 << order); i++ )
>>> -                    iommu_unmap_page(d, gfn + i);
>>> +                {
>>> +                    rc = iommu_unmap_page(d, gfn + i);
>>> +
>>> +                    if ( !ret && unlikely(rc) )
>>> +                        ret = rc;
>>> +                }
>>>          }
>>> +
>>> +        rc = 0;
>>>      }
>>
>> So the reason for doing this thing with setting ret=0, then using rc,
>> then setting rc to zero, is to make sure that the change is propagated
>> to the altp2m if necessary?
>>
>> I'm not a fan of this sort of "implied signalling".  The check at the
>> altp2m conditional is meant to say, "everything went fine, go ahead
>> and propagate the change".  But with your changes, that's not what we
>> want anymore -- we want, "If the change actually made it into the
>> hostp2m, propagate it to the altp2m."
>>
>> As such, I think it would be a lot clearer to just make a new boolean
>> variable, maybe "entry_written", which we initialize to false and then
>> set to true when the entry is actually written; and then change the
>> condition re the altp2m to "if ( entry_written && p2m_is_hostp2m(..)
>> )".
>>
>> Then I think you could make the ret / rc use mirror what you do
>> elsewhere, where ret is used to store the return value of the function
>> call, and it's propagated to rc only if rc is not already set.
>>
> 
> George,
> 
> Thanks for your detailed explanation. This seems an another matter of 
> preference.
> Of course, I'd follow your suggestion in p2m  field, while I need to hear the 
> other maintainers' options as well
> (especially when it comes from Kevin/Jan, as they do spend a lot of time for 
> me).
> 
> A little bit of difference here, IMO, an int 'entry_written' is much better, 
> as we also need  to propagate the 'entry_written' up to callers,
> i.e. the errors returned from atomic_write_ept_entry() are '-EINVAL', 
> '-ESRCH' and '-EBUSY'. Then, the follow is my modification (feel free to 
> correct me):

But it doesn't look like you're actually taking advantage of the fact
that entry_written is non-boolean -- other than immediately copying the
value into rc, all the places you're only checking if it's zero or not
(where "zero" in this case actually means "the entry was written").

The code you had before was, as far as I can tell, functionally correct.
 The point of introducing the extra variable is to decrease cognitive
load on people coming along to modify the code later.  To do that, you
want to maximize the things you do that are within expectation, and
minimize the things you do outside of that.

The name "entry_written" sounds like a boolean; developers will expect
"0 / false" to mean "not written" and "1/true" to mean "entry written".
 Furthermore, rc looks like a return value; if coders see "rc =
atomic_write_ept_entry()" they immediately have a framework for what's
going on; whereas if they see "entry_written == [...]; if
(entry_written) { ... rc= entry_written; }", it takes some thinking to
figure out.  Not much, but every little bit of unnecessary thinking adds up.

My suggestion remains to:
1. Declare entry_written as a bool, initialized to false
2. In both atomic_write_ept_entry() calls, use rc to collect the return
value
3. In the second one (the one which may fail), add an else {
entry_written = true; }
4. In the conditional deciding whether to update the iommu, use
"entry_written", I think.  At this point rc == 0 and entry_written =
true will be identical, but if we ever insert something in between, we
want the iommu update to be attempted any time the entry is successfully
written.
5. Use "if ( entry_written && ...)" when deciding whether to propagate
the change to the altp2m.

Thanks,
 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.