
Re: [PATCH v2 07/18] IOMMU/x86: perform PV Dom0 mappings in batches


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 3 Dec 2021 13:38:48 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Fri, 03 Dec 2021 12:39:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.12.2021 15:10, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 11:47:41AM +0200, Jan Beulich wrote:
>> @@ -689,7 +763,8 @@ int __init dom0_construct_pv(struct doma
>>          l1tab++;
>>  
>>          page = mfn_to_page(_mfn(mfn));
>> -        if ( !page->u.inuse.type_info &&
>> +        if ( (!page->u.inuse.type_info ||
>> +              page->u.inuse.type_info == (PGT_writable_page | PGT_validated)) &&
> 
> Would it be clearer to get page for all pages that have a 0 count:
> !(type_info & PGT_count_mask). Or would that interact badly with page
> table pages?

Indeed this wouldn't work with page tables (and I recall having learned
this the hard way): Prior to mark_pv_pt_pages_rdonly() they all have a
type refcount of zero. Even if it wasn't for this, I'd prefer to not
relax the condition here more than necessary.

>> @@ -720,6 +795,17 @@ int __init dom0_construct_pv(struct doma
>>      /* Pages that are part of page tables must be read only. */
>>      mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
>>  
>> +    /*
>> +     * This needs to come after all potentially excess
>> +     * get_page_and_type(..., PGT_writable_page) invocations; see the loop a
>> +     * few lines further up, where the effect of calling that function in an
>> +     * earlier loop iteration may get overwritten by a later one.
>> +     */
>> +    if ( need_iommu_pt_sync(d) &&
>> +         iommu_unmap(d, _dfn(PFN_DOWN(mpt_alloc) - nr_pt_pages), nr_pt_pages,
>> +                     &flush_flags) )
>> +        BUG();
> 
> Wouldn't such unmap better happen as part of changing the types of the
> pages that become part of the guest page tables?

Not sure - it's a single call here, but would be a call per page when
e.g. moved into mark_pv_pt_pages_rdonly().

>> @@ -840,22 +928,41 @@ int __init dom0_construct_pv(struct doma
>>  #endif
>>      while ( pfn < nr_pages )
>>      {
>> -        if ( (page = alloc_chunk(d, nr_pages - domain_tot_pages(d))) == NULL )
>> +        count = domain_tot_pages(d);
>> +        if ( (page = alloc_chunk(d, nr_pages - count)) == NULL )
>>              panic("Not enough RAM for DOM0 reservation\n");
>> +        mfn = mfn_x(page_to_mfn(page));
>> +
>> +        if ( need_iommu_pt_sync(d) )
>> +        {
>> +            rc = iommu_map(d, _dfn(mfn), _mfn(mfn), domain_tot_pages(d) - count,
>> +                           IOMMUF_readable | IOMMUF_writable, &flush_flags);
>> +            if ( rc )
>> +                printk(XENLOG_ERR
>> +                       "pre-mapping MFN %lx (PFN %lx) into IOMMU failed: %d\n",
>> +                       mfn, pfn, rc);
>> +        }
>> +
>>          while ( pfn < domain_tot_pages(d) )
>>          {
>> -            mfn = mfn_x(page_to_mfn(page));
>> +            if ( !rc )
>> +                make_pages_writable(page, 1);
> 
> There's quite a lot of repetition of the pattern: allocate, iommu_map,
> set as writable. Would it be possible to abstract this into some
> kind of helper?
> 
> I've realized some of the allocations use alloc_chunk while others use
> alloc_domheap_pages, so it might require some work.

Right, I'd leave the allocation part aside for the moment. I had actually
considered folding iommu_map() and make_pages_writable() into a common
helper (or really renaming make_pages_writable() and folding iommu_map()
in there). What I lacked was a reasonable, not overly long name for such a
function. Plus - maybe minor - I wanted to avoid extra MFN <-> page
translations.

With a fair name suggested, I'd be happy to give this a try.

>>  #ifndef NDEBUG
>>  #define pfn (nr_pages - 1 - (pfn - (alloc_epfn - alloc_spfn)))
>>  #endif
>>              dom0_update_physmap(compat, pfn, mfn, vphysmap_start);
>>  #undef pfn
>> -            page++; pfn++;
>> +            page++; mfn++; pfn++;
>>              if ( !(pfn & 0xfffff) )
>>                  process_pending_softirqs();
>>          }
>>      }
>>  
>> +    /* Use while() to avoid compiler warning. */
>> +    while ( iommu_iotlb_flush_all(d, flush_flags) )
>> +        break;
> 
> Might be worth to print a message here in case of error?

Maybe. But then not just here. See e.g. arch_iommu_hwdom_init().

>> @@ -372,16 +372,30 @@ void __hwdom_init arch_iommu_hwdom_init(
>>                                          perms & IOMMUF_writable ? p2m_access_rw
>>                                                                  : p2m_access_r,
>>                                          0);
>> +        else if ( pfn != start + count || perms != start_perms )
>> +        {
>> +        commit:
>> +            rc = iommu_map(d, _dfn(start), _mfn(start), count,
>> +                           start_perms, &flush_flags);
>> +            SWAP(start, pfn);
>> +            start_perms = perms;
>> +            count = 1;
>> +        }
>>          else
>> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
>> -                           perms, &flush_flags);
>> +        {
>> +            ++count;
>> +            rc = 0;
>> +        }
>>  
>>          if ( rc )
>>              printk(XENLOG_WARNING "%pd: identity %smapping of %lx failed: %d\n",
>>                     d, !paging_mode_translate(d) ? "IOMMU " : "", pfn, rc);
> 
> Would be nice to print the count (or end pfn) in case it's a range.

I can do so if you think it's worth further extra code. I can't use
"count" here in particular, as that was updated already (in context
above). The most reasonable change towards this would perhaps be to
duplicate the printk() into both the "if()" and the "else if()" bodies.

> While not something that you have to fix here, the logic here is
> becoming overly complicated IMO. It might be easier to just put all
> the ram and reserved regions (or everything < 4G) into a rangeset and
> then punch holes on it for non guest mappable regions, and finally use
> rangeset_consume_ranges to iterate and map those. That's likely faster
> than having to iterate over all pfns on the system, and easier to
> understand from a logic PoV.

Maybe; I didn't spend much time on figuring out possible ways of
consolidating some of this.

Jan
