
Re: [PATCH v2 07/18] IOMMU/x86: perform PV Dom0 mappings in batches


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 10 Dec 2021 12:41:31 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Fri, 10 Dec 2021 11:41:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.12.2021 10:36, Roger Pau Monné wrote:
> On Fri, Dec 03, 2021 at 01:38:48PM +0100, Jan Beulich wrote:
>> On 02.12.2021 15:10, Roger Pau Monné wrote:
>>> On Fri, Sep 24, 2021 at 11:47:41AM +0200, Jan Beulich wrote:
>>>> @@ -720,6 +795,17 @@ int __init dom0_construct_pv(struct doma
>>>>      /* Pages that are part of page tables must be read only. */
>>>>      mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
>>>>  
>>>> +    /*
>>>> +     * This needs to come after all potentially excess
>>>> +     * get_page_and_type(..., PGT_writable_page) invocations; see the loop a
>>>> +     * few lines further up, where the effect of calling that function in an
>>>> +     * earlier loop iteration may get overwritten by a later one.
>>>> +     */
>>>> +    if ( need_iommu_pt_sync(d) &&
>>>> +         iommu_unmap(d, _dfn(PFN_DOWN(mpt_alloc) - nr_pt_pages), nr_pt_pages,
>>>> +                     &flush_flags) )
>>>> +        BUG();
>>>
>>> Wouldn't such unmap better happen as part of changing the types of the
>>> pages that become part of the guest page tables?
>>
>> Not sure - it's a single call here, but it would be a call per page if
>> e.g. moved into mark_pv_pt_pages_rdonly().
> 
> I see. So this would result in multiple calls when moved, plus all the
> involved page shattering and aggregation logic. Overall it would be
> less error prone, as the iommu unmap would happen next to the type
> change, but I'm not going to insist if you think it's not worth it.
> The page table structure pages shouldn't be that many anyway?

Typically it wouldn't be that many, true. I'm not sure about "less
error prone", though: we'd have bigger problems if the range unmapped
here didn't properly represent the set of page tables used.
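
For illustration only (not something I'd plan to submit as-is), the per-page
variant inside mark_pv_pt_pages_rdonly() might look roughly like this, with
"page" standing for the page-table page being processed and the flush_flags
plumbing into that function hand-waved:

    /*
     * Sketch only: per-page unmap next to the type change; assumes
     * flush_flags gets passed down, which the posted patch doesn't do.
     */
    if ( need_iommu_pt_sync(d) &&
         iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, &flush_flags) )
        BUG();

Besides the extra per-page calls, the need to pass flush_flags down is part
of why I'd prefer the single batched unmap above.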

>>>> @@ -840,22 +928,41 @@ int __init dom0_construct_pv(struct doma
>>>>  #endif
>>>>      while ( pfn < nr_pages )
>>>>      {
>>>> -        if ( (page = alloc_chunk(d, nr_pages - domain_tot_pages(d))) == NULL )
>>>> +        count = domain_tot_pages(d);
>>>> +        if ( (page = alloc_chunk(d, nr_pages - count)) == NULL )
>>>>              panic("Not enough RAM for DOM0 reservation\n");
>>>> +        mfn = mfn_x(page_to_mfn(page));
>>>> +
>>>> +        if ( need_iommu_pt_sync(d) )
>>>> +        {
>>>> +            rc = iommu_map(d, _dfn(mfn), _mfn(mfn), domain_tot_pages(d) - count,
>>>> +                           IOMMUF_readable | IOMMUF_writable, &flush_flags);
>>>> +            if ( rc )
>>>> +                printk(XENLOG_ERR
>>>> +                       "pre-mapping MFN %lx (PFN %lx) into IOMMU failed: %d\n",
>>>> +                       mfn, pfn, rc);
>>>> +        }
>>>> +
>>>>          while ( pfn < domain_tot_pages(d) )
>>>>          {
>>>> -            mfn = mfn_x(page_to_mfn(page));
>>>> +            if ( !rc )
>>>> +                make_pages_writable(page, 1);
>>>
>>> There's quite a lot of repetition of the pattern: allocate, iommu_map,
>>> set as writable. Would it be possible to abstract this into some
>>> kind of helper?
>>>
>>> I've realized some of the allocations use alloc_chunk while others use
>>> alloc_domheap_pages, so it might require some work.
>>
>> Right, I'd leave the allocation part aside for the moment. I had actually
>> considered folding iommu_map() and make_pages_writable() into a common
>> helper (or really renaming make_pages_writable() and folding iommu_map()
>> into it). What I lacked was a reasonable, not overly long name for such a
>> function.
> 
> I'm not overly good at naming, but I think we need to somehow find a
> way to place those together into a single helper.
> 
> I would be fine with naming this iommu_memory_{setup,add} or some
> such. Marking the pages as writable is a result (or a requirement
> might be a better way to express it?) of adding them to the IOMMU.
> Would you be OK with one of those names?

I'll use the suggestion as a basis and see how it ends up looking /
feeling.
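
To give a rough idea of the shape I have in mind (a sketch only, using the
suggested name; exact signature, placement, and error handling still to be
settled):

    /*
     * Sketch: map a physically contiguous range into the IOMMU and, on
     * success, also make the pages writable for the device's benefit.
     */
    static int __init iommu_memory_setup(struct domain *d,
                                         struct page_info *page,
                                         unsigned long nr,
                                         unsigned int *flush_flags)
    {
        int rc;
        mfn_t mfn = page_to_mfn(page);

        if ( !need_iommu_pt_sync(d) )
            return 0;

        rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
                       IOMMUF_readable | IOMMUF_writable, flush_flags);
        if ( rc )
        {
            printk(XENLOG_ERR
                   "pre-mapping MFN %lx (%lu pages) into IOMMU failed: %d\n",
                   mfn_x(mfn), nr, rc);
            return rc;
        }

        make_pages_writable(page, nr);

        return 0;
    }

The call sites in dom0_construct_pv() would then shrink to a single call
each, right after the respective allocation.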

>>>> @@ -372,16 +372,30 @@ void __hwdom_init arch_iommu_hwdom_init(
>>>>                                          perms & IOMMUF_writable ? p2m_access_rw
>>>>                                                                  : p2m_access_r,
>>>>                                          0);
>>>> +        else if ( pfn != start + count || perms != start_perms )
>>>> +        {
>>>> +        commit:
>>>> +            rc = iommu_map(d, _dfn(start), _mfn(start), count,
>>>> +                           start_perms, &flush_flags);
>>>> +            SWAP(start, pfn);
>>>> +            start_perms = perms;
>>>> +            count = 1;
>>>> +        }
>>>>          else
>>>> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
>>>> -                           perms, &flush_flags);
>>>> +        {
>>>> +            ++count;
>>>> +            rc = 0;
>>>> +        }
>>>>  
>>>>          if ( rc )
>>>>              printk(XENLOG_WARNING "%pd: identity %smapping of %lx failed: %d\n",
>>>>                     d, !paging_mode_translate(d) ? "IOMMU " : "", pfn, rc);
>>>
>>> Would be nice to print the count (or end pfn) in case it's a range.
>>
>> I can do so if you think it's worth the extra code. I can't use
>> "count" here in particular, as that was updated already (in context
>> above). The most reasonable change towards this would perhaps be to
>> duplicate the printk() into both the "if()" and the "else if()" bodies.
> 
> Maybe. The current message gives the impression that a single pfn has
> been added and failed, but without printing the range that failed the
> message will not be that helpful in diagnosing further issues that
> might arise due to the mapping failure.

I guess I'll make the change then. I'm still not really convinced though,
as the presence of the message should be far more concerning than whether
it's a single page or a range. As middle ground, would

             printk(XENLOG_WARNING "%pd: identity %smapping of %lx... failed: %d\n",

be indicative enough of this perhaps not having been just a single page?
Otoh splitting (and moving) the message would allow dropping the separate
paging_mode_translate() check.
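
I.e. something along these lines for the "else if()" body (a sketch only;
exact wording certainly open to bikeshedding):

        else if ( pfn != start + count || perms != start_perms )
        {
        commit:
            rc = iommu_map(d, _dfn(start), _mfn(start), count,
                           start_perms, &flush_flags);
            if ( rc )
                printk(XENLOG_WARNING
                       "%pd: IOMMU identity mapping of [%lx,%lx) failed: %d\n",
                       d, start, start + count, rc);
            SWAP(start, pfn);
            start_perms = perms;
            count = 1;
        }

with the translated-guest path keeping a single-PFN message of its own, at
which point the paging_mode_translate() conditional in the format string can
indeed go away.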

Jan
