
Re: [PATCH v2 07/18] IOMMU/x86: perform PV Dom0 mappings in batches


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 3 Dec 2021 13:38:48 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Fri, 03 Dec 2021 12:39:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.12.2021 15:10, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 11:47:41AM +0200, Jan Beulich wrote:
>> @@ -689,7 +763,8 @@ int __init dom0_construct_pv(struct doma
>>          l1tab++;
>>  
>>          page = mfn_to_page(_mfn(mfn));
>> -        if ( !page->u.inuse.type_info &&
>> +        if ( (!page->u.inuse.type_info ||
>> +              page->u.inuse.type_info == (PGT_writable_page | PGT_validated)) &&
> 
> Would it be clearer to get page for all pages that have a 0 count:
> !(type_info & PGT_count_mask). Or would that interact badly with page
> table pages?

Indeed this wouldn't work with page tables (and I recall having learned
this the hard way): Prior to mark_pv_pt_pages_rdonly() they all have a
type refcount of zero. Even if it wasn't for this, I'd prefer to not
relax the condition here more than necessary.

>> @@ -720,6 +795,17 @@ int __init dom0_construct_pv(struct doma
>>      /* Pages that are part of page tables must be read only. */
>>      mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
>>  
>> +    /*
>> +     * This needs to come after all potentially excess
>> +     * get_page_and_type(..., PGT_writable_page) invocations; see the loop a
>> +     * few lines further up, where the effect of calling that function in an
>> +     * earlier loop iteration may get overwritten by a later one.
>> +     */
>> +    if ( need_iommu_pt_sync(d) &&
>> +         iommu_unmap(d, _dfn(PFN_DOWN(mpt_alloc) - nr_pt_pages), nr_pt_pages,
>> +                     &flush_flags) )
>> +        BUG();
> 
> Wouldn't such unmap better happen as part of changing the types of the
> pages that become part of the guest page tables?

Not sure - it's a single call here, but would be a call per page when
e.g. moved into mark_pv_pt_pages_rdonly().

>> @@ -840,22 +928,41 @@ int __init dom0_construct_pv(struct doma
>>  #endif
>>      while ( pfn < nr_pages )
>>      {
>> -        if ( (page = alloc_chunk(d, nr_pages - domain_tot_pages(d))) == NULL )
>> +        count = domain_tot_pages(d);
>> +        if ( (page = alloc_chunk(d, nr_pages - count)) == NULL )
>>              panic("Not enough RAM for DOM0 reservation\n");
>> +        mfn = mfn_x(page_to_mfn(page));
>> +
>> +        if ( need_iommu_pt_sync(d) )
>> +        {
>> +            rc = iommu_map(d, _dfn(mfn), _mfn(mfn), domain_tot_pages(d) - count,
>> +                           IOMMUF_readable | IOMMUF_writable, &flush_flags);
>> +            if ( rc )
>> +                printk(XENLOG_ERR
>> +                       "pre-mapping MFN %lx (PFN %lx) into IOMMU failed: %d\n",
>> +                       mfn, pfn, rc);
>> +        }
>> +
>>          while ( pfn < domain_tot_pages(d) )
>>          {
>> -            mfn = mfn_x(page_to_mfn(page));
>> +            if ( !rc )
>> +                make_pages_writable(page, 1);
> 
> There's quite a lot of repetition of the pattern: allocate, iommu_map,
> set as writable. Would it be possible to abstract this into some
> kind of helper?
> 
> I've realized some of the allocations use alloc_chunk while others use
> alloc_domheap_pages, so it might require some work.

Right, I'd leave the allocation part aside for the moment. I had actually
considered folding iommu_map() and make_pages_writable() into a common
helper (or really renaming make_pages_writable() and folding iommu_map()
in there). What I lacked was a reasonable, not overly long name for such a
function. Plus - maybe minor - I wanted to avoid extra MFN <-> page
translations.

With a fair name suggested, I'd be happy to give this a try.

>>  #ifndef NDEBUG
>>  #define pfn (nr_pages - 1 - (pfn - (alloc_epfn - alloc_spfn)))
>>  #endif
>>              dom0_update_physmap(compat, pfn, mfn, vphysmap_start);
>>  #undef pfn
>> -            page++; pfn++;
>> +            page++; mfn++; pfn++;
>>              if ( !(pfn & 0xfffff) )
>>                  process_pending_softirqs();
>>          }
>>      }
>>  
>> +    /* Use while() to avoid compiler warning. */
>> +    while ( iommu_iotlb_flush_all(d, flush_flags) )
>> +        break;
> 
> Might be worth to print a message here in case of error?

Maybe. But then not just here. See e.g. arch_iommu_hwdom_init().

>> @@ -372,16 +372,30 @@ void __hwdom_init arch_iommu_hwdom_init(
>>                                          perms & IOMMUF_writable ? p2m_access_rw
>>                                                                  : p2m_access_r,
>>                                          0);
>> +        else if ( pfn != start + count || perms != start_perms )
>> +        {
>> +        commit:
>> +            rc = iommu_map(d, _dfn(start), _mfn(start), count,
>> +                           start_perms, &flush_flags);
>> +            SWAP(start, pfn);
>> +            start_perms = perms;
>> +            count = 1;
>> +        }
>>          else
>> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
>> -                           perms, &flush_flags);
>> +        {
>> +            ++count;
>> +            rc = 0;
>> +        }
>>  
>>          if ( rc )
>>              printk(XENLOG_WARNING "%pd: identity %smapping of %lx failed: %d\n",
>>                     d, !paging_mode_translate(d) ? "IOMMU " : "", pfn, rc);
> 
> Would be nice to print the count (or end pfn) in case it's a range.

I can do so if you think it's worth further extra code. I can't use
"count" here in particular, as that was updated already (in context
above). The most reasonable change towards this would perhaps be to
duplicate the printk() into both the "if()" and the "else if()" bodies.

> While not something that you have to fix here, the logic here is
> becoming overly complicated IMO. It might be easier to just put all
> the ram and reserved regions (or everything < 4G) into a rangeset and
> then punch holes on it for non guest mappable regions, and finally use
> rangeset_consume_ranges to iterate and map those. That's likely faster
> than having to iterate over all pfns on the system, and easier to
> understand from a logic PoV.

Maybe; I didn't spend much time on figuring out possible ways of
consolidating some of this.

Jan
