
Re: [PATCH v2 15/18] IOMMU/x86: prefill newly allocate page tables


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 14 Dec 2021 16:05:27 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Tue, 14 Dec 2021 15:05:44 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.12.2021 15:50, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 11:54:58AM +0200, Jan Beulich wrote:
>> Page tables are used for two purposes after allocation: They either start
>> out all empty, or they get filled to replace a superpage. Subsequently,
>> to replace all empty or fully contiguous page tables, contiguous sub-
>> regions will be recorded within individual page tables. Install the
>> initial set of markers immediately after allocation. Make sure to retain
>> these markers when further populating a page table in preparation for it
>> to replace a superpage.
>>
>> The markers are simply 4-bit fields holding the order value of
>> contiguous entries. To demonstrate this, if a page table had just 16
>> entries, this would be the initial (fully contiguous) set of markers:
>>
>> index  0 1 2 3 4 5 6 7 8 9 A B C D E F
>> marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
>>
>> "Contiguous" here means not only present entries with successively
>> increasing MFNs, each one suitably aligned for its slot, but also a
>> respective number of all non-present entries.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> Obviously this marker only works for newly created page tables right
> now, the moment we start poking holes or replacing entries the marker
> is not updated anymore. I expect further patches will expand on
> this.

Well, until there's a consumer of the markers, there's no need to
keep them updated. That updating is indeed going to be added in
subsequent patches. I've merely tried to split off pieces that can
go on their own.
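
For reference, the initial pattern from the description can be reproduced
with a tiny standalone sketch (userspace only; ffs() stands in for Xen's
find_first_set_bit(), and the 16-entry table merely mirrors the example
in the commit message):

    #include <stdio.h>
    #include <strings.h>   /* ffs() */

    /* Order of the contiguous region headed by slot "idx" in a freshly
     * allocated table of 1 << log2_nr entries: slot 0 heads the whole
     * table, any other slot heads as much as its alignment allows. */
    static unsigned int initial_marker(unsigned int idx, unsigned int log2_nr)
    {
        return idx ? (unsigned int)ffs(idx) - 1 : log2_nr;
    }

    int main(void)
    {
        unsigned int i;

        for ( i = 0; i < 16; ++i )
            printf("%u ", initial_marker(i, 4));
        printf("\n");   /* 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 */
        return 0;
    }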

>> --- a/xen/drivers/passthrough/amd/iommu-defs.h
>> +++ b/xen/drivers/passthrough/amd/iommu-defs.h
>> @@ -445,6 +445,8 @@ union amd_iommu_x2apic_control {
>>  #define IOMMU_PAGE_TABLE_U32_PER_ENTRY      (IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
>>  #define IOMMU_PAGE_TABLE_ALIGNMENT  4096
>>  
>> +#define IOMMU_PTE_CONTIG_MASK           0x1e /* The ign0 field below. */
> 
> Should you rename ign0 to contig_mask or some such now?

Not sure. I guess I should attach a comment linking here.

> Same would apply to the comment next to dma_pte for VT-d, where bits
> 52:62 are ignored (the comment seems to be missing this already) and
> we will be using bits 52:55 to store the contiguous mask for the
> entry.

Same here then.
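
For VT-d that would then presumably amount to something along the lines
of (illustrative only, going by the bits you name - 52:55 re-used for
the marker; not part of this patch):

    /* Sketch: ignored bits 52:55 of dma_pte re-used as contiguity marker. */
    #define DMA_PTE_CONTIG_MASK  (0xfULL << 52)

plus a comment on struct dma_pte pointing at it.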

>> --- a/xen/drivers/passthrough/amd/iommu_map.c
>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
>> @@ -116,7 +116,19 @@ static void set_iommu_ptes_present(unsig
>>  
>>      while ( nr_ptes-- )
>>      {
>> -        set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
>> +        ASSERT(!pde->next_level);
>> +        ASSERT(!pde->u);
>> +
>> +        if ( pde > table )
>> +            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
>> +        else
>> +            ASSERT(pde->ign0 == PAGE_SHIFT - 3);
> 
> You could even special case (pde - table) % 2 != 0, but this is debug-
> only code, and it's possible a mod is more costly than
> find_first_set_bit.

Not sure why I would want to special case anything that doesn't need
special casing. The pde == table case needs special care because
find_first_set_bit() cannot be called there (its argument would be zero).

>> @@ -450,7 +450,28 @@ struct page_info *iommu_alloc_pgtable(st
>>          return NULL;
>>  
>>      p = __map_domain_page(pg);
>> -    clear_page(p);
>> +
>> +    if ( contig_mask )
>> +    {
>> +        unsigned int i, shift = find_first_set_bit(contig_mask);
>> +
>> +        ASSERT(((PAGE_SHIFT - 3) & (contig_mask >> shift)) == PAGE_SHIFT - 3);
>> +
>> +        p[0] = (PAGE_SHIFT - 3ull) << shift;
>> +        p[1] = 0;
>> +        p[2] = 1ull << shift;
>> +        p[3] = 0;
>> +
>> +        for ( i = 4; i < PAGE_SIZE / 8; i += 4 )
>> +        {
>> +            p[i + 0] = (find_first_set_bit(i) + 0ull) << shift;
>> +            p[i + 1] = 0;
>> +            p[i + 2] = 1ull << shift;
>> +            p[i + 3] = 0;
>> +        }
> 
> You could likely do:
> 
> for ( i = 0; i < PAGE_SIZE / 8; i += 4 )
> {
>     p[i + 0] = i ? ((find_first_set_bit(i) + 0ull) << shift)
>                  : ((PAGE_SHIFT - 3ull) << shift);
>     p[i + 1] = 0;
>     p[i + 2] = 1ull << shift;
>     p[i + 3] = 0;
> }
> 
> To avoid having to open code the first loop iteration.

I could, but I explicitly didn't want to. I consider conditionals
inside a loop which special case just the first (or last) iteration
quite odd (unless they really save a lot of duplication).

> The ternary
> operator could also be nested before the shift, but I find that
> harder to read.

If I were to make the change, it would be that alternative way, as it
would allow avoiding the addition of 0ull:

    p[i + 0] = (i ? find_first_set_bit(i)
                  : PAGE_SHIFT - 3ull) << shift;

I could be talked into going that route (but not the intermediate
one you've suggested).
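
Spelled out, that variant of the loop would then look roughly like this
(just a sketch to make the comparison concrete, not the final code):

    /* "shift" selects the ignored bits named by contig_mask,
     * e.g. bit 1 for AMD's IOMMU_PTE_CONTIG_MASK (0x1e). */
    unsigned int i, shift = find_first_set_bit(contig_mask);

    for ( i = 0; i < PAGE_SIZE / 8; i += 4 )
    {
        p[i + 0] = (i ? find_first_set_bit(i)
                      : PAGE_SHIFT - 3ull) << shift;
        p[i + 1] = 0;
        p[i + 2] = 1ull << shift;
        p[i + 3] = 0;
    }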

Jan
