[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH v2 6/8] arm/mem_access: Add long-descriptor based gpt



Hi Sergej,

On 06/01/2017 04:18 PM, Sergej Proskurin wrote:
This commit adds functionality to walk the guest's page tables using the
long-descriptor translation table format for both ARMv7 and ARMv8.
Similar to the hardware architecture, the implementation supports
different page granularities (4K, 16K, and 64K). The implementation is
based on ARM DDI 0487A-g J11-5608, J11-5679, and ARM DDI 0406C-b
B3-1510.

Please use the most recent ARM ARM.


Note that the current implementation lacks support for 32-bit EL0
running on top of 64-bit EL1. The associated location in the code is
marked appropriately.

Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
---
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Julien Grall <julien.grall@xxxxxxx>
---
v2: Use TCR_SZ_MASK instead of TTBCR_SZ_MASK for ARM 32-bit guests using
     the long-descriptor translation table format.

     Cosmetic fixes.
---
  xen/arch/arm/p2m.c | 271 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 269 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 0337d83581..ea3be6f050 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1573,8 +1573,275 @@ static int __p2m_walk_gpt_ld(struct p2m_domain *p2m,
                               vaddr_t gva, paddr_t *ipa,
                               unsigned int *perm_ro)
  {
-    /* Not implemented yet. */
-    return -EFAULT;
+    int t0_sz, t1_sz, disabled = 1;

Looking at the way you use t0_sz and t1_sz, they will never be < 0 so I thin should be unsigned.

Also, I think disabled is always 0 or 1. If that is true, please use bool/true/false.

+    unsigned int level, gran;
+    unsigned int topbit = 0, input_size = 0, output_size;
+    uint64_t ttbr = 0, ips;
+    paddr_t mask;
+    lpae_t pte, *table;
+    struct page_info *page;
+    register_t tcr = READ_SYSREG(TCR_EL1);
+    struct domain *d = p2m->domain;
+
+    const vaddr_t offsets[4][3] = {
+        {
+#ifdef CONFIG_ARM_64
+            zeroeth_guest_table_offset_4k(gva),
+            zeroeth_guest_table_offset_16k(gva),
+            0, /* There is no zeroeth lookup level with a 64K granule size. */
+#endif
+        },
+        {
+            first_guest_table_offset_4k(gva),
+#ifdef CONFIG_ARM_64
+            first_guest_table_offset_16k(gva),
+            first_guest_table_offset_64k(gva),
+#endif
+        },
+        {
+            second_guest_table_offset_4k(gva),
+#ifdef CONFIG_ARM_64
+            second_guest_table_offset_16k(gva),
+            second_guest_table_offset_64k(gva),
+#endif
+        },
+        {
+            third_guest_table_offset_4k(gva),
+#ifdef CONFIG_ARM_64
+            third_guest_table_offset_16k(gva),
+            third_guest_table_offset_64k(gva),
+#endif
+        }
+    };
+
+    const paddr_t masks[4][3] = {
+        {
+            ZEROETH_SIZE_4K - 1,
+            ZEROETH_SIZE_16K - 1,
+            0 /* There is no zeroeth lookup level with a 64K granule size. */
+        },
+        {
+            FIRST_SIZE_4K - 1,
+            FIRST_SIZE_16K - 1,
+            FIRST_SIZE_64K - 1
+        },
+        {
+            SECOND_SIZE_4K - 1,
+            SECOND_SIZE_16K - 1,
+            SECOND_SIZE_64K - 1
+        },
+        {
+            THIRD_SIZE_4K - 1,
+            THIRD_SIZE_16K - 1,
+            THIRD_SIZE_64K - 1
+        }
+    };

Please define them as static. It is not necessary to have them on the stack everytime.

+
+    const unsigned int grainsizes[3] = {
+        PAGE_SHIFT_4K,
+        PAGE_SHIFT_16K,
+        PAGE_SHIFT_64K > +    };

Ditto.

+
+    const unsigned int strides[3] = {
+        LPAE_SHIFT_4K,
+        LPAE_SHIFT_16K,
+        LPAE_SHIFT_64K
+    };

Ditto. Also, the stride can be found from the page shift. So I am not convinced you need that.

+
+    t0_sz = (tcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
+    t1_sz = (tcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
+
+    /*
+     * Get the MSB number of the gva, according to "AddrTop" pseudocode

That's a call to have a separate helper for AddrTop rather than open-coding in this function.

I think this function could be split in multiple part, making easier the review. For instance, you have a lot of "if is_*bit_domain".

+     * implementation in ARM DDI 0487A-g J11-5739.
+     *
+     * XXX: We do not check whether the 64bit domain uses a 32-bit EL0. In this
+     * case, we need to set topbit to 31, as well.
I think checking 32-bit EL0 is straigh-forward enough to get it done now. Have a look at psr_most_is_32bit.

+     */
+    if ( is_32bit_domain(d) )
+        topbit = TCR_TB_31;
+#ifdef CONFIG_ARM_64
+    else if ( is_64bit_domain(d) )
+    {
+        if ( ((gva & (1UL << TCR_TB_55)) && (tcr & TCR_TBI1)) ||

Please use BIT(...) instead of (1UL << ...).

+             (!(gva & (1UL << TCR_TB_55)) && (tcr & TCR_TBI0)) )

Ditto.

+            topbit = TCR_TB_55;

This is really confusing. You define TCR_TB_* to * but it is not even part of the register TCR_. TBH, I don't think they hence the code and would just hardcoded the 55 here. Afterall it is in the name :).

+        else
+            topbit = TCR_TB_63

Ditto.

+    }
+#endif
+
+#ifdef CONFIG_ARM_64
+    if ( is_64bit_domain(d) )
+    {

Likely a comment is missing here to explain what you are doing below. My understand is you are selecting TTBR*_EL1. And looking at the code, this could be abstracted as both branches are nearly the same.

+        if ( (gva & (1UL << topbit)) == 0 )

BIT(...)

+        {
+            input_size = REGISTER_WIDTH_64_BIT - t0_sz;
+
+            if ( input_size > IPS_MAX )
+                /* We limit the input_size to be max 48 bit. */
+                input_size = IPS_MAX;
+            else if ( input_size < IPS_MIN )
+                /* We limit the input_size to be max 25 bit. */
+                input_size = IPS_MIN;

like this could be simplified by using min/max. But I think we should bail out here. Likely something in the page table is wrong and ignoring is the worst thing to do.

For instance ARMv8.2 has extended the input size to 52 bits. It would be difficult to catch what is missing because of that, not mentioning that the only caller today will be memaccess that is not enabled by default.

+
+            /* Normalize granule size. */

I think 0, 1, 2 is more confusing to read. It would be better to use directly TCR_TG0_*.

+            switch ( tcr & TCR_TG0_MASK ) {

Coding style, the brace should be on a newline.

+            case TCR_TG0_16K:
+                gran = 1;
+                break;
+            case TCR_TG0_64K:
+                gran = 2;
+                break;
+            default:
+                gran = 0;
+            } > +
+            /* Use TTBR0 for GVA to IPA translation. */
+            ttbr = READ_SYSREG64(TTBR0_EL1);
+
+            /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
+            disabled = ( tcr & TCR_EPD0 ) ? 1 : 0;

disabled = !!(tcr & TCR_EPD0);

or if you are using bool as requested above:

disabled =  tcr & TCR_EPD0;

+        }
+        else
+        {
+            input_size = REGISTER_WIDTH_64_BIT - t1_sz;
+
+            if ( input_size > IPS_MAX )
+                /* We limit the input_size to be max 48 bit. */
+                input_size = IPS_MAX;
+            else if ( input_size < IPS_MIN )
+                /* We limit the input_size to be max 25 bit. */
+                input_size = IPS_MIN;
+
+            /* Normalize granule size. */
+            switch ( tcr & TCR_TG1_MASK ) {

Coding style.

+            case TCR_TG1_16K:
If you shift your tcr by TCR_TG1_SHIFT then all this code can become generic. Avoiding duplication, reviewing twice similar code and potential bug.

+                gran = 1;
+                break;
+            case TCR_TG1_64K:
+                gran = 2;
+                break;
+            default:
+                gran = 0;
+            }
+
+            /* Use TTBR1 for GVA to IPA translation. */
+            ttbr = READ_SYSREG64(TTBR1_EL1);
+
+            /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
+            disabled = ( tcr & TCR_EPD1 ) ? 1 : 0;

Same as above.

+        }
+    }
+    else
+#endif
+    {
+        /* Granule size of Aarch32 or ARMv7 architectures is always 4K, 
indexed by 0. */

NIT: It is not necessarily to mention ARMv7. ARMv7 is always AArch32.

Also s/Aarch32/Aarch32/

+        gran = 0;
+
+        mask = ((1ULL << REGISTER_WIDTH_32_BIT) - 1) &
+               ~((1ULL << (REGISTER_WIDTH_32_BIT - t0_sz)) - 1);

I don't understand this mask. Why do you need it?

+
+        if ( t0_sz == 0 || !(gva & mask) )
+        {
+            input_size = REGISTER_WIDTH_32_BIT - t0_sz;
+
+            /* Use TTBR0 for GVA to IPA translation. */
+            ttbr = READ_SYSREG64(TTBR0_EL1);
+
+            /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
+            disabled = ( tcr & TCR_EPD0 ) ? 1 : 0;

As for the AArch64, there is a call to abstract here.

+        }
+
+        mask = ((1ULL << REGISTER_WIDTH_32_BIT) - 1) &
+               ~((1ULL << (REGISTER_WIDTH_32_BIT - t1_sz)) - 1);
+
+        if ( ((t1_sz == 0) && !ttbr) || (t1_sz && (gva & mask) == mask) )
+        {
+            input_size = REGISTER_WIDTH_32_BIT - t1_sz;
+
+            /* Use TTBR1 for GVA to IPA translation. */
+            ttbr = READ_SYSREG64(TTBR1_EL1);
+
+            /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
+            disabled = ( tcr & TCR_EPD1 ) ? 1 : 0;

Ditto.

+        }
+    }
+
+    if ( disabled )
+        return -EFAULT;
+
+    level = 4 - DIV_ROUND_UP((input_size - grainsizes[gran]), strides[gran]);

It took me a bit to understand this. The comment in the ARM ARM pseudo-code would be useful to keep it here.

+
+    /* XXX: We do not consider 32bit EL0 running on Aarch64, yet. */

See my comment above about 32bit EL0 support.

+    if ( is_64bit_domain(d) )
+    {
+        /* Get the intermediate physical address size. */
+        ips = (tcr & TCR_IPS_MASK) >> TCR_IPS_SHIFT;
+
+        switch (ips) {
+        case TCR_IPS_32_BIT:
+            output_size = IPS_32_BIT;
+            break;
+        case TCR_IPS_36_BIT:
+            output_size = IPS_36_BIT;
+            break;
+        case TCR_IPS_40_BIT:
+            output_size = IPS_40_BIT;
+            break;
+        case TCR_IPS_42_BIT:
+            output_size = IPS_42_BIT;
+            break;
+        case TCR_IPS_44_BIT:
+            output_size = IPS_44_BIT;
+            break;
+        case TCR_IPS_48_BIT:
+        default:
+            output_size = IPS_48_BIT;
+        }
+    }
+    else
+        output_size = IPS_40_BIT;
+
+    /* Make sure the base address does not exceed its configured size. */
+    mask = ((1ULL << IPS_48_BIT) - 1) & ~((1ULL << output_size) - 1);
+    if ( output_size != IPS_48_BIT && (ttbr & mask) )
+        return -EFAULT;
+
+    mask = ((1ULL << output_size) - 1);
+    page = get_page_from_gfn(d, paddr_to_pfn(ttbr & mask), NULL, P2M_ALLOC);
+    if ( !page )
+        return -EFAULT;
+
+    table = __map_domain_page(page);
+
+    for ( ; ; level++ )
+    {
+        pte = table[offsets[level][gran]];
+
+        unmap_domain_page(table);
+        put_page(page);
+
+        if ( level == 3 || !pte.walk.valid || !pte.walk.table )

Please comment this line.

+            break;
+
+        page = get_page_from_gfn(d, pte.walk.base, NULL, P2M_ALLOC);

It is a bit confusing. You care about the output_size for the when getting the TBBR but not the rest of the tables. Is it normal?

+        if ( !page )
+            return -EFAULT;
+
+        table = __map_domain_page(page);
+    }
+
+    if ( !pte.walk.valid || ((level == 3) & !pte.walk.table) )

There is a call to leverage the p2m_table/p2m_valid helpers here.

+        return -EFAULT;
+
+    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level][gran]);
+
+    *perm_ro = pte.pt.ro;
+
+    /* Return the entire pte so that the caller can check flags by herself. */

Hmmm? You don't return the entire pte but the only whether the page is writable or readable-only.

In general we want to have more information back such as execute-never bit...

+    return 0;
  }
int p2m_walk_gpt(struct p2m_domain *p2m, vaddr_t gva,


Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.