
Re: [Xen-devel] [RFC PATCH v3 08/10] arm/mem_access: Add long-descriptor based gpt



Hi Sergej,

On 15/06/17 12:05, Sergej Proskurin wrote:
This commit adds functionality to walk the guest's page tables using the
long-descriptor translation table format for both ARMv7 and ARMv8.
Similar to the hardware architecture, the implementation supports
different page granularities (4K, 16K, and 64K). The implementation is
based on ARM DDI 0487B-a J1-5922, J1-5999, and ARM DDI 0406C-b B3-1510.

NIT: The spec is 0487B.a and not 0487B-a.


Note that the current implementation lacks support for Large VA/PA on
ARMv8.2 architectures (LVA/LPA, 52-bit virtual and physical address
sizes). The associated location in the code is marked appropriately.

I am a bit confused: you state that 52-bit is not supported, but you seem to implement it below.

I don't want to see a feature half supported, as that means rotten code. Given that 52-bit support in Xen is not implemented (and would require some work), 52-bit guests cannot be supported. So I would prefer to see no code for it at all, though I would keep the note in the commit message.


Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
---
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Julien Grall <julien.grall@xxxxxxx>
---
v2: Use TCR_SZ_MASK instead of TTBCR_SZ_MASK for ARM 32-bit guests using
    the long-descriptor translation table format.

    Cosmetic fixes.

v3: Move the implementation to ./xen/arch/arm/guest_copy.c.

    Remove the array strides and declare the array grainsizes as static
    const instead of just const to reduce the function stack overhead.

    Move parts of the function guest_walk_ld into the static functions
    get_ttbr_and_gran_64bit and get_top_bit to reduce complexity.

    Use the macro BIT(x) instead of (1UL << x).

    Add more comments && Cosmetic fixes.
---
 xen/arch/arm/guest_walk.c        | 397 ++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/guest_walk.h |   9 +
 2 files changed, 404 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
index f2f3533665..90bcc218ec 100644
--- a/xen/arch/arm/guest_walk.c
+++ b/xen/arch/arm/guest_walk.c
@@ -16,6 +16,8 @@
  */

 #include <xen/sched.h>
+#include <xen/domain_page.h>
+#include <asm/guest_walk.h>

 /*
  * The function guest_walk_sd translates a given GVA into an IPA using the
@@ -32,6 +34,92 @@ static int guest_walk_sd(struct domain *d,
     return -EFAULT;
 }

+#ifdef CONFIG_ARM_64

This is fairly confusing: sometimes you #ifdef the AArch64 code, sometimes not (see the is_64bit_domain check below). Can you stay consistent across the file, please?

IMHO, we should really limit the #ifdef CONFIG_ARM_64. It makes the code more difficult to follow and has limited benefit if all the definitions also exist for AArch32.
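
One way to do that (a rough, untested sketch, not the only option) is to confine the #ifdef to a single stub, so the callers can stay as plain runtime checks (IIRC is_64bit_domain() already evaluates to 0 on arm32 builds, so the compiler drops the dead branch):

#ifdef CONFIG_ARM_64
/* ... the AArch64 implementation as in your patch ... */
#else
/*
 * Stub for arm32: is_64bit_domain(d) is constant false there, so this is
 * never reached; it only exists to keep the callers free of #ifdef.
 */
static inline bool get_ttbr_and_gran_64bit(uint64_t *ttbr, unsigned int *gran,
                                           register_t tcr, bool ttbrx)
{
    return true; /* "translation disabled" */
}
#endif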

+/*
+ * Select the TTBR(0|1)_EL1 that will be used for address translation using the
+ * long-descriptor translation table format and return the page granularity
+ * that is used by the selected TTBR.
+ */
+static bool get_ttbr_and_gran_64bit(uint64_t *ttbr, unsigned int *gran,
+                                    register_t tcr, bool ttbrx)
+{
+    bool disabled;
+
+    if ( ttbrx == TTBR0_VALID )
+    {
+        /* Normalize granule size. */
+        switch ( tcr & TCR_TG0_MASK )
+        {
+        case TCR_TG0_16K:
+            *gran = GRANULE_SIZE_INDEX_16K;
+            break;
+        case TCR_TG0_64K:
+            *gran = GRANULE_SIZE_INDEX_64K;
+            break;
+        default:
+            *gran = GRANULE_SIZE_INDEX_4K;
+        }
+
+        /* Use TTBR0 for GVA to IPA translation. */
+        *ttbr = READ_SYSREG64(TTBR0_EL1);
+
+        /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
+        disabled = tcr & TCR_EPD0;
+    }
+    else
+    {
+        /* Normalize granule size. */
+        switch ( tcr & TCR_EL1_TG1_MASK )
+        {
+        case TCR_EL1_TG1_16K:
+            *gran = GRANULE_SIZE_INDEX_16K;
+            break;
+        case TCR_EL1_TG1_64K:
+            *gran = GRANULE_SIZE_INDEX_64K;
+            break;
+        default:
+            *gran = GRANULE_SIZE_INDEX_4K;
+        }
+
+        /* Use TTBR1 for GVA to IPA translation. */
+        *ttbr = READ_SYSREG64(TTBR1_EL1);
+
+        /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
+        disabled = tcr & TCR_EPD1;
+    }
+
+    return disabled;
+}
+#endif
+
+/*
+ * Get the MSB number of the GVA, according to "AddrTop" pseudocode
+ * implementation in ARM DDI 0487B-a J1-6066.

NIT: The spec is 0487B.a and not 0487B-a.

+ */
+static unsigned int get_top_bit(struct domain *d, vaddr_t gva, register_t tcr)
+{
+    unsigned int topbit;
+
+    /*
+     * IF EL1 is using AArch64 then addresses from EL0 using AArch32 are
+     * zero-extended to 64 bits (ARM DDI 0487B-a J1-6066).
+     */
+    if ( is_32bit_domain(d) )
+        topbit = 31;
+#ifdef CONFIG_ARM_64
+    else if ( is_64bit_domain(d) )
+    {
+        if ( ((gva & BIT(55)) && (tcr & TCR_EL1_TBI1)) ||
+             (!(gva & BIT(55)) && (tcr & TCR_EL1_TBI0)) )
+            topbit = 55;
+        else
+            topbit = 63;
+    }
+#endif
+
+    return topbit;
+}
+
 /*
  * The function guest_walk_ld translates a given GVA into an IPA using the
  * long-descriptor translation table format in software. This function assumes
@@ -43,8 +131,313 @@ static int guest_walk_ld(struct domain *d,
                          vaddr_t gva, paddr_t *ipa,
                          unsigned int *perms)
 {
-    /* Not implemented yet. */
-    return -EFAULT;
+    bool disabled = true;
+    bool ro_table = false, xn_table = false;
+    unsigned int t0_sz, t1_sz;
+    unsigned int level, gran;
+    unsigned int topbit = 0, input_size = 0, output_size;
+    uint64_t ttbr = 0, ips;
+    paddr_t mask;
+    lpae_t pte, *table;
+    struct page_info *page;
+    register_t tcr = READ_SYSREG(TCR_EL1);
+
+    const vaddr_t offsets[4][3] = {
+        {
+#ifdef CONFIG_ARM_64
+            zeroeth_guest_table_offset_4k(gva),
+            zeroeth_guest_table_offset_16k(gva),
+            0, /* There is no zeroeth lookup level with a 64K granule size. */
+#endif
+        },
+        {
+            first_guest_table_offset_4k(gva),
+#ifdef CONFIG_ARM_64
+            first_guest_table_offset_16k(gva),
+            first_guest_table_offset_64k(gva),
+#endif
+        },
+        {
+            second_guest_table_offset_4k(gva),
+#ifdef CONFIG_ARM_64
+            second_guest_table_offset_16k(gva),
+            second_guest_table_offset_64k(gva),
+#endif
+        },
+        {
+            third_guest_table_offset_4k(gva),
+#ifdef CONFIG_ARM_64
+            third_guest_table_offset_16k(gva),
+            third_guest_table_offset_64k(gva),
+#endif
+        }
+    };
+
+    static const paddr_t masks[4][3] = {
+        {
+            zeroeth_size(4K) - 1,
+            zeroeth_size(16K) - 1,
+            0 /* There is no zeroeth lookup level with a 64K granule size. */
+        },
+        {
+            first_size(4K) - 1,
+            first_size(16K) - 1,
+            first_size(64K) - 1
+        },
+        {
+            second_size(4K) - 1,
+            second_size(16K) - 1,
+            second_size(64K) - 1
+        },
+        {
+            third_size(4K) - 1,
+            third_size(16K) - 1,
+            third_size(64K) - 1
+        }
+    };
+
+    static const unsigned int grainsizes[3] = {
+        PAGE_SHIFT_4K,
+        PAGE_SHIFT_16K,
+        PAGE_SHIFT_64K
+    };
+
+    t0_sz = (tcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
+    t1_sz = (tcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
+
+    /* Get the MSB number of the GVA. */
+    topbit = get_top_bit(d, gva, tcr);
+
+#ifdef CONFIG_ARM_64
+    if ( is_64bit_domain(d) )
+    {
+        /* Select the TTBR(0|1)_EL1 that will be used for address translation. */
+
+        if ( (gva & BIT(topbit)) == 0 )
+        {
+            input_size = REGISTER_WIDTH_64_BIT - t0_sz;
+
+            /* Get TTBR0 and configured page granularity. */
+            disabled = get_ttbr_and_gran_64bit(&ttbr, &gran, tcr, TTBR0_VALID);
+        }
+        else
+        {
+            input_size = REGISTER_WIDTH_64_BIT - t1_sz;
+
+            /* Get TTBR1 and configured page granularity. */
+            disabled = get_ttbr_and_gran_64bit(&ttbr, &gran, tcr, TTBR1_VALID);
+        }
+
+        /*
+         * The current implementation supports intermediate physical address
+         * sizes (IPS) up to 48 bit.
+         *
+         * XXX: Determine whether the IPS_MAX_VAL is 48 or 52 in software.
+         */
+        if ( (input_size > TCR_EL1_IPS_48_BIT_VAL) ||
+             (input_size < TCR_EL1_IPS_MIN_VAL) )
+            return -EFAULT;
+    }
+    else
+#endif
+    {
+        /* Granule size of AArch32 architectures is always 4K. */
+        gran = GRANULE_SIZE_INDEX_4K;
+
+        /* Select the TTBR(0|1)_EL1 that will be used for address translation. */
+
+        /*
+         * Check if the bits <31:32-t0_sz> of the GVA are set to 0 (DDI 0487B-a
+         * J1-5999). If so, TTBR0 shall be used for address translation.
+         */
+        mask = ((1ULL << REGISTER_WIDTH_32_BIT) - 1) &
+               ~((1ULL << (REGISTER_WIDTH_32_BIT - t0_sz)) - 1);
+
+        if ( t0_sz == 0 || !(gva & mask) )
+        {
+            input_size = REGISTER_WIDTH_32_BIT - t0_sz;
+
+            /* Use TTBR0 for GVA to IPA translation. */
+            ttbr = READ_SYSREG64(TTBR0_EL1);
+
+            /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
+            disabled = tcr & TCR_EPD0;
+        }
+
+        /*
+         * Check if the bits <31:32-t1_sz> of the GVA are set to 1 (DDI 0487B-a
+         * J1-6000). If so, TTBR1 shall be used for address translation.
+         */
+        mask = ((1ULL << REGISTER_WIDTH_32_BIT) - 1) &
+               ~((1ULL << (REGISTER_WIDTH_32_BIT - t1_sz)) - 1);
+
+        if ( ((t1_sz == 0) && !ttbr) || (t1_sz && (gva & mask) == mask) )
+        {
+            input_size = REGISTER_WIDTH_32_BIT - t1_sz;
+
+            /* Use TTBR1 for GVA to IPA translation. */
+            ttbr = READ_SYSREG64(TTBR1_EL1);
+
+            /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
+            disabled = tcr & TCR_EPD1;
+        }
+    }
+
+    if ( disabled )
+        return -EFAULT;
+
+    /*
+     * The starting level is the number of strides (grainsizes[gran] - 3)
+     * needed to consume the input address (DDI 0487B-a J1-5924).
+     */
+    level = 4 - DIV_ROUND_UP((input_size - grainsizes[gran]), (grainsizes[gran] - 3));
+
+    if ( is_64bit_domain(d) )

Here is an example where you don't use #ifdef.

+    {
+        /* Get the intermediate physical address size. */
+        ips = (tcr & TCR_EL1_IPS_MASK) >> TCR_EL1_IPS_SHIFT;
+
+        switch (ips)
+        {
+        case TCR_EL1_IPS_32_BIT:
+            output_size = TCR_EL1_IPS_32_BIT_VAL;
+            break;
+        case TCR_EL1_IPS_36_BIT:
+            output_size = TCR_EL1_IPS_36_BIT_VAL;
+            break;
+        case TCR_EL1_IPS_40_BIT:
+            output_size = TCR_EL1_IPS_40_BIT_VAL;
+            break;
+        case TCR_EL1_IPS_42_BIT:
+            output_size = TCR_EL1_IPS_42_BIT_VAL;
+            break;
+        case TCR_EL1_IPS_44_BIT:
+            output_size = TCR_EL1_IPS_44_BIT_VAL;
+            break;
+        case TCR_EL1_IPS_48_BIT:
+            output_size = TCR_EL1_IPS_48_BIT_VAL;
+            break;
+        case TCR_EL1_IPS_52_BIT:
+            /* XXX: 52 bit output_size is not supported yet. */

If you bail out here, you should also bail out for all the reserved output sizes.

+            return -EFAULT;
+        default:
+            output_size = TCR_EL1_IPS_48_BIT_VAL;
+        }

Looking at this, why not use an array to store all the values? It would avoid the repetitive switch here.
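
Something along these lines, as a rough sketch only (constant names taken from your patch, the array name is just a placeholder, values to be double-checked against the spec). Indexing by the raw IPS field also lets you reject 52-bit and the reserved encodings in one place, which ties in with my comment above:

    static const unsigned int ipa_sizes[] = {
        TCR_EL1_IPS_32_BIT_VAL,
        TCR_EL1_IPS_36_BIT_VAL,
        TCR_EL1_IPS_40_BIT_VAL,
        TCR_EL1_IPS_42_BIT_VAL,
        TCR_EL1_IPS_44_BIT_VAL,
        TCR_EL1_IPS_48_BIT_VAL,
    };

    /* Get the intermediate physical address size. */
    ips = (tcr & TCR_EL1_IPS_MASK) >> TCR_EL1_IPS_SHIFT;

    /* 52-bit and the reserved IPS encodings are not supported: bail out. */
    if ( ips >= ARRAY_SIZE(ipa_sizes) )
        return -EFAULT;

    output_size = ipa_sizes[ips];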

+    }
+    else
+        output_size = TCR_EL1_IPS_40_BIT_VAL;
+
+    /* Make sure the base address does not exceed its configured size. */
+    mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) & ~((1ULL << output_size) - 1);
+    if ( output_size < TCR_EL1_IPS_48_BIT_VAL && (ttbr & mask) )
+        return -EFAULT;
+
+    mask = ((1ULL << output_size) - 1);
+    page = get_page_from_gfn(d, paddr_to_pfn(ttbr & mask), NULL, P2M_ALLOC);
+    if ( !page )
+        return -EFAULT;
+
+    table = __map_domain_page(page);
+
+    for ( ; ; level++ )
+    {
+        pte = table[offsets[level][gran]];
+
+        unmap_domain_page(table);
+        put_page(page);
+
+        /* Make sure the base address does not exceed its configured size. */
+        mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) &
+               ~((1ULL << output_size) - 1);
+
+        if ( (output_size < TCR_EL1_IPS_48_BIT_VAL) &&
+             (pfn_to_paddr(pte.walk.base) & mask) )
+            return -EFAULT;
+
+#ifdef CONFIG_ARM_64
+        /*
+         * If page granularity is 64K, make sure the address is aligned
+         * appropriately.
+         */
+        if ( (output_size < TCR_EL1_IPS_52_BIT_VAL) &&
+             (gran == GRANULE_SIZE_INDEX_64K) &&
+             (pte.walk.base & 0xf) )
+            return -EFAULT;
+#endif
+
+        /*
+         * Break if one of the following conditions are true:
+         *
+         * - We have found the PTE holding the IPA (level == 3).
+         * - The PTE is not valid.
+         * - If (level < 3) and the PTE is valid, we found a block descriptor.
+         */
+        if ( level == 3 || !p2m_valid(pte) || !p2m_table(pte) )

Please have a look at lpae_is_superpage(...).
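
I.e. something along these lines (rough sketch; I am going from memory on the exact helper name and arguments, so please check the current definition):

        /*
         * Break if we found the leaf PTE (level 3), the entry is invalid, or
         * the valid entry below level 3 is a block mapping (superpage).
         */
        if ( (level == 3) || !p2m_valid(pte) || lpae_is_superpage(pte, level) )
            break;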

+            break;
+
+        /*
+         * Temporarily store permissions of the table descriptor as they are
+         * inherited by page table attributes (ARM DDI 0487B-a J1-5928).
+         */
+        xn_table = xn_table | !!(pte.pt.xnt);          /* Execute-Never */

You can simplify this with:

xn_table |= !!(pte.pt.xnt);

Potentially you can even drop the !!.


+        ro_table = ro_table | !!(pte.pt.apt & BIT(1)); /* Read-Only */

Ditto.
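
I.e., roughly (both variables staying bool):

xn_table |= pte.pt.xnt;               /* xnt is a single-bit field */
ro_table |= !!(pte.pt.apt & BIT(1));  /* !! optional: assigning to bool normalises */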

+
+#ifdef CONFIG_ARM_64
+        if ( output_size == TCR_EL1_IPS_52_BIT_VAL )
+        {
+            unsigned long gfn;
+
+            /*
+             * The GFN must be rearranged according to the following format of
+             * the PTE bits [pte<15:12>:pte<47:16>:0000].
+             */
+            gfn = ((unsigned long)(pte.walk.base & 0xf) << 36) |
+                  (pte.walk.base & ~0xf);
+
+            page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+        }
+        else
+#endif
+            page = get_page_from_gfn(d, pte.walk.base, NULL, P2M_ALLOC);
+
+        if ( !page )
+            return -EFAULT;
+
+        table = __map_domain_page(page);
+    }
+
+    if ( !p2m_valid(pte) || ((level == 3) && !p2m_table(pte)) )

Please explain at least the second part of the || in a comment.
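
Something like the below, if I am reading the walk correctly (wording up to you):

    /*
     * The walk failed if the PTE is not valid, or if it stopped at level 3 on
     * an entry whose table bit is clear: at level 3 that encoding is reserved
     * and behaves as invalid (a valid level 3 entry must be a page descriptor
     * with both low bits set).
     */
    if ( !p2m_valid(pte) || ((level == 3) && !p2m_table(pte)) )
        return -EFAULT;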

+        return -EFAULT;
+
+#ifdef CONFIG_ARM_64
+    if ( output_size == TCR_EL1_IPS_52_BIT_VAL )
+    {
+        unsigned long gfn;
+
+        /*
+         * The GFN must be rearranged according to the following format of the
+         * PTE bits [pte<15:12>:pte<47:16>:0000].
+         */
+        gfn = ((unsigned long)(pte.walk.base & 0xf) << 36) |
+              (pte.walk.base & ~0xf);
+
+        *ipa = pfn_to_paddr(gfn) | (gva & masks[level][gran]);
+    }
+    else
+#endif
+        *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level][gran]);
+
+    /*
+     * Set permissions so that the caller can check the flags by herself. Note
+     * that stage 1 translations also inherit attributes from the tables
+     * (ARM DDI 0487B-a J1-5928).
+     */
+    if ( !pte.pt.ro && !ro_table )
+        *perms = GV2M_WRITE;
+    if ( !pte.pt.xn && !xn_table )
+        *perms |= GV2M_EXEC;
+
+    return 0;
 }

 int guest_walk_tables(const struct vcpu *v, vaddr_t gva,
diff --git a/xen/include/asm-arm/guest_walk.h b/xen/include/asm-arm/guest_walk.h
index 4ed8476e08..ce01f0fa08 100644
--- a/xen/include/asm-arm/guest_walk.h
+++ b/xen/include/asm-arm/guest_walk.h
@@ -1,6 +1,15 @@
 #ifndef _XEN_GUEST_WALK_H
 #define _XEN_GUEST_WALK_H

+/* Normalized page granule size indices. */
+#define GRANULE_SIZE_INDEX_4K               (0)
+#define GRANULE_SIZE_INDEX_16K              (1)
+#define GRANULE_SIZE_INDEX_64K              (2)

Why are these exported? You only use them internally.

+
+/* Represent whether TTBR0 or TTBR1 is valid. */
+#define TTBR0_VALID                         (0)
+#define TTBR1_VALID                         (1)

Ditto.

+
 /* Walk the guest's page tables in software. */
 int guest_walk_tables(const struct vcpu *v,
                       vaddr_t gva,


Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

