[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH 4/4] arm/mem_access: Add software guest-page-table walk



Hi Sergej,

On 30/04/17 20:48, Sergej Proskurin wrote:
The function p2m_mem_access_check_and_get_page in mem_access.c
translates a gva to an ipa by means of the hardware functionality
implemented in the function gva_to_ipa. If mem_access is active,
hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
guest's translation tables, access to which might be restricted by the
active VTTBR. To address this issue, we perform the gva to ipa
translation in software.

Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
---
Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Julien Grall <julien.grall@xxxxxxx>
---
 xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
index 04b1506b00..352eb6073f 100644
--- a/xen/arch/arm/mem_access.c
+++ b/xen/arch/arm/mem_access.c
@@ -20,6 +20,7 @@
 #include <xen/monitor.h>
 #include <xen/sched.h>
 #include <xen/vm_event.h>
+#include <xen/domain_page.h>
 #include <public/vm_event.h>
 #include <asm/event.h>

@@ -90,6 +91,129 @@ static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
     return 0;
 }

+static int
+p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
+               paddr_t *ipa, unsigned int flags)

I don't think this function belongs to mem_access.c. It would be better in p2m.c. Also, the name is a bit confusing, a user would not know the difference between p2m_gva_to_ipa and gva_to_ipa.

+{
+    int level=0, t0_sz, t1_sz;

Coding style level = 0.

Also please use unsigned int for level.

t0_sz and t1_sz should be stay signed.

+    unsigned long t0_max, t1_min;
+    lpae_t pte, *table;
+    mfn_t root_mfn;
+    uint64_t ttbr;
+    uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
+    register_t ttbcr = READ_SYSREG(TCR_EL1);

So you are assuming that the current vCPU is running, right? If so, this should be clarified in the commit message and in a comment above the function.

Also s/ttbcr/tcr/

+    struct domain *d = p2m->domain;
+
+    const unsigned int offsets[4] = {
+#ifdef CONFIG_ARM_64
+        zeroeth_table_offset(gva),
+#endif
+        first_table_offset(gva),
+        second_table_offset(gva),
+        third_table_offset(gva)
+    };

Offsets are based on the granularity of Xen (currently 4KB). However the guests can support 4KB, 16KB, 64KB. Please handle everything correctly.

+
+    const paddr_t masks[4] = {
+#ifdef CONFIG_ARM_64
+        ZEROETH_SIZE - 1,
+#endif
+        FIRST_SIZE - 1,
+        SECOND_SIZE - 1,
+        THIRD_SIZE - 1
+    };
+
+    /* If the MMU is disabled, there is no need to translate the gva. */
+    if ( !(sctlr & SCTLR_M) )
+    {
+        *ipa = gva;
+
+        return 0;
+    }
+
+    if ( is_32bit_domain(d) )
+    {
+        /*
+         * XXX: We do not support 32-bit domain translation table walks for
+         * domains using the short-descriptor translation table format, yet.
+         */

Debian ARM 32bit is using short-descriptor translation table. So are you suggesting that memaccess will not correctly with Debian guest?

+        if ( !(ttbcr & TTBCR_EAE) )

See my comment on patch #2 about the naming convention.

+            return -EFAULT;
+
+#ifdef CONFIG_ARM_64
+        level = 1;

This sounds wrong to me. The first lookup level is determined by the value of T0SZ and T1SZ (see B3-1352 in ARM DDI 0406C.c).

For the AArch64 version see D4.2.5 in ARM DDI 0487A.k_iss10775.

+#endif
+    }
+
+#ifdef CONFIG_ARM_64
+    if ( is_64bit_domain(d) )
+    {
+        /* Get the max GVA that can be translated by TTBR0. */
+        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
+        t0_max = (1UL << (64 - t0_sz)) - 1;
+
+        /* Get the min GVA that can be translated by TTBR1. */
+        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
+        t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
+    }
+    else
+#endif
+    {
+        /* Get the max GVA that can be translated by TTBR0. */
+        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;

See my comment on patch #2 for the naming convention.

+        t0_max = (1U << (32 - t0_sz)) - 1;
+
+        /* Get the min GVA that can be translated by TTBR1. */
+        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
+        t1_min = ~0U - (1U << (32 - t1_sz)) + 1;

1U << (32 - t1_sz) will not fit in an unsigned long if t1_sz = 0.

Also, this code looks wrong to me. Looking at B3.6.4 in DDI 0406C.c, you don't handle properly t0_max and t1_min when T0SZ or T1SZ is 0.

I think it would be worth for you to have a look to the pseudo-code in the ARM ARM for TranslationTableWalk (see B3.19.3 in ARM DDI 0406C.c and J1-5295 in ARM DDI 0487A.k_iss10775) which gives you all the details for doing properly translation table walk.

+    }
+
+    if ( t0_max >= gva )
+        /* Use TTBR0 for GVA to IPA translation. */
+        ttbr = READ_SYSREG64(TTBR0_EL1);
+    else if ( t1_min <= gva )
+        /* Use TTBR1 for GVA to IPA translation. */
+        ttbr = READ_SYSREG64(TTBR1_EL1);
+    else
+        /* GVA out of bounds of TTBR(0|1). */
+        return -EFAULT;
+
+    /* Bits [63..48] might be used by an ASID. */
+    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & ((1ULL<<48)-1))), NULL);

Please don't hardcode the mask.

+
+    /* Check, whether TTBR holds a valid address. */
+    if ( mfn_eq(root_mfn, INVALID_MFN) )
+        return -EFAULT;
+
+    table = map_domain_page(root_mfn);

Nothing prevents the guest to give back the page table whilst you browsing it. You may want to have a look to patch "ARM: introduce vgic_access_guest_memory()" [1] to generalize the function and avoid re-inventing the wheel.

+
+    for ( ; ; level++ )
+    {
+        pte = table[offsets[level]];

You likely want to use ACCESS_ONCE here to prevent the compiler to read the pte twice from the memory.

+
+        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
+            break;
+
+        unmap_domain_page(table);
+
+        root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);

No check on root_mfn?

+        table = map_domain_page(root_mfn);
+    }
+
+    unmap_domain_page(table);
+
+    if ( !pte.walk.valid )
+        return -EFAULT;
+
+    /* Make sure the entry holds the requested access attributes. */
+    if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
+        return -EFAULT;

IHMO, this function should return the access attribute of the page and let the caller decides what to do. This would make this function more generic.

+
+    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
+
+    return 0;
+}
+
+
 /*
  * If mem_access is in use it might have been the reason why get_page_from_gva
  * failed to fetch the page, as it uses the MMU for the permission checking.
@@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned 
long flag,
     struct page_info *page = NULL;
     struct p2m_domain *p2m = &v->domain->arch.p2m;

+    ASSERT(p2m->mem_access_enabled);
+

Why this ASSERT has been added?

     rc = gva_to_ipa(gva, &ipa, flag);
+
+    /*
+     * In case mem_access is active, hardware-based gva_to_ipa translation
+     * might fail. Since gva_to_ipa uses the guest's translation tables, access
+     * to which might be restricted by the active VTTBR, we perform a gva to
+     * ipa translation in software.
+     */
     if ( rc < 0 )
-        goto err;
+        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
+            /*
+             * The software gva to ipa translation can still fail, if the the
+             * gva is not mapped or does not hold the requested access rights.
+             */
+            goto err;

Rather falling back, why don't we do software page table walk every time?


     gfn = _gfn(paddr_to_pfn(ipa));



Cheers,

[1] https://patchwork.kernel.org/patch/9676291/

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.