[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [for-4.8][PATCH v2 17/23] xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry



The ARM architecture mandates to use of a break-before-make sequence
when changing translation entries if the page table is shared between
multiple CPUs whenever a valid entry is replaced by another valid entry
(see D4.7.1 in ARM DDI 0487A.j for more details).

The break-before-make sequence can be divided in the following steps:
    1) Invalidate the old entry in the page table
    2) Issue a TLB invalidation instruction for the address associated
    to this entry
    3) Write the new entry

The current P2M code implemented in apply_one_level does not respect
this sequence and may result to break coherency on some processors.

Adapting the current implementation to use the break-before-make
sequence would imply some code duplication and more TLBs invalidation
than necessary. For instance, if we are replacing a 4KB page and the
current mapping in the P2M is using a 1GB superpage, the following steps
will happen:
    1) Shatter the 1GB superpage into a series of 2MB superpages
    2) Shatter the 2MB superpage into a series of 4KB pages
    3) Replace the 4KB page

As the current implementation is shattering while descending and install
the mapping, Xen would need to issue 3 TLB invalidation instructions
which is clearly inefficient.

Furthermore, all the operations which modify the page table are using
the same skeleton. It is more complicated to maintain different code paths
than having a generic function that set an entry and take care of the
break-before-make sequence.

The new implementation is based on the x86 EPT one which, I think,
fits quite well for the break-before-make sequence whilst keeping
the code simple.

The main function of the new implementation is __p2m_set_entry. It will
only work on mapping that are aligned to a block entry in the page table
(i.e 1GB, 2MB, 4KB when using a 4KB granularity).

Another function, p2m_set_entry, is provided to break down is region
into mapping that is aligned to a block entry or 4KB when memaccess is
enabled.

Signed-off-by: Julien Grall <julien.grall@xxxxxxx>

---
    Changes in v2:
        - fix typoes in the commit message
        - don't use the default access when shattering the superpage
        - remove duplicated comment
        - export p2m_set_entry
        - s/todo/nr/
        - iommu flush
        - update the stats when removing/adding a mapping
        - update doc
        - fix p2m_split_superpage
        - don't try to allocate intermediate page table when removing an
        entry
        - the TLB flush is not necessary when only permission are
        changed (e.g memaccess).
---
 xen/arch/arm/p2m.c         | 374 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/p2m.h  |  11 ++
 xen/include/asm-arm/page.h |   4 +
 3 files changed, 389 insertions(+)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 3ca2e90..5f7aef0 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -783,6 +783,380 @@ static void p2m_put_l3_page(const lpae_t pte)
     }
 }
 
+/* Free lpae sub-tree behind an entry */
+static void p2m_free_entry(struct p2m_domain *p2m,
+                           lpae_t entry, unsigned int level)
+{
+    unsigned int i;
+    lpae_t *table;
+    mfn_t mfn;
+
+    /* Nothing to do if the entry is invalid. */
+    if ( !p2m_valid(entry) )
+        return;
+
+    /* Nothing to do but updating the states if the entry is a super-page. */
+    if ( p2m_is_superpage(entry, level) )
+    {
+        p2m->stats.mappings[level]--;
+        return;
+    }
+
+    if ( level == 3 )
+    {
+        p2m->stats.mappings[level]--;
+        p2m_put_l3_page(entry);
+        return;
+    }
+
+    table = map_domain_page(_mfn(entry.p2m.base));
+    for ( i = 0; i < LPAE_ENTRIES; i++ )
+        p2m_free_entry(p2m, *(table + i), level + 1);
+
+    unmap_domain_page(table);
+
+    /*
+     * Make sure all the references in the TLB have been removed before
+     * freing the intermediate page table.
+     * XXX: Should we defer the free of the page table to avoid the
+     * flush?
+     */
+    if ( p2m->need_flush )
+        p2m_flush_tlb_sync(p2m);
+
+    mfn = _mfn(entry.p2m.base);
+    ASSERT(mfn_valid(mfn_x(mfn)));
+
+    free_domheap_page(mfn_to_page(mfn_x(mfn)));
+}
+
+static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
+                                unsigned int level, unsigned int target,
+                                const unsigned int *offsets)
+{
+    struct page_info *page;
+    unsigned int i;
+    lpae_t pte, *table;
+    bool rv = true;
+
+    /* Convenience aliases */
+    mfn_t mfn = _mfn(entry->p2m.base);
+    unsigned int next_level = level + 1;
+    unsigned int level_order = level_orders[next_level];
+
+    /*
+     * This should only be called with target != level and the entry is
+     * a superpage.
+     */
+    ASSERT(level < target);
+    ASSERT(p2m_is_superpage(*entry, level));
+
+    page = alloc_domheap_page(NULL, 0);
+    if ( !page )
+        return false;
+
+    page_list_add(page, &p2m->pages);
+    table = __map_domain_page(page);
+
+    /*
+     * We are either splitting a first level 1G page into 512 second level
+     * 2M pages, or a second level 2M page into 512 third level 4K pages.
+     */
+    for ( i = 0; i < LPAE_ENTRIES; i++ )
+    {
+        lpae_t *new_entry = table + i;
+
+        /*
+         * Use the content of the superpage entry and override
+         * the necessary fields. So the correct permission are kept.
+         */
+        pte = *entry;
+        pte.p2m.base = mfn_x(mfn_add(mfn, i << level_order));
+
+        /*
+         * First and second level pages set p2m.table = 0, but third
+         * level entries set p2m.table = 1.
+         */
+        pte.p2m.table = (next_level == 3);
+
+        write_pte(new_entry, pte);
+    }
+
+    /* Update stats */
+    p2m->stats.shattered[level]++;
+    p2m->stats.mappings[level]--;
+    p2m->stats.mappings[next_level] += LPAE_ENTRIES;
+
+    /*
+     * Shatter superpage in the page to the level we want to make the
+     * changes.
+     * This is done outside the loop to avoid checking the offset to
+     * know whether the entry should be shattered for every entry.
+     */
+    if ( next_level != target )
+        rv = p2m_split_superpage(p2m, table + offsets[next_level],
+                                 level + 1, target, offsets);
+
+    if ( p2m->clean_pte )
+        clean_dcache_va_range(table, PAGE_SIZE);
+
+    unmap_domain_page(table);
+
+    pte = mfn_to_p2m_entry(_mfn(page_to_mfn(page)), p2m_invalid,
+                           p2m->default_access);
+
+    /*
+     * Even if we failed, we should install the newly allocated LPAE
+     * entry. The caller will be in charge to free the sub-tree.
+     */
+    p2m_write_pte(entry, pte, p2m->clean_pte);
+
+    return rv;
+}
+
+/*
+ * Insert an entry in the p2m. This should be called with a mapping
+ * equal to a page/superpage (4K, 2M, 1G).
+ */
+static int __p2m_set_entry(struct p2m_domain *p2m,
+                           gfn_t sgfn,
+                           unsigned int page_order,
+                           mfn_t smfn,
+                           p2m_type_t t,
+                           p2m_access_t a)
+{
+    paddr_t addr = pfn_to_paddr(gfn_x(sgfn));
+    unsigned int level = 0;
+    unsigned int target = 3 - (page_order / LPAE_SHIFT);
+    lpae_t *entry, *table, orig_pte;
+    int rc;
+
+    /* Convenience aliases */
+    const unsigned int offsets[4] = {
+        zeroeth_table_offset(addr),
+        first_table_offset(addr),
+        second_table_offset(addr),
+        third_table_offset(addr)
+    };
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    /*
+     * Check if the level target is valid: we only support
+     * 4K - 2M - 1G mapping.
+     */
+    ASSERT(target > 0 && target <= 3);
+
+    table = p2m_get_root_pointer(p2m, sgfn);
+    if ( !table )
+        return -EINVAL;
+
+    for ( level = P2M_ROOT_LEVEL; level < target; level++ )
+    {
+        /*
+         * Don't try to allocate intermediate page table if the mapping
+         * is about to be removed (i.e mfn == INVALID_MFN).
+         */
+        rc = p2m_next_level(p2m, mfn_eq(smfn, INVALID_MFN),
+                            &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+        {
+            /*
+             * We are here because p2m_next_level has failed to map
+             * the intermediate page table (e.g the table does not exist
+             * and they p2m tree is read-only). It is a valid case
+             * when removing a mapping as it may not exist in the
+             * page table. In this case, just ignore it.
+             */
+            rc = mfn_eq(smfn, INVALID_MFN) ? 0 : -ENOENT;
+            goto out;
+        }
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
+    }
+
+    entry = table + offsets[level];
+
+    /*
+     * If we are here with level < target, we must be at a leaf node,
+     * and we need to break up the superpage.
+     */
+    if ( level < target )
+    {
+        /* We need to split the original page. */
+        lpae_t split_pte = *entry;
+
+        ASSERT(p2m_is_superpage(*entry, level));
+
+        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
+        {
+            /*
+             * The current super-page is still in-place, so re-increment
+             * the stats.
+             */
+            p2m->stats.mappings[level]++;
+
+            /* Free the allocated sub-tree */
+            p2m_free_entry(p2m, split_pte, level);
+
+            rc = -ENOMEM;
+            goto out;
+        }
+
+        /*
+         * Follow the break-before-sequence to update the entry.
+         * For more details see (D4.7.1 in ARM DDI 0487A.j).
+         */
+        p2m_remove_pte(entry, p2m->clean_pte);
+        p2m_flush_tlb_sync(p2m);
+
+        p2m_write_pte(entry, split_pte, p2m->clean_pte);
+
+        /* then move to the level we want to make real changes */
+        for ( ; level < target; level++ )
+        {
+            rc = p2m_next_level(p2m, true, &table, offsets[level]);
+
+            /*
+             * The entry should be found and either be a table
+             * or a superpage if level 3 is not targeted
+             */
+            ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
+                   (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
+        }
+
+        entry = table + offsets[level];
+    }
+
+    /*
+     * We should always be there with the correct level because
+     * all the intermediate tables have been installed if necessary.
+     */
+    ASSERT(level == target);
+
+    orig_pte = *entry;
+
+    /*
+     * The radix-tree can only work on 4KB. This is only used when
+     * memaccess is enabled.
+     */
+    ASSERT(!p2m->mem_access_enabled || page_order == 0);
+    /*
+     * The access type should always be p2m_access_rwx when the mapping
+     * is removed.
+     */
+    ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
+    /*
+     * Update the mem access permission before update the P2M. So we
+     * don't have to revert the mapping if it has failed.
+     */
+    rc = p2m_mem_access_radix_set(p2m, sgfn, a);
+    if ( rc )
+        goto out;
+
+    /*
+     * Always remove the entry in order to follow the break-before-make
+     * sequence when updating the translation table (D4.7.1 in ARM DDI
+     * 0487A.j).
+     */
+    if ( p2m_valid(orig_pte) )
+        p2m_remove_pte(entry, p2m->clean_pte);
+
+    if ( mfn_eq(smfn, INVALID_MFN) )
+        /* Flush can be deferred if the entry is removed */
+        p2m->need_flush |= !!p2m_valid(orig_pte);
+    else
+    {
+        lpae_t pte = mfn_to_p2m_entry(smfn, t, a);
+
+        if ( level < 3 )
+            pte.p2m.table = 0; /* Superpage entry */
+
+        /*
+         * It is necessary to flush the TLB before writing the new entry
+         * to keep coherency when the previous entry was valid.
+         *
+         * Although, it could be defered when only the permissions are
+         * changed (e.g in case of memaccess).
+         */
+        if ( p2m_valid(orig_pte) )
+        {
+            if ( likely(!p2m->mem_access_enabled) ||
+                 P2M_CLEAR_PERM(pte) != P2M_CLEAR_PERM(orig_pte) )
+                p2m_flush_tlb_sync(p2m);
+            else
+                p2m->need_flush = true;
+        }
+        else /* new mapping */
+            p2m->stats.mappings[level]++;
+
+        p2m_write_pte(entry, pte, p2m->clean_pte);
+
+        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
+                                      gfn_add(sgfn, 1 << page_order));
+        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
+    }
+
+    /*
+     * Free the entry only if the original pte was valid and the base
+     * is different (to avoid freeing when permission is changed).
+     */
+    if ( p2m_valid(orig_pte) && entry->p2m.base != orig_pte.p2m.base )
+        p2m_free_entry(p2m, orig_pte, level);
+
+    if ( need_iommu(p2m->domain) && (p2m_valid(orig_pte) || p2m_valid(*entry)) 
)
+        rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order);
+    else
+        rc = 0;
+
+out:
+    unmap_domain_page(table);
+
+    return rc;
+}
+
+int p2m_set_entry(struct p2m_domain *p2m,
+                  gfn_t sgfn,
+                  unsigned long nr,
+                  mfn_t smfn,
+                  p2m_type_t t,
+                  p2m_access_t a)
+{
+    int rc = 0;
+
+    while ( nr )
+    {
+        /*
+         * XXX: Support superpage mappings if nr is not aligned to a
+         * superpage size.
+         */
+        unsigned long mask = gfn_x(sgfn) | mfn_x(smfn) | nr;
+        unsigned long order;
+
+        /* Always map 4k by 4k when memaccess is enabled */
+        if ( unlikely(p2m->mem_access_enabled) )
+            order = THIRD_ORDER;
+        else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
+            order = FIRST_ORDER;
+        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
+            order = SECOND_ORDER;
+        else
+            order = THIRD_ORDER;
+
+        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
+        if ( rc )
+            break;
+
+        sgfn = gfn_add(sgfn, (1 << order));
+        if ( !mfn_eq(smfn, INVALID_MFN) )
+           smfn = mfn_add(smfn, (1 << order));
+
+        nr -= (1 << order);
+    }
+
+    return rc;
+}
+
 /*
  * Returns true if start_gpaddr..end_gpaddr contains at least one
  * suitably aligned level_size mappping of maddr.
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 6fe6a37..cca86ef 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -187,6 +187,17 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
                     p2m_type_t *t, p2m_access_t *a,
                     unsigned int *page_order);
 
+/*
+ * Direct set a p2m entry: only for use by the P2M code.
+ * The P2M write lock should be taken.
+ */
+int p2m_set_entry(struct p2m_domain *p2m,
+                  gfn_t sgfn,
+                  unsigned long nr,
+                  mfn_t smfn,
+                  p2m_type_t t,
+                  p2m_access_t a);
+
 /* Clean & invalidate caches corresponding to a region of guest address space 
*/
 int p2m_cache_flush(struct domain *d, gfn_t start, unsigned long nr);
 
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index a43b0fa..ba63322 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -166,6 +166,10 @@ typedef struct __packed {
     unsigned long sbz1:5;
 } lpae_p2m_t;
 
+/* Permission mask: xn, write, read */
+#define P2M_PERM_MASK (0x00400000000000C0ULL)
+#define P2M_CLEAR_PERM(pte) ((pte).bits & ~P2M_PERM_MASK)
+
 /*
  * Walk is the common bits of p2m and pt entries which are needed to
  * simply walk the table (e.g. for debug).
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.