
[Xen-devel] [PATCH v2 for-next 5/8] x86/mm: split PV guest supporting code to pv/mm.c



Move the following PV-specific code to the new file:

1. Several hypercalls that are tied to PV (see the sketch after this list):
   1. do_mmuext_op
   2. do_mmu_update
   3. do_update_va_mapping
   4. do_update_va_mapping_otherdomain
   5. do_set_gdt
   6. do_update_descriptor
2. PV MMIO emulation code
3. PV writable page table emulation code
4. PV grant table mapping creation / destruction code
5. Other supporting code for the above items
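
For reference, the hypercall entry points listed in item 1 have signatures of
roughly the following shape (sketched from the hypervisor headers for
orientation only; they are unchanged by this patch and not part of the diff
below):

    long do_mmuext_op(XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
                      unsigned int count,
                      XEN_GUEST_HANDLE_PARAM(uint) pdone,
                      unsigned int foreigndom);
    long do_mmu_update(XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
                       unsigned int count,
                       XEN_GUEST_HANDLE_PARAM(uint) pdone,
                       unsigned int foreigndom);
    long do_update_va_mapping(unsigned long va, u64 val64,
                              unsigned long flags);
    long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
                                          unsigned long flags,
                                          domid_t domid);
    long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
                    unsigned int entries);
    long do_update_descriptor(u64 pa, u64 desc);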

Move everything in one patch because the pieces share a lot of code. Also
move the PV page table API comment to the new file. Remove all trailing
whitespace in the process.

Due to the code movement, a few previously static functions are now
exported via the relevant header files, and some configuration variables
are made non-static.
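
The rough shape of the new declarations can be inferred from the hunks below.
A sketch (reconstructed from the mm.c changes, not copied from the header
diffs, which are not quoted in this excerpt):

    /* Likely additions to xen/include/asm-x86/mm.h (sketch only). */
    int get_page_from_pagenr(unsigned long page_nr, struct domain *d);
    int get_page_and_type_from_pagenr(unsigned long page_nr,
                                      unsigned long type, struct domain *d,
                                      int partial, int preemptible);
    int update_xen_mappings(unsigned long mfn, unsigned int cacheattr);
    int __put_page_type(struct page_info *page, int preemptible);

    /* Configuration variables made non-static by this patch. */
    extern uint32_t base_disallow_mask;
    extern s8 opt_mmio_relax;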

No functional change.

Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 xen/arch/x86/mm.c                 | 4964 ++++---------------------------------
 xen/arch/x86/pv/Makefile          |    1 +
 xen/arch/x86/pv/mm.c              | 4118 ++++++++++++++++++++++++++++++
 xen/include/asm-x86/grant_table.h |    4 +
 xen/include/asm-x86/mm.h          |    9 +
 xen/include/xen/mm.h              |    1 +
 6 files changed, 4581 insertions(+), 4516 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e1ce77b9ac..169ae7e4a1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * A description of the x86 page table API:
- * 
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- * 
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- * 
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- * 
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being 
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- * 
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- * 
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the 
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
- * In normal use, this simplification shouldn't be a problem.
- * However, the logic can be added if required.
- * 
- * One more note on read-only page mappings:
- * -----------------------------------------
- * We want domains to be able to map pages for read-only access. The
- * main reason is that page tables and directories should be readable
- * by a domain, but it would not be safe for them to be writable.
- * However, domains have free access to rings 1 & 2 of the Intel
- * privilege model. In terms of page protection, these are considered
- * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
- * read-only restrictions are respected in supervisor mode -- if the 
- * bit is clear then any mapped page is writable.
- * 
- * We get round this by always setting the WP bit and disallowing 
- * updates to it. This is very unlikely to cause a problem for guest
- * OS's, which will generally use the WP bit to simplify copy-on-write
- * implementation (in that case, OS wants a fault when it writes to
- * an application-supplied buffer).
- */
-
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -151,30 +86,9 @@ struct rangeset *__read_mostly mmio_ro_ranges;
 bool_t __read_mostly opt_allow_superpage;
 boolean_param("allowsuperpage", opt_allow_superpage);
 
-static void put_superpage(unsigned long mfn);
-
-static uint32_t base_disallow_mask;
-/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
-#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
-
-#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \
-                          ? base_disallow_mask & ~_PAGE_PSE \
-                          : base_disallow_mask)
-
-#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
-                             base_disallow_mask : 0xFFFFF198U)
-
-#define L4_DISALLOW_MASK (base_disallow_mask)
-
-#define l1_disallow_mask(d)                                     \
-    ((d != dom_io) &&                                           \
-     (rangeset_is_empty((d)->iomem_caps) &&                     \
-      rangeset_is_empty((d)->arch.ioport_caps) &&               \
-      !has_arch_pdevs(d) &&                                     \
-      is_pv_domain(d)) ?                                        \
-     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+uint32_t base_disallow_mask;
 
-static s8 __read_mostly opt_mmio_relax;
+s8 __read_mostly opt_mmio_relax;
 static void __init parse_mmio_relax(const char *s)
 {
     if ( !*s )
@@ -539,165 +453,7 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/* Get a mapping of a PV guest's l1e for this virtual address. */
-static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
-{
-    l2_pgentry_t l2e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) )
-        return NULL;
-
-    /* Find this l1e and its enclosing l1mfn in the linear map. */
-    if ( __copy_from_user(&l2e,
-                          &__linear_l2_table[l2_linear_offset(addr)],
-                          sizeof(l2_pgentry_t)) )
-        return NULL;
-
-    /* Check flags that it will be safe to read the l1e. */
-    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
-        return NULL;
-
-    *gl1mfn = l2e_get_pfn(l2e);
-
-    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
-           l1_table_offset(addr);
-}
-
-/* Pull down the mapping we got from guest_map_l1e(). */
-static inline void guest_unmap_l1e(void *p)
-{
-    unmap_domain_page(p);
-}
-
-/* Read a PV guest's l1e that maps this virtual address. */
-static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
-{
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) ||
-         __copy_from_user(eff_l1e,
-                          &__linear_l1_table[l1_linear_offset(addr)],
-                          sizeof(l1_pgentry_t)) )
-        *eff_l1e = l1e_empty();
-}
-
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
-                                          void *eff_l1e)
-{
-    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    TOGGLE_MODE();
-    guest_get_eff_l1e(addr, eff_l1e);
-    TOGGLE_MODE();
-}
-
-const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
-    zero_page[PAGE_SIZE];
-
-static void invalidate_shadow_ldt(struct vcpu *v, int flush)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    struct page_info *page;
-
-    BUG_ON(unlikely(in_irq()));
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
-        goto out;
-
-    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
-    pl1e = gdt_ldt_ptes(v->domain, v);
-
-    for ( i = 16; i < 32; i++ )
-    {
-        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
-            continue;
-        page = l1e_get_page(pl1e[i]);
-        l1e_write(&pl1e[i], l1e_empty());
-        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
-        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
-        put_page_and_type(page);
-    }
-
-    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
-    if ( flush )
-        flush_tlb_mask(v->vcpu_dirty_cpumask);
-
- out:
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-}
-
-
-static int alloc_segdesc_page(struct page_info *page)
-{
-    const struct domain *owner = page_get_owner(page);
-    struct desc_struct *descs = __map_domain_page(page);
-    unsigned i;
-
-    for ( i = 0; i < 512; i++ )
-        if ( unlikely(!check_descriptor(owner, &descs[i])) )
-            break;
-
-    unmap_domain_page(descs);
-
-    return i == 512 ? 0 : -EINVAL;
-}
-
-
-/* Map shadow page at offset @off. */
-int map_ldt_shadow_page(unsigned int off)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    unsigned long gmfn;
-    struct page_info *page;
-    l1_pgentry_t l1e, nl1e;
-    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
-    int okay;
-
-    BUG_ON(unlikely(in_irq()));
-
-    if ( is_pv_32bit_domain(d) )
-        gva = (u32)gva;
-    guest_get_eff_kern_l1e(v, gva, &l1e);
-    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
-        return 0;
-
-    gmfn = l1e_get_pfn(l1e);
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-    if ( unlikely(!page) )
-        return 0;
-
-    okay = get_page_type(page, PGT_seg_desc_page);
-    if ( unlikely(!okay) )
-    {
-        put_page(page);
-        return 0;
-    }
-
-    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
-    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    return 1;
-}
-
-
-static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 {
     struct page_info *page = mfn_to_page(page_nr);
 
@@ -712,11 +468,11 @@ static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 }
 
 
-static int get_page_and_type_from_pagenr(unsigned long page_nr, 
-                                         unsigned long type,
-                                         struct domain *d,
-                                         int partial,
-                                         int preemptible)
+int get_page_and_type_from_pagenr(unsigned long page_nr,
+                                  unsigned long type,
+                                  struct domain *d,
+                                  int partial,
+                                  int preemptible)
 {
     struct page_info *page = mfn_to_page(page_nr);
     int rc;
@@ -736,72 +492,6 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr,
     return rc;
 }
 
-static void put_data_page(
-    struct page_info *page, int writeable)
-{
-    if ( writeable )
-        put_page_and_type(page);
-    else
-        put_page(page);
-}
-
-/*
- * We allow root tables to map each other (a.k.a. linear page tables). It
- * needs some special care with reference counts and access permissions:
- *  1. The mapping entry must be read-only, or the guest may get write access
- *     to its own PTEs.
- *  2. We must only bump the reference counts for an *already validated*
- *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
- *     on a validation that is required to complete that validation.
- *  3. We only need to increment the reference counts for the mapped page
- *     frame if it is mapped by a different root table. This is sufficient and
- *     also necessary to allow validation of a root table mapping itself.
- */
-#define define_get_linear_pagetable(level)                                  \
-static int                                                                  \
-get_##level##_linear_pagetable(                                             \
-    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
-{                                                                           \
-    unsigned long x, y;                                                     \
-    struct page_info *page;                                                 \
-    unsigned long pfn;                                                      \
-                                                                            \
-    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
-    {                                                                       \
-        gdprintk(XENLOG_WARNING,                                            \
-                 "Attempt to create linear p.t. with write perms\n");       \
-        return 0;                                                           \
-    }                                                                       \
-                                                                            \
-    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
-    {                                                                       \
-        /* Make sure the mapped frame belongs to the correct domain. */     \
-        if ( unlikely(!get_page_from_pagenr(pfn, d)) )                      \
-            return 0;                                                       \
-                                                                            \
-        /*                                                                  \
-         * Ensure that the mapped frame is an already-validated page table. \
-         * If so, atomically increment the count (checking for overflow).   \
-         */                                                                 \
-        page = mfn_to_page(pfn);                                            \
-        y = page->u.inuse.type_info;                                        \
-        do {                                                                \
-            x = y;                                                          \
-            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
-                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
-                          (PGT_##level##_page_table|PGT_validated)) )       \
-            {                                                               \
-                put_page(page);                                             \
-                return 0;                                                   \
-            }                                                               \
-        }                                                                   \
-        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
-    }                                                                       \
-                                                                            \
-    return 1;                                                               \
-}
-
-
 bool is_iomem_page(mfn_t mfn)
 {
     struct page_info *page;
@@ -816,7 +506,7 @@ bool is_iomem_page(mfn_t mfn)
     return (page_get_owner(page) == dom_io);
 }
 
-static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
+int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
 {
     int err = 0;
     bool_t alias = mfn >= PFN_DOWN(xen_phys_start) &&
@@ -834,3414 +524,489 @@ static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
     return err;
 }
 
-#ifndef NDEBUG
-struct mmio_emul_range_ctxt {
-    const struct domain *d;
-    unsigned long mfn;
-};
-
-static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
+bool_t fill_ro_mpt(unsigned long mfn)
 {
-    const struct mmio_emul_range_ctxt *ctxt = arg;
-
-    if ( ctxt->mfn > e )
-        return 0;
+    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
+    bool_t ret = 0;
 
-    if ( ctxt->mfn >= s )
+    if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) )
     {
-        static DEFINE_SPINLOCK(last_lock);
-        static const struct domain *last_d;
-        static unsigned long last_s = ~0UL, last_e;
-        bool_t print = 0;
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+            idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+        ret = 1;
+    }
+    unmap_domain_page(l4tab);
 
-        spin_lock(&last_lock);
-        if ( last_d != ctxt->d || last_s != s || last_e != e )
-        {
-            last_d = ctxt->d;
-            last_s = s;
-            last_e = e;
-            print = 1;
-        }
-        spin_unlock(&last_lock);
+    return ret;
+}
 
-        if ( print )
-            printk(XENLOG_G_INFO
-                   "d%d: Forcing write emulation on MFNs %lx-%lx\n",
-                   ctxt->d->domain_id, s, e);
-    }
+void zap_ro_mpt(unsigned long mfn)
+{
+    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
 
-    return 1;
+    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+    unmap_domain_page(l4tab);
 }
-#endif
 
-int
-get_page_from_l1e(
-    l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner)
+int page_lock(struct page_info *page)
 {
-    unsigned long mfn = l1e_get_pfn(l1e);
-    struct page_info *page = mfn_to_page(mfn);
-    uint32_t l1f = l1e_get_flags(l1e);
-    struct vcpu *curr = current;
-    struct domain *real_pg_owner;
-    bool_t write;
-
-    if ( !(l1f & _PAGE_PRESENT) )
-        return 0;
+    unsigned long x, nx;
 
-    if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
-                 l1f & l1_disallow_mask(l1e_owner));
-        return -EINVAL;
-    }
+    do {
+        while ( (x = page->u.inuse.type_info) & PGT_locked )
+            cpu_relax();
+        nx = x + (1 | PGT_locked);
+        if ( !(x & PGT_validated) ||
+             !(x & PGT_count_mask) ||
+             !(nx & PGT_count_mask) )
+            return 0;
+    } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
 
-    if ( !mfn_valid(_mfn(mfn)) ||
-         (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
-    {
-        int flip = 0;
+    return 1;
+}
 
-        /* Only needed the reference to confirm dom_io ownership. */
-        if ( mfn_valid(_mfn(mfn)) )
-            put_page(page);
+void page_unlock(struct page_info *page)
+{
+    unsigned long x, nx, y = page->u.inuse.type_info;
 
-        /* DOMID_IO reverts to caller for privilege checks. */
-        if ( pg_owner == dom_io )
-            pg_owner = curr->domain;
+    do {
+        x = y;
+        nx = x - (1 | PGT_locked);
+    } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
+}
 
-        if ( !iomem_access_permitted(pg_owner, mfn, mfn) )
-        {
-            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
-            {
-                gdprintk(XENLOG_WARNING,
-                         "d%d non-privileged attempt to map MMIO space %"PRI_mfn"\n",
-                         pg_owner->domain_id, mfn);
-                return -EPERM;
-            }
-            return -EINVAL;
-        }
+static int cleanup_page_cacheattr(struct page_info *page)
+{
+    unsigned int cacheattr =
+        (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
 
-        if ( pg_owner != l1e_owner &&
-             !iomem_access_permitted(l1e_owner, mfn, mfn) )
-        {
-            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
-            {
-                gdprintk(XENLOG_WARNING,
-                         "d%d attempted to map MMIO space %"PRI_mfn" in d%d to d%d\n",
-                         curr->domain->domain_id, mfn, pg_owner->domain_id,
-                         l1e_owner->domain_id);
-                return -EPERM;
-            }
-            return -EINVAL;
-        }
+    if ( likely(cacheattr == 0) )
+        return 0;
 
-        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-        {
-            /* MMIO pages must not be mapped cachable unless requested so. */
-            switch ( opt_mmio_relax )
-            {
-            case 0:
-                break;
-            case 1:
-                if ( !is_hardware_domain(l1e_owner) )
-                    break;
-                /* fallthrough */
-            case -1:
-                return 0;
-            default:
-                ASSERT_UNREACHABLE();
-            }
-        }
-        else if ( l1f & _PAGE_RW )
-        {
-#ifndef NDEBUG
-            const unsigned long *ro_map;
-            unsigned int seg, bdf;
-
-            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
-                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
-                  test_bit(bdf, ro_map)) )
-                printk(XENLOG_G_WARNING
-                       "d%d: Forcing read-only access to MFN %lx\n",
-                       l1e_owner->domain_id, mfn);
-            else
-                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
-                                       print_mmio_emul_range,
-                                       &(struct mmio_emul_range_ctxt){
-                                           .d = l1e_owner,
-                                           .mfn = mfn });
-#endif
-            flip = _PAGE_RW;
-        }
+    page->count_info &= ~PGC_cacheattr_mask;
 
-        switch ( l1f & PAGE_CACHE_ATTRS )
-        {
-        case 0: /* WB */
-            flip |= _PAGE_PWT | _PAGE_PCD;
-            break;
-        case _PAGE_PWT: /* WT */
-        case _PAGE_PWT | _PAGE_PAT: /* WP */
-            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
-            break;
-        }
+    BUG_ON(is_xen_heap_page(page));
 
-        return flip;
-    }
+    return update_xen_mappings(page_to_mfn(page), 0);
+}
 
-    if ( unlikely( (real_pg_owner != pg_owner) &&
-                   (real_pg_owner != dom_cow) ) )
-    {
-        /*
-         * Let privileged domains transfer the right to map their target
-         * domain's pages. This is used to allow stub-domain pvfb export to
-         * dom0, until pvfb supports granted mappings. At that time this
-         * minor hack can go away.
-         */
-        if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
-             xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n",
-                     pg_owner->domain_id, l1e_owner->domain_id,
-                     real_pg_owner ? real_pg_owner->domain_id : -1);
-            goto could_not_pin;
-        }
-        pg_owner = real_pg_owner;
-    }
+void put_page(struct page_info *page)
+{
+    unsigned long nx, x, y = page->count_info;
 
-    /* Extra paranoid check for shared memory. Writable mappings 
-     * disallowed (unshare first!) */
-    if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) )
-        goto could_not_pin;
-
-    /* Foreign mappings into guests in shadow external mode don't
-     * contribute to writeable mapping refcounts.  (This allows the
-     * qemu-dm helper process in dom0 to map the domain's memory without
-     * messing up the count of "real" writable mappings.) */
-    write = (l1f & _PAGE_RW) &&
-            ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner));
-    if ( write && !get_page_type(page, PGT_writable_page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page type PGT_writable_page\n");
-        goto could_not_pin;
+    do {
+        ASSERT((y & PGC_count_mask) != 0);
+        x  = y;
+        nx = x - 1;
     }
+    while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) );
 
-    if ( pte_flags_to_cacheattr(l1f) !=
-         ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) )
+    if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        unsigned long x, nx, y = page->count_info;
-        unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
-        int err;
-
-        if ( is_xen_heap_page(page) )
-        {
-            if ( write )
-                put_page_type(page);
-            put_page(page);
+        if ( cleanup_page_cacheattr(page) == 0 )
+            free_domheap_page(page);
+        else
             gdprintk(XENLOG_WARNING,
-                     "Attempt to change cache attributes of Xen heap page\n");
-            return -EACCES;
-        }
+                     "Leaking mfn %" PRI_pfn "\n", page_to_mfn(page));
+    }
+}
 
-        do {
-            x  = y;
-            nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base);
-        } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
 
-        err = update_xen_mappings(mfn, cacheattr);
-        if ( unlikely(err) )
-        {
-            cacheattr = y & PGC_cacheattr_mask;
-            do {
-                x  = y;
-                nx = (x & ~PGC_cacheattr_mask) | cacheattr;
-            } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
-
-            if ( write )
-                put_page_type(page);
-            put_page(page);
+struct domain *page_get_owner_and_reference(struct page_info *page)
+{
+    unsigned long x, y = page->count_info;
+    struct domain *owner;
 
-            gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn
-                     " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n",
-                     mfn, get_gpfn_from_mfn(mfn),
-                     l1e_get_intpte(l1e), l1e_owner->domain_id);
-            return err;
-        }
+    do {
+        x = y;
+        /*
+         * Count ==  0: Page is not allocated, so we cannot take a reference.
+         * Count == -1: Reference count would wrap, which is invalid. 
+         * Count == -2: Remaining unused ref is reserved for get_page_light().
+         */
+        if ( unlikely(((x + 2) & PGC_count_mask) <= 2) )
+            return NULL;
     }
+    while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
 
-    return 0;
+    owner = page_get_owner(page);
+    ASSERT(owner);
 
- could_not_pin:
-    gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn
-             ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
-             mfn, get_gpfn_from_mfn(mfn),
-             l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);
-    if ( real_pg_owner != NULL )
-        put_page(page);
-    return -EBUSY;
+    return owner;
 }
 
 
-/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
-define_get_linear_pagetable(l2);
-static int
-get_page_from_l2e(
-    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+int get_page(struct page_info *page, struct domain *domain)
 {
-    unsigned long mfn = l2e_get_pfn(l2e);
-    int rc;
+    struct domain *owner = page_get_owner_and_reference(page);
 
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
+    if ( likely(owner == domain) )
         return 1;
 
-    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
-    {
-        rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0);
-        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-            rc = 0;
-        return rc;
-    }
+    if ( !paging_mode_refcounts(domain) && !domain->is_dying )
+        gprintk(XENLOG_INFO,
+                "Error pfn %lx: rd=%d od=%d caf=%08lx taf=%" PRtype_info "\n",
+                page_to_mfn(page), domain->domain_id,
+                owner ? owner->domain_id : DOMID_INVALID,
+                page->count_info - !!owner, page->u.inuse.type_info);
 
-    if ( !opt_allow_superpage )
-    {
-        gdprintk(XENLOG_WARNING, "PV superpages disabled in hypervisor\n");
-        return -EINVAL;
-    }
+    if ( owner )
+        put_page(page);
 
-    if ( mfn & (L1_PAGETABLE_ENTRIES-1) )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Unaligned superpage map attempt mfn %" PRI_mfn "\n", mfn);
-        return -EINVAL;
-    }
-
-    return get_superpage(mfn, d);
-}
-
-
-define_get_linear_pagetable(l3);
-static int
-get_page_from_l3e(
-    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                 l3e_get_flags(l3e) & l3_disallow_mask(d));
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_pagenr(
-        l3e_get_pfn(l3e), PGT_l2_page_table, d, partial, 1);
-    if ( unlikely(rc == -EINVAL) &&
-         !is_pv_32bit_domain(d) &&
-         get_l3_linear_pagetable(l3e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
-define_get_linear_pagetable(l4);
-static int
-get_page_from_l4e(
-    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_pagenr(
-        l4e_get_pfn(l4e), PGT_l3_page_table, d, partial, 1);
-    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
-#define adjust_guest_l1e(pl1e, d)                                            \
-    do {                                                                     \
-        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
-             likely(!is_pv_32bit_domain(d)) )                                \
-        {                                                                    \
-            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
-            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
-                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
-                gdprintk(XENLOG_WARNING,                                     \
-                         "Global bit is set to kernel page %lx\n",           \
-                         l1e_get_pfn((pl1e)));                               \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
-                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
-                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
-        }                                                                    \
-    } while ( 0 )
-
-#define adjust_guest_l2e(pl2e, d)                               \
-    do {                                                        \
-        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l2e_add_flags((pl2e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define adjust_guest_l3e(pl3e, d)                                   \
-    do {                                                            \
-        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
-            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
-                                         _PAGE_USER :               \
-                                         _PAGE_USER|_PAGE_RW);      \
-    } while ( 0 )
-
-#define adjust_guest_l4e(pl4e, d)                               \
-    do {                                                        \
-        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l4e_add_flags((pl4e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define unadjust_guest_l3e(pl3e, d)                                         \
-    do {                                                                    \
-        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
-             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
-            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
-    } while ( 0 )
-
-void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
-{
-    unsigned long     pfn = l1e_get_pfn(l1e);
-    struct page_info *page;
-    struct domain    *pg_owner;
-    struct vcpu      *v;
-
-    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) )
-        return;
-
-    page = mfn_to_page(pfn);
-    pg_owner = page_get_owner(page);
-
-    /*
-     * Check if this is a mapping that was established via a grant reference.
-     * If it was then we should not be here: we require that such mappings are
-     * explicitly destroyed via the grant-table interface.
-     * 
-     * The upshot of this is that the guest can end up with active grants that
-     * it cannot destroy (because it no longer has a PTE to present to the
-     * grant-table interface). This can lead to subtle hard-to-catch bugs,
-     * hence a special grant PTE flag can be enabled to catch the bug early.
-     * 
-     * (Note that the undestroyable active grants are not a security hole in
-     * Xen. All active grants can safely be cleaned up when the domain dies.)
-     */
-    if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
-         !l1e_owner->is_shutting_down && !l1e_owner->is_dying )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
-                 l1e_get_intpte(l1e));
-        domain_crash(l1e_owner);
-    }
-
-    /* Remember we didn't take a type-count of foreign writable mappings
-     * to paging-external domains */
-    if ( (l1e_get_flags(l1e) & _PAGE_RW) && 
-         ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
-    {
-        put_page_and_type(page);
-    }
-    else
-    {
-        /* We expect this is rare so we blow the entire shadow LDT. */
-        if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) == 
-                       PGT_seg_desc_page)) &&
-             unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) &&
-             (l1e_owner == pg_owner) )
-        {
-            for_each_vcpu ( pg_owner, v )
-                invalidate_shadow_ldt(v, 1);
-        }
-        put_page(page);
-    }
+    return 0;
 }
 
-
 /*
- * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
- * Note also that this automatically deals correctly with linear p.t.'s.
+ * Special version of get_page() to be used exclusively when
+ * - a page is known to already have a non-zero reference count
+ * - the page does not need its owner to be checked
+ * - it will not be called more than once without dropping the thus
+ *   acquired reference again.
+ * Due to get_page() reserving one reference, this call cannot fail.
  */
-static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
-{
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
-        return 1;
-
-    if ( l2e_get_flags(l2e) & _PAGE_PSE )
-        put_superpage(l2e_get_pfn(l2e));
-    else
-        put_page_and_type(l2e_get_page(l2e));
-
-    return 0;
-}
-
-static int __put_page_type(struct page_info *, int preemptible);
-
-static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
-                             int partial, bool_t defer)
-{
-    struct page_info *pg;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
-        return 1;
-
-    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
-    {
-        unsigned long mfn = l3e_get_pfn(l3e);
-        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
-
-        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
-        do {
-            put_data_page(mfn_to_page(mfn), writeable);
-        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
-
-        return 0;
-    }
-
-    pg = l3e_get_page(l3e);
-
-    if ( unlikely(partial > 0) )
-    {
-        ASSERT(!defer);
-        return __put_page_type(pg, 1);
-    }
-
-    if ( defer )
-    {
-        current->arch.old_guest_table = pg;
-        return 0;
-    }
-
-    return put_page_and_type_preemptible(pg);
-}
-
-static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
-                             int partial, bool_t defer)
-{
-    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) && 
-         (l4e_get_pfn(l4e) != pfn) )
-    {
-        struct page_info *pg = l4e_get_page(l4e);
-
-        if ( unlikely(partial > 0) )
-        {
-            ASSERT(!defer);
-            return __put_page_type(pg, 1);
-        }
-
-        if ( defer )
-        {
-            current->arch.old_guest_table = pg;
-            return 0;
-        }
-
-        return put_page_and_type_preemptible(pg);
-    }
-    return 1;
-}
-
-static int alloc_l1_table(struct page_info *page)
+void get_page_light(struct page_info *page)
 {
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l1_pgentry_t  *pl1e;
-    unsigned int   i;
-    int            ret = 0;
-
-    pl1e = map_domain_page(_mfn(pfn));
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-    {
-        if ( is_guest_l1_slot(i) )
-            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
-            {
-            default:
-                goto fail;
-            case 0:
-                break;
-            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-                l1e_flip_flags(pl1e[i], ret);
-                break;
-            }
+    unsigned long x, nx, y = page->count_info;
 
-        adjust_guest_l1e(pl1e[i], d);
+    do {
+        x  = y;
+        nx = x + 1;
+        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
+        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
+        y = cmpxchg(&page->count_info, x, nx);
     }
-
-    unmap_domain_page(pl1e);
-    return 0;
-
- fail:
-    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
-    while ( i-- > 0 )
-        if ( is_guest_l1_slot(i) )
-            put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-    return ret;
+    while ( unlikely(y != x) );
 }
 
-static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+static int __put_final_page_type(
+    struct page_info *page, unsigned long type, int preemptible)
 {
-    struct page_info *page;
-    l3_pgentry_t     l3e3;
-
-    if ( !is_pv_32bit_domain(d) )
-        return 1;
-
-    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
-
-    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
-    l3e3 = pl3e[3];
-    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
-        return 0;
-    }
+    int rc = free_page_type(page, type, preemptible);
 
-    /*
-     * The Xen-private mappings include linear mappings. The L2 thus cannot
-     * be shared by multiple L3 tables. The test here is adequate because:
-     *  1. Cannot appear in slots != 3 because get_page_type() checks the
-     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
-     *  2. Cannot appear in another page table's L3:
-     *     a. alloc_l3_table() calls this function and this check will fail
-     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
-     */
-    page = l3e_get_page(l3e3);
-    BUG_ON(page->u.inuse.type_info & PGT_pinned);
-    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
-    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    /* No need for atomic update of type_info here: noone else updates it. */
+    if ( rc == 0 )
     {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
-        return 0;
+        /*
+         * Record TLB information for flush later. We do not stamp page tables
+         * when running in shadow mode:
+         *  1. Pointless, since it's the shadow pt's which must be tracked.
+         *  2. Shadow mode reuses this field for shadowed page tables to
+         *     store flags info -- we don't want to conflict with that.
+         */
+        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
+               (page->count_info & PGC_page_table)) )
+            page->tlbflush_timestamp = tlbflush_current_time();
+        wmb();
+        page->u.inuse.type_info--;
     }
-
-    return 1;
-}
-
-static int alloc_l2_table(struct page_info *page, unsigned long type,
-                          int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l2_pgentry_t  *pl2e;
-    unsigned int   i;
-    int            rc = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
+    else if ( rc == -EINTR )
     {
-        if ( preemptible && i > page->nr_validated_ptes
-             && hypercall_preempt_check() )
-        {
-            page->nr_validated_ptes = i;
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( !is_guest_l2_slot(d, type, i) ||
-             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
-            continue;
-
-        if ( rc < 0 )
-        {
-            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
-            while ( i-- > 0 )
-                if ( is_guest_l2_slot(d, type, i) )
-                    put_page_from_l2e(pl2e[i], pfn);
-            break;
-        }
-
-        adjust_guest_l2e(pl2e[i], d);
+        ASSERT((page->u.inuse.type_info &
+                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
+        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
+               (page->count_info & PGC_page_table)) )
+            page->tlbflush_timestamp = tlbflush_current_time();
+        wmb();
+        page->u.inuse.type_info |= PGT_validated;
     }
-
-    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
+    else
     {
-        /* Xen private mappings. */
-        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
-               &compat_idle_pg_table_l2[
-                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
-               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
+        BUG_ON(rc != -ERESTART);
+        wmb();
+        get_page_light(page);
+        page->u.inuse.type_info |= PGT_partial;
     }
 
-    unmap_domain_page(pl2e);
-    return rc > 0 ? 0 : rc;
+    return rc;
 }
 
-static int alloc_l3_table(struct page_info *page)
+int __put_page_type(struct page_info *page,
+                    int preemptible)
 {
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l3_pgentry_t  *pl3e;
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    /*
-     * PAE guests allocate full pages, but aren't required to initialize
-     * more than the first four entries; when running in compatibility
-     * mode, however, the full page is visible to the MMU, and hence all
-     * 512 entries must be valid/verified, which is most easily achieved
-     * by clearing them out.
-     */
-    if ( is_pv_32bit_domain(d) )
-        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
+    unsigned long nx, x, y = page->u.inuse.type_info;
+    int rc = 0;
 
-    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
+    for ( ; ; )
     {
-        if ( is_pv_32bit_domain(d) && (i == 3) )
-        {
-            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
-                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
-                rc = -EINVAL;
-            else
-                rc = get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]),
-                                                   PGT_l2_page_table |
-                                                   PGT_pae_xen_l2,
-                                                   d, partial, 1);
-        }
-        else if ( !is_guest_l3_slot(i) ||
-                  (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc == -EINTR && i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            rc = -ERESTART;
-        }
-        if ( rc < 0 )
-            break;
+        x  = y;
+        nx = x - 1;
 
-        adjust_guest_l3e(pl3e[i], d);
-    }
+        ASSERT((x & PGT_count_mask) != 0);
 
-    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
-        rc = -EINVAL;
-    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
-    {
-        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
-        if ( i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            current->arch.old_guest_table = page;
-        }
-        while ( i-- > 0 )
+        if ( unlikely((nx & PGT_count_mask) == 0) )
         {
-            if ( !is_guest_l3_slot(i) )
-                continue;
-            unadjust_guest_l3e(pl3e[i], d);
-        }
-    }
-
-    unmap_domain_page(pl3e);
-    return rc > 0 ? 0 : rc;
-}
-
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
-                         bool_t zap_ro_mpt)
-{
-    /* Xen private mappings. */
-    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
-#ifndef NDEBUG
-    if ( l4e_get_intpte(split_l4e) )
-        l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] =
-            split_l4e;
-#endif
-    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
-        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
-    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
-        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
-    if ( zap_ro_mpt || is_pv_32bit_domain(d) || paging_mode_refcounts(d) )
-        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
-}
-
-bool_t fill_ro_mpt(unsigned long mfn)
-{
-    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
-    bool_t ret = 0;
-
-    if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) )
-    {
-        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
-            idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
-        ret = 1;
-    }
-    unmap_domain_page(l4tab);
-
-    return ret;
-}
-
-void zap_ro_mpt(unsigned long mfn)
-{
-    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
-
-    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
-    unmap_domain_page(l4tab);
-}
-
-static int alloc_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
-    {
-        if ( !is_guest_l4_slot(d, i) ||
-             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc < 0 )
-        {
-            if ( rc != -EINTR )
-                gdprintk(XENLOG_WARNING,
-                         "Failure in alloc_l4_table: slot %#x\n", i);
-            if ( i )
-            {
-                page->nr_validated_ptes = i;
-                page->partial_pte = 0;
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else
-                {
-                    if ( current->arch.old_guest_table )
-                        page->nr_validated_ptes++;
-                    current->arch.old_guest_table = page;
-                }
-            }
-        }
-        if ( rc < 0 )
-        {
-            unmap_domain_page(pl4e);
-            return rc;
-        }
-
-        adjust_guest_l4e(pl4e[i], d);
-    }
-
-    if ( rc >= 0 )
-    {
-        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
-        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-    unmap_domain_page(pl4e);
-
-    return rc;
-}
-
-static void free_l1_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l1_pgentry_t *pl1e;
-    unsigned int  i;
-
-    pl1e = map_domain_page(_mfn(pfn));
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-        if ( is_guest_l1_slot(i) )
-            put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-}
-
-
-static int free_l2_table(struct page_info *page, int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l2_pgentry_t *pl2e;
-    unsigned int  i = page->nr_validated_ptes - 1;
-    int err = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    ASSERT(page->nr_validated_ptes);
-    do {
-        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
-             put_page_from_l2e(pl2e[i], pfn) == 0 &&
-             preemptible && i && hypercall_preempt_check() )
-        {
-           page->nr_validated_ptes = i;
-           err = -ERESTART;
-        }
-    } while ( !err && i-- );
-
-    unmap_domain_page(pl2e);
-
-    if ( !err )
-        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
-
-    return err;
-}
-
-static int free_l3_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l3_pgentry_t *pl3e;
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    do {
-        if ( is_guest_l3_slot(i) )
-        {
-            rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
-            if ( rc < 0 )
-                break;
-            partial = 0;
-            if ( rc > 0 )
-                continue;
-            unadjust_guest_l3e(pl3e[i], d);
-        }
-    } while ( i-- );
-
-    unmap_domain_page(pl3e);
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-    return rc > 0 ? 0 : rc;
-}
-
-static int free_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    do {
-        if ( is_guest_l4_slot(d, i) )
-            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
-        if ( rc < 0 )
-            break;
-        partial = 0;
-    } while ( i-- );
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-
-    unmap_domain_page(pl4e);
-
-    if ( rc >= 0 )
-    {
-        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-
-    return rc;
-}
-
-int page_lock(struct page_info *page)
-{
-    unsigned long x, nx;
-
-    do {
-        while ( (x = page->u.inuse.type_info) & PGT_locked )
-            cpu_relax();
-        nx = x + (1 | PGT_locked);
-        if ( !(x & PGT_validated) ||
-             !(x & PGT_count_mask) ||
-             !(nx & PGT_count_mask) )
-            return 0;
-    } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
-
-    return 1;
-}
-
-void page_unlock(struct page_info *page)
-{
-    unsigned long x, nx, y = page->u.inuse.type_info;
-
-    do {
-        x = y;
-        nx = x - (1 | PGT_locked);
-    } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
-}
-
-/* How to write an entry to the guest pagetables.
- * Returns 0 for failure (pointer not valid), 1 for success. */
-static inline int update_intpte(intpte_t *p, 
-                                intpte_t old, 
-                                intpte_t new,
-                                unsigned long mfn,
-                                struct vcpu *v,
-                                int preserve_ad)
-{
-    int rv = 1;
-#ifndef PTE_UPDATE_WITH_CMPXCHG
-    if ( !preserve_ad )
-    {
-        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
-    }
-    else
-#endif
-    {
-        intpte_t t = old;
-        for ( ; ; )
-        {
-            intpte_t _new = new;
-            if ( preserve_ad )
-                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
-
-            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
-            if ( unlikely(rv == 0) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Failed to update %" PRIpte " -> %" PRIpte
-                         ": saw %" PRIpte "\n", old, _new, t);
-                break;
-            }
-
-            if ( t == old )
-                break;
-
-            /* Allowed to change in Accessed/Dirty flags only. */
-            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
-
-            old = t;
-        }
-    }
-    return rv;
-}
-
-/* Macro that wraps the appropriate type-changes around update_intpte().
- * Arguments are: type, ptr, old, new, mfn, vcpu */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
-    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
-                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v), (_ad))
-
-/*
- * PTE flags that a guest may change without re-validating the PTE.
- * All other bits affect translation, caching, or Xen's safety.
- */
-#define FASTPATH_FLAG_WHITELIST                                     \
-    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
-     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
-
-/* Update the L1 entry at pl1e to new value nl1e. */
-static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
-                        unsigned long gl1mfn, int preserve_ad,
-                        struct vcpu *pt_vcpu, struct domain *pg_dom)
-{
-    l1_pgentry_t ol1e;
-    struct domain *pt_dom = pt_vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
-        return -EFAULT;
-
-    if ( unlikely(paging_mode_refcounts(pt_dom)) )
-    {
-        if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
-            return 0;
-        return -EBUSY;
-    }
-
-    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
-    {
-        /* Translate foreign guest addresses. */
-        struct page_info *page = NULL;
-
-        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
-                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
-            return -EINVAL;
-        }
-
-        if ( paging_mode_translate(pg_dom) )
-        {
-            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
-            if ( !page )
-                return -EINVAL;
-            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l1e(nl1e, pt_dom);
-            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                              preserve_ad);
-            if ( page )
-                put_page(page);
-            return rc ? 0 : -EBUSY;
-        }
-
-        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
-        {
-        default:
-            if ( page )
-                put_page(page);
-            return rc;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(nl1e, rc);
-            rc = 0;
-            break;
-        }
-        if ( page )
-            put_page(page);
-
-        adjust_guest_l1e(nl1e, pt_dom);
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                    preserve_ad)) )
-        {
-            ol1e = nl1e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l1e(ol1e, pt_dom);
-    return rc;
-}
-
-
-/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
-static int mod_l2_entry(l2_pgentry_t *pl2e, 
-                        l2_pgentry_t nl2e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l2_pgentry_t ol2e;
-    struct domain *d = vcpu->domain;
-    struct page_info *l2pg = mfn_to_page(pfn);
-    unsigned long type = l2pg->u.inuse.type_info;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl2e));
-        return -EPERM;
-    }
-
-    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
-        return -EFAULT;
-
-    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l2e(nl2e, d);
-            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
-                return 0;
-            return -EBUSY;
-        }
-
-        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
-            return rc;
-
-        adjust_guest_l2e(nl2e, d);
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol2e = nl2e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l2e(ol2e, pfn);
-    return rc;
-}
-
-/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
-static int mod_l3_entry(l3_pgentry_t *pl3e, 
-                        l3_pgentry_t nl3e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l3_pgentry_t ol3e;
-    struct domain *d = vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl3e));
-        return -EINVAL;
-    }
-
-    /*
-     * Disallow updates to final L3 slot. It contains Xen mappings, and it
-     * would be a pain to ensure they remain continuously valid throughout.
-     */
-    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
-        return -EINVAL;
-
-    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
-        return -EFAULT;
-
-    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l3e(nl3e, d);
-            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l3e(nl3e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l3e(nl3e, d);
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol3e = nl3e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    if ( likely(rc == 0) )
-        if ( !create_pae_xen_mappings(d, pl3e) )
-            BUG();
-
-    put_page_from_l3e(ol3e, pfn, 0, 1);
-    return rc;
-}
-
-/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
-static int mod_l4_entry(l4_pgentry_t *pl4e, 
-                        l4_pgentry_t nl4e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    struct domain *d = vcpu->domain;
-    l4_pgentry_t ol4e;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl4e));
-        return -EINVAL;
-    }
-
-    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
-        return -EFAULT;
-
-    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l4e(nl4e, d);
-            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l4e(nl4e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l4e(nl4e, d);
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol4e = nl4e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    put_page_from_l4e(ol4e, pfn, 0, 1);
-    return rc;
-}
-
-static int cleanup_page_cacheattr(struct page_info *page)
-{
-    unsigned int cacheattr =
-        (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
-
-    if ( likely(cacheattr == 0) )
-        return 0;
-
-    page->count_info &= ~PGC_cacheattr_mask;
-
-    BUG_ON(is_xen_heap_page(page));
-
-    return update_xen_mappings(page_to_mfn(page), 0);
-}
-
-void put_page(struct page_info *page)
-{
-    unsigned long nx, x, y = page->count_info;
-
-    do {
-        ASSERT((y & PGC_count_mask) != 0);
-        x  = y;
-        nx = x - 1;
-    }
-    while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) );
-
-    if ( unlikely((nx & PGC_count_mask) == 0) )
-    {
-        if ( cleanup_page_cacheattr(page) == 0 )
-            free_domheap_page(page);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Leaking mfn %" PRI_pfn "\n", page_to_mfn(page));
-    }
-}
-
-
-struct domain *page_get_owner_and_reference(struct page_info *page)
-{
-    unsigned long x, y = page->count_info;
-    struct domain *owner;
-
-    do {
-        x = y;
-        /*
-         * Count ==  0: Page is not allocated, so we cannot take a reference.
-         * Count == -1: Reference count would wrap, which is invalid. 
-         * Count == -2: Remaining unused ref is reserved for get_page_light().
-         */
-        if ( unlikely(((x + 2) & PGC_count_mask) <= 2) )
-            return NULL;
-    }
-    while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
-
-    owner = page_get_owner(page);
-    ASSERT(owner);
-
-    return owner;
-}
-
-
-int get_page(struct page_info *page, struct domain *domain)
-{
-    struct domain *owner = page_get_owner_and_reference(page);
-
-    if ( likely(owner == domain) )
-        return 1;
-
-    if ( !paging_mode_refcounts(domain) && !domain->is_dying )
-        gprintk(XENLOG_INFO,
-                "Error pfn %lx: rd=%d od=%d caf=%08lx taf=%" PRtype_info "\n",
-                page_to_mfn(page), domain->domain_id,
-                owner ? owner->domain_id : DOMID_INVALID,
-                page->count_info - !!owner, page->u.inuse.type_info);
-
-    if ( owner )
-        put_page(page);
-
-    return 0;
-}
-
-/*
- * Special version of get_page() to be used exclusively when
- * - a page is known to already have a non-zero reference count
- * - the page does not need its owner to be checked
- * - it will not be called more than once without dropping the thus
- *   acquired reference again.
- * Due to get_page() reserving one reference, this call cannot fail.
- */
-static void get_page_light(struct page_info *page)
-{
-    unsigned long x, nx, y = page->count_info;
-
-    do {
-        x  = y;
-        nx = x + 1;
-        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
-        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
-        y = cmpxchg(&page->count_info, x, nx);
-    }
-    while ( unlikely(y != x) );
-}
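
For reference, the count_info manipulation in put_page(), page_get_owner_and_reference() and get_page_light() above follows one pattern: an optimistic read followed by a cmpxchg loop, with the top two count values reserved so that a wrap and the get_page_light() reference can be detected. A compact user-space model (hypothetical count mask, C11 atomics) is:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PGC_COUNT_MASK 0x7fffffffULL  /* stand-in for PGC_count_mask */

/* Model of page_get_owner_and_reference(): refuse count 0 (free page),
 * -1 (would wrap) and -2 (reserved for the get_page_light() reference). */
static bool fake_get_ref(_Atomic uint64_t *count_info)
{
    uint64_t x = atomic_load(count_info);

    do {
        if ( ((x + 2) & FAKE_PGC_COUNT_MASK) <= 2 )
            return false;
    } while ( !atomic_compare_exchange_weak(count_info, &x, x + 1) );

    return true;
}

/* Model of put_page(): unconditional decrement; true means the count
 * dropped to zero and the frame would be freed. */
static bool fake_put_ref(_Atomic uint64_t *count_info)
{
    return ((atomic_fetch_sub(count_info, 1) - 1) & FAKE_PGC_COUNT_MASK) == 0;
}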
-
-static int alloc_page_type(struct page_info *page, unsigned long type,
-                           int preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    int rc;
-
-    /* A page table is dirtied when its type count becomes non-zero. */
-    if ( likely(owner != NULL) )
-        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        rc = alloc_l1_table(page);
-        break;
-    case PGT_l2_page_table:
-        rc = alloc_l2_table(page, type, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l4_table(page);
-        break;
-    case PGT_seg_desc_page:
-        rc = alloc_segdesc_page(page);
-        break;
-    default:
-        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n", 
-               type, page->u.inuse.type_info,
-               page->count_info);
-        rc = -EINVAL;
-        BUG();
-    }
-
-    /* No need for atomic update of type_info here: no one else updates it. */
-    wmb();
-    switch ( rc )
-    {
-    case 0:
-        page->u.inuse.type_info |= PGT_validated;
-        break;
-    case -EINTR:
-        ASSERT((page->u.inuse.type_info &
-                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
-        page->u.inuse.type_info &= ~PGT_count_mask;
-        break;
-    default:
-        ASSERT(rc < 0);
-        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
-                 " (pfn %" PRI_pfn ") for type %" PRtype_info
-                 ": caf=%08lx taf=%" PRtype_info "\n",
-                 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
-                 type, page->count_info, page->u.inuse.type_info);
-        if ( page != current->arch.old_guest_table )
-            page->u.inuse.type_info = 0;
-        else
-        {
-            ASSERT((page->u.inuse.type_info &
-                    (PGT_count_mask | PGT_validated)) == 1);
-    case -ERESTART:
-            get_page_light(page);
-            page->u.inuse.type_info |= PGT_partial;
-        }
-        break;
-    }
-
-    return rc;
-}
-
-
-int free_page_type(struct page_info *page, unsigned long type,
-                   int preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    unsigned long gmfn;
-    int rc;
-
-    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
-    {
-        /* A page table is dirtied when its type count becomes zero. */
-        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
-
-        if ( shadow_mode_refcounts(owner) )
-            return 0;
-
-        gmfn = mfn_to_gmfn(owner, page_to_mfn(page));
-        ASSERT(VALID_M2P(gmfn));
-        /* Page sharing not supported for shadowed domains */
-        if(!SHARED_M2P(gmfn))
-            shadow_remove_all_shadows(owner, _mfn(gmfn));
-    }
-
-    if ( !(type & PGT_partial) )
-    {
-        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
-        page->partial_pte = 0;
-    }
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        free_l1_table(page);
-        rc = 0;
-        break;
-    case PGT_l2_page_table:
-        rc = free_l2_table(page, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = free_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = free_l4_table(page);
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
-                 type, page_to_mfn(page));
-        rc = -EINVAL;
-        BUG();
-    }
-
-    return rc;
-}
-
-
-static int __put_final_page_type(
-    struct page_info *page, unsigned long type, int preemptible)
-{
-    int rc = free_page_type(page, type, preemptible);
-
-    /* No need for atomic update of type_info here: no one else updates it. */
-    if ( rc == 0 )
-    {
-        /*
-         * Record TLB information for flush later. We do not stamp page tables
-         * when running in shadow mode:
-         *  1. Pointless, since it's the shadow pt's which must be tracked.
-         *  2. Shadow mode reuses this field for shadowed page tables to
-         *     store flags info -- we don't want to conflict with that.
-         */
-        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
-               (page->count_info & PGC_page_table)) )
-            page->tlbflush_timestamp = tlbflush_current_time();
-        wmb();
-        page->u.inuse.type_info--;
-    }
-    else if ( rc == -EINTR )
-    {
-        ASSERT((page->u.inuse.type_info &
-                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
-        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
-               (page->count_info & PGC_page_table)) )
-            page->tlbflush_timestamp = tlbflush_current_time();
-        wmb();
-        page->u.inuse.type_info |= PGT_validated;
-    }
-    else
-    {
-        BUG_ON(rc != -ERESTART);
-        wmb();
-        get_page_light(page);
-        page->u.inuse.type_info |= PGT_partial;
-    }
-
-    return rc;
-}
-
-
-static int __put_page_type(struct page_info *page,
-                           int preemptible)
-{
-    unsigned long nx, x, y = page->u.inuse.type_info;
-    int rc = 0;
-
-    for ( ; ; )
-    {
-        x  = y;
-        nx = x - 1;
-
-        ASSERT((x & PGT_count_mask) != 0);
-
-        if ( unlikely((nx & PGT_count_mask) == 0) )
-        {
-            if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) &&
-                 likely(nx & (PGT_validated|PGT_partial)) )
-            {
-                /*
-                 * Page-table pages must be unvalidated when count is zero. The
-                 * 'free' is safe because the refcnt is non-zero and validated
-                 * bit is clear => other ops will spin or fail.
-                 */
-                nx = x & ~(PGT_validated|PGT_partial);
-                if ( unlikely((y = cmpxchg(&page->u.inuse.type_info,
-                                           x, nx)) != x) )
-                    continue;
-                /* We cleared the 'valid bit' so we do the clean up. */
-                rc = __put_final_page_type(page, x, preemptible);
-                if ( x & PGT_partial )
-                    put_page(page);
-                break;
-            }
-
-            /*
-             * Record TLB information for flush later. We do not stamp page
-             * tables when running in shadow mode:
-             *  1. Pointless, since it's the shadow pt's which must be tracked.
-             *  2. Shadow mode reuses this field for shadowed page tables to
-             *     store flags info -- we don't want to conflict with that.
-             */
-            if ( !(shadow_mode_enabled(page_get_owner(page)) &&
-                   (page->count_info & PGC_page_table)) )
-                page->tlbflush_timestamp = tlbflush_current_time();
-        }
-
-        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
-            break;
-
-        if ( preemptible && hypercall_preempt_check() )
-            return -EINTR;
-    }
-
-    return rc;
-}
-
-
-static int __get_page_type(struct page_info *page, unsigned long type,
-                           int preemptible)
-{
-    unsigned long nx, x, y = page->u.inuse.type_info;
-    int rc = 0, iommu_ret = 0;
-
-    ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
-    ASSERT(!in_irq());
-
-    for ( ; ; )
-    {
-        x  = y;
-        nx = x + 1;
-        if ( unlikely((nx & PGT_count_mask) == 0) )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Type count overflow on mfn %"PRI_mfn"\n",
-                     page_to_mfn(page));
-            return -EINVAL;
-        }
-        else if ( unlikely((x & PGT_count_mask) == 0) )
-        {
-            struct domain *d = page_get_owner(page);
-
-            /* Normally we should never let a page go from type count 0
-             * to type count 1 when it is shadowed. One exception:
-             * out-of-sync shadowed pages are allowed to become
-             * writeable. */
-            if ( d && shadow_mode_enabled(d)
-                 && (page->count_info & PGC_page_table)
-                 && !((page->shadow_flags & (1u<<29))
-                      && type == PGT_writable_page) )
-               shadow_remove_all_shadows(d, _mfn(page_to_mfn(page)));
-
-            ASSERT(!(x & PGT_pae_xen_l2));
-            if ( (x & PGT_type_mask) != type )
-            {
-                /*
-                 * On type change we check to flush stale TLB entries. This 
-                 * may be unnecessary (e.g., page was GDT/LDT) but those 
-                 * circumstances should be very rare.
-                 */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                BUG_ON(in_irq());
-                cpumask_copy(mask, d->domain_dirty_cpumask);
-
-                /* Don't flush if the timestamp is old enough */
-                tlbflush_filter(mask, page->tlbflush_timestamp);
-
-                if ( unlikely(!cpumask_empty(mask)) &&
-                     /* Shadow mode: track only writable pages. */
-                     (!shadow_mode_enabled(page_get_owner(page)) ||
-                      ((nx & PGT_type_mask) == PGT_writable_page)) )
-                {
-                    perfc_incr(need_flush_tlb_flush);
-                    flush_tlb_mask(mask);
-                }
-
-                /* We lose existing type and validity. */
-                nx &= ~(PGT_type_mask | PGT_validated);
-                nx |= type;
-
-                /* No special validation needed for writable pages. */
-                /* Page tables and GDT/LDT need to be scanned for validity. */
-                if ( type == PGT_writable_page || type == PGT_shared_page )
-                    nx |= PGT_validated;
-            }
-        }
-        else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) )
-        {
-            /* Don't log failure if it could be a recursive-mapping attempt. */
-            if ( ((x & PGT_type_mask) == PGT_l2_page_table) &&
-                 (type == PGT_l1_page_table) )
-                return -EINVAL;
-            if ( ((x & PGT_type_mask) == PGT_l3_page_table) &&
-                 (type == PGT_l2_page_table) )
-                return -EINVAL;
-            if ( ((x & PGT_type_mask) == PGT_l4_page_table) &&
-                 (type == PGT_l3_page_table) )
-                return -EINVAL;
-            gdprintk(XENLOG_WARNING,
-                     "Bad type (saw %" PRtype_info " != exp %" PRtype_info ") "
-                     "for mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n",
-                     x, type, page_to_mfn(page),
-                     get_gpfn_from_mfn(page_to_mfn(page)));
-            return -EINVAL;
-        }
-        else if ( unlikely(!(x & PGT_validated)) )
-        {
-            if ( !(x & PGT_partial) )
-            {
-                /* Someone else is updating validation of this page. Wait... */
-                while ( (y = page->u.inuse.type_info) == x )
-                {
-                    if ( preemptible && hypercall_preempt_check() )
-                        return -EINTR;
-                    cpu_relax();
-                }
-                continue;
-            }
-            /* Type ref count was left at 1 when PGT_partial got set. */
-            ASSERT((x & PGT_count_mask) == 1);
-            nx = x & ~PGT_partial;
-        }
-
-        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
-            break;
-
-        if ( preemptible && hypercall_preempt_check() )
-            return -EINTR;
-    }
-
-    if ( unlikely((x & PGT_type_mask) != type) )
-    {
-        /* Special pages should not be accessible from devices. */
-        struct domain *d = page_get_owner(page);
-        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
-        {
-            if ( (x & PGT_type_mask) == PGT_writable_page )
-                iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
-            else if ( type == PGT_writable_page )
-                iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
-                                           page_to_mfn(page),
-                                           IOMMUF_readable|IOMMUF_writable);
-        }
-    }
-
-    if ( unlikely(!(nx & PGT_validated)) )
-    {
-        if ( !(x & PGT_partial) )
-        {
-            page->nr_validated_ptes = 0;
-            page->partial_pte = 0;
-        }
-        rc = alloc_page_type(page, type, preemptible);
-    }
-
-    if ( (x & PGT_partial) && !(nx & PGT_partial) )
-        put_page(page);
-
-    if ( !rc )
-        rc = iommu_ret;
-
-    return rc;
-}
-
-void put_page_type(struct page_info *page)
-{
-    int rc = __put_page_type(page, 0);
-    ASSERT(rc == 0);
-    (void)rc;
-}
-
-int get_page_type(struct page_info *page, unsigned long type)
-{
-    int rc = __get_page_type(page, type, 0);
-    if ( likely(rc == 0) )
-        return 1;
-    ASSERT(rc != -EINTR && rc != -ERESTART);
-    return 0;
-}
-
-int put_page_type_preemptible(struct page_info *page)
-{
-    return __put_page_type(page, 1);
-}
-
-int get_page_type_preemptible(struct page_info *page, unsigned long type)
-{
-    ASSERT(!current->arch.old_guest_table);
-    return __get_page_type(page, type, 1);
-}
-
-static int get_spage_pages(struct page_info *page, struct domain *d)
-{
-    int i;
-
-    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
-    {
-        if (!get_page_and_type(page, d, PGT_writable_page))
-        {
-            while (--i >= 0)
-                put_page_and_type(--page);
-            return 0;
-        }
-    }
-    return 1;
-}
-
-static void put_spage_pages(struct page_info *page)
-{
-    int i;
-
-    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
-    {
-        put_page_and_type(page);
-    }
-    return;
-}
-
-static int mark_superpage(struct spage_info *spage, struct domain *d)
-{
-    unsigned long x, nx, y = spage->type_info;
-    int pages_done = 0;
-
-    ASSERT(opt_allow_superpage);
-
-    do {
-        x = y;
-        nx = x + 1;
-        if ( (x & SGT_type_mask) == SGT_mark )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Duplicate superpage mark attempt mfn %" PRI_mfn "\n",
-                     spage_to_mfn(spage));
-            if ( pages_done )
-                put_spage_pages(spage_to_page(spage));
-            return -EINVAL;
-        }
-        if ( (x & SGT_type_mask) == SGT_dynamic )
-        {
-            if ( pages_done )
-            {
-                put_spage_pages(spage_to_page(spage));
-                pages_done = 0;
-            }
-        }
-        else if ( !pages_done )
-        {
-            if ( !get_spage_pages(spage_to_page(spage), d) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Superpage type conflict in mark attempt mfn %" 
PRI_mfn "\n",
-                         spage_to_mfn(spage));
-                return -EINVAL;
-            }
-            pages_done = 1;
-        }
-        nx = (nx & ~SGT_type_mask) | SGT_mark;
-
-    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
-
-    return 0;
-}
-
-static int unmark_superpage(struct spage_info *spage)
-{
-    unsigned long x, nx, y = spage->type_info;
-    unsigned long do_pages = 0;
-
-    ASSERT(opt_allow_superpage);
-
-    do {
-        x = y;
-        nx = x - 1;
-        if ( (x & SGT_type_mask) != SGT_mark )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Attempt to unmark unmarked superpage mfn %" PRI_mfn "\n",
-                     spage_to_mfn(spage));
-            return -EINVAL;
-        }
-        if ( (nx & SGT_count_mask) == 0 )
-        {
-            nx = (nx & ~SGT_type_mask) | SGT_none;
-            do_pages = 1;
-        }
-        else
-        {
-            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
-        }
-    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
-
-    if ( do_pages )
-        put_spage_pages(spage_to_page(spage));
-
-    return 0;
-}
-
-void clear_superpage_mark(struct page_info *page)
-{
-    struct spage_info *spage;
-
-    if ( !opt_allow_superpage )
-        return;
-
-    spage = page_to_spage(page);
-    if ((spage->type_info & SGT_type_mask) == SGT_mark)
-        unmark_superpage(spage);
-
-}
-
-int get_superpage(unsigned long mfn, struct domain *d)
-{
-    struct spage_info *spage;
-    unsigned long x, nx, y;
-    int pages_done = 0;
-
-    ASSERT(opt_allow_superpage);
-
-    if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
-        return -EINVAL;
-
-    spage = mfn_to_spage(mfn);
-    y = spage->type_info;
-    do {
-        x = y;
-        nx = x + 1;
-        if ( (x & SGT_type_mask) != SGT_none )
-        {
-            if ( pages_done )
-            {
-                put_spage_pages(spage_to_page(spage));
-                pages_done = 0;
-            }
-        }
-        else
-        {
-            if ( !get_spage_pages(spage_to_page(spage), d) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Type conflict on superpage mapping mfn %" PRI_mfn 
"\n",
-                         spage_to_mfn(spage));
-                return -EINVAL;
-            }
-            pages_done = 1;
-            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
-        }
-    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
-
-    return 0;
-}
-
-static void put_superpage(unsigned long mfn)
-{
-    struct spage_info *spage;
-    unsigned long x, nx, y;
-    unsigned long do_pages = 0;
-
-    if ( !opt_allow_superpage )
-    {
-        put_spage_pages(mfn_to_page(mfn));
-        return;
-    }
-
-    spage = mfn_to_spage(mfn);
-    y = spage->type_info;
-    do {
-        x = y;
-        nx = x - 1;
-        if ((x & SGT_type_mask) == SGT_dynamic)
-        {
-            if ((nx & SGT_count_mask) == 0)
-            {
-                nx = (nx & ~SGT_type_mask) | SGT_none;
-                do_pages = 1;
-            }
-        }
-
-    } while ((y = cmpxchg(&spage->type_info, x, nx)) != x);
-
-    if (do_pages)
-        put_spage_pages(spage_to_page(spage));
-
-    return;
-}
-
-int put_old_guest_table(struct vcpu *v)
-{
-    int rc;
-
-    if ( !v->arch.old_guest_table )
-        return 0;
-
-    switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) )
-    {
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    }
-
-    v->arch.old_guest_table = NULL;
-
-    return rc;
-}
-
-int vcpu_destroy_pagetables(struct vcpu *v)
-{
-    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
-    struct page_info *page;
-    l4_pgentry_t *l4tab = NULL;
-    int rc = put_old_guest_table(v);
-
-    if ( rc )
-        return rc;
-
-    if ( is_pv_32bit_vcpu(v) )
-    {
-        l4tab = map_domain_page(_mfn(mfn));
-        mfn = l4e_get_pfn(*l4tab);
-    }
-
-    if ( mfn )
-    {
-        page = mfn_to_page(mfn);
-        if ( paging_mode_refcounts(v->domain) )
-            put_page(page);
-        else
-            rc = put_page_and_type_preemptible(page);
-    }
-
-    if ( l4tab )
-    {
-        if ( !rc )
-            l4e_write(l4tab, l4e_empty());
-        unmap_domain_page(l4tab);
-    }
-    else if ( !rc )
-    {
-        v->arch.guest_table = pagetable_null();
-
-        /* Drop ref to guest_table_user (from MMUEXT_NEW_USER_BASEPTR) */
-        mfn = pagetable_get_pfn(v->arch.guest_table_user);
-        if ( mfn )
-        {
-            page = mfn_to_page(mfn);
-            if ( paging_mode_refcounts(v->domain) )
-                put_page(page);
-            else
-                rc = put_page_and_type_preemptible(page);
-        }
-        if ( !rc )
-            v->arch.guest_table_user = pagetable_null();
-    }
-
-    v->arch.cr3 = 0;
-
-    /*
-     * put_page_and_type_preemptible() is liable to return -EINTR. The
-     * callers of us expect -ERESTART so convert it over.
-     */
-    return rc != -EINTR ? rc : -ERESTART;
-}
-
-int new_guest_cr3(unsigned long mfn)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    int rc;
-    unsigned long old_base_mfn;
-
-    if ( is_pv_32bit_domain(d) )
-    {
-        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
-        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
-
-        rc = paging_mode_refcounts(d)
-             ? -EINVAL /* Old code was broken, but what should it be? */
-             : mod_l4_entry(
-                    pl4e,
-                    l4e_from_pfn(
-                        mfn,
-                        (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
-                    gt_mfn, 0, curr);
-        unmap_domain_page(pl4e);
-        switch ( rc )
-        {
-        case 0:
-            break;
-        case -EINTR:
-        case -ERESTART:
-            return -ERESTART;
-        default:
-            gdprintk(XENLOG_WARNING,
-                     "Error while installing new compat baseptr %" PRI_mfn 
"\n",
-                     mfn);
-            return rc;
-        }
-
-        invalidate_shadow_ldt(curr, 0);
-        write_ptbase(curr);
-
-        return 0;
-    }
-
-    rc = put_old_guest_table(curr);
-    if ( unlikely(rc) )
-        return rc;
-
-    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
-    /*
-     * This is particularly important when getting restarted after the
-     * previous attempt got preempted in the put-old-MFN phase.
-     */
-    if ( old_base_mfn == mfn )
-    {
-        write_ptbase(curr);
-        return 0;
-    }
-
-    rc = paging_mode_refcounts(d)
-         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
-         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
-    switch ( rc )
-    {
-    case 0:
-        break;
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    default:
-        gdprintk(XENLOG_WARNING,
-                 "Error while installing new baseptr %" PRI_mfn "\n", mfn);
-        return rc;
-    }
-
-    invalidate_shadow_ldt(curr, 0);
-
-    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
-        fill_ro_mpt(mfn);
-    curr->arch.guest_table = pagetable_from_pfn(mfn);
-    update_cr3(curr);
-
-    write_ptbase(curr);
-
-    if ( likely(old_base_mfn != 0) )
-    {
-        struct page_info *page = mfn_to_page(old_base_mfn);
-
-        if ( paging_mode_refcounts(d) )
-            put_page(page);
-        else
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-                rc = -ERESTART;
-                /* fallthrough */
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-    }
-
-    return rc;
-}
-
-static struct domain *get_pg_owner(domid_t domid)
-{
-    struct domain *pg_owner = NULL, *curr = current->domain;
-
-    if ( likely(domid == DOMID_SELF) )
-    {
-        pg_owner = rcu_lock_current_domain();
-        goto out;
-    }
-
-    if ( unlikely(domid == curr->domain_id) )
-    {
-        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
-        goto out;
-    }
-
-    if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Cannot mix foreign mappings with translated domains\n");
-        goto out;
-    }
-
-    switch ( domid )
-    {
-    case DOMID_IO:
-        pg_owner = rcu_lock_domain(dom_io);
-        break;
-    case DOMID_XEN:
-        pg_owner = rcu_lock_domain(dom_xen);
-        break;
-    default:
-        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
-        {
-            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
-            break;
-        }
-        break;
-    }
-
- out:
-    return pg_owner;
-}
-
-static void put_pg_owner(struct domain *pg_owner)
-{
-    rcu_unlock_domain(pg_owner);
-}
-
-static inline int vcpumask_to_pcpumask(
-    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
-{
-    unsigned int vcpu_id, vcpu_bias, offs;
-    unsigned long vmask;
-    struct vcpu *v;
-    bool_t is_native = !is_pv_32bit_domain(d);
-
-    cpumask_clear(pmask);
-    for ( vmask = 0, offs = 0; ; ++offs)
-    {
-        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
-        if ( vcpu_bias >= d->max_vcpus )
-            return 0;
-
-        if ( unlikely(is_native ?
-                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
-                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
-                                             offs, 1)) )
-        {
-            cpumask_clear(pmask);
-            return -EFAULT;
-        }
-
-        while ( vmask )
-        {
-            vcpu_id = find_first_set_bit(vmask);
-            vmask &= ~(1UL << vcpu_id);
-            vcpu_id += vcpu_bias;
-            if ( (vcpu_id >= d->max_vcpus) )
-                return 0;
-            if ( ((v = d->vcpu[vcpu_id]) != NULL) )
-                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
-        }
-    }
-}
-
-long do_mmuext_op(
-    XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmuext_op op;
-    unsigned long type;
-    unsigned int i, done = 0;
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    struct domain *pg_owner;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(uops)) )
-    {
-        /* See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below. */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmuext_op);
-
-    if ( unlikely(!guest_handle_okay(uops, count)) )
-        return -EFAULT;
-
-    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
-        return -ESRCH;
-
-    if ( !is_pv_domain(pg_owner) )
-    {
-        put_pg_owner(pg_owner);
-        return -EINVAL;
-    }
-
-    rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner);
-    if ( rc )
-    {
-        put_pg_owner(pg_owner);
-        return rc;
-    }
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( is_hvm_domain(d) )
-        {
-            switch ( op.cmd )
-            {
-            case MMUEXT_PIN_L1_TABLE:
-            case MMUEXT_PIN_L2_TABLE:
-            case MMUEXT_PIN_L3_TABLE:
-            case MMUEXT_PIN_L4_TABLE:
-            case MMUEXT_UNPIN_TABLE:
-                break;
-            default:
-                rc = -EOPNOTSUPP;
-                goto done;
-            }
-        }
-
-        rc = 0;
-
-        switch ( op.cmd )
-        {
-        case MMUEXT_PIN_L1_TABLE:
-            type = PGT_l1_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L2_TABLE:
-            type = PGT_l2_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L3_TABLE:
-            type = PGT_l3_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L4_TABLE:
-            if ( is_pv_32bit_domain(pg_owner) )
-                break;
-            type = PGT_l4_page_table;
-
-        pin_page: {
-            struct page_info *page;
-
-            /* Ignore pinning of invalid paging levels. */
-            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
-                break;
-
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            rc = get_page_type_preemptible(page, type);
-            if ( unlikely(rc) )
-            {
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else if ( rc != -ERESTART )
-                    gdprintk(XENLOG_WARNING,
-                             "Error %d while pinning mfn %" PRI_mfn "\n",
-                            rc, page_to_mfn(page));
-                if ( page != curr->arch.old_guest_table )
-                    put_page(page);
-                break;
-            }
-
-            rc = xsm_memory_pin_page(XSM_HOOK, d, pg_owner, page);
-            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
-                                                  &page->u.inuse.type_info)) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " already pinned\n", 
page_to_mfn(page));
-                rc = -EINVAL;
-            }
-
-            if ( unlikely(rc) )
-                goto pin_drop;
-
-            /* A page is dirtied when its pin status is set. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            /* We can race domain destruction (domain_relinquish_resources). */
-            if ( unlikely(pg_owner != d) )
-            {
-                int drop_ref;
-                spin_lock(&pg_owner->page_alloc_lock);
-                drop_ref = (pg_owner->is_dying &&
-                            test_and_clear_bit(_PGT_pinned,
-                                               &page->u.inuse.type_info));
-                spin_unlock(&pg_owner->page_alloc_lock);
-                if ( drop_ref )
-                {
-        pin_drop:
-                    if ( type == PGT_l1_page_table )
-                        put_page_and_type(page);
-                    else
-                        curr->arch.old_guest_table = page;
-                }
-            }
-
-            break;
-        }
-
-        case MMUEXT_UNPIN_TABLE: {
-            struct page_info *page;
-
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
-                         op.arg1.mfn, pg_owner->domain_id);
-                rc = -EINVAL;
-                break;
-            }
-
-            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
-            {
-                put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                rc = 0;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-            put_page(page);
-
-            /* A page is dirtied when its pin status is cleared. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            break;
-        }
-
-        case MMUEXT_NEW_BASEPTR:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(d)) )
-                rc = -EINVAL;
-            else
-                rc = new_guest_cr3(op.arg1.mfn);
-            break;
-
-        case MMUEXT_NEW_USER_BASEPTR: {
-            unsigned long old_mfn;
-
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(d)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
-            /*
-             * This is particularly important when getting restarted after the
-             * previous attempt got preempted in the put-old-MFN phase.
-             */
-            if ( old_mfn == op.arg1.mfn )
-                break;
-
-            if ( op.arg1.mfn != 0 )
-            {
-                if ( paging_mode_refcounts(d) )
-                    rc = get_page_from_pagenr(op.arg1.mfn, d) ? 0 : -EINVAL;
-                else
-                    rc = get_page_and_type_from_pagenr(
-                        op.arg1.mfn, PGT_root_page_table, d, 0, 1);
-
-                if ( unlikely(rc) )
-                {
-                    if ( rc == -EINTR )
-                        rc = -ERESTART;
-                    else if ( rc != -ERESTART )
-                        gdprintk(XENLOG_WARNING,
-                                 "Error %d installing new mfn %" PRI_mfn "\n",
-                                 rc, op.arg1.mfn);
-                    break;
-                }
-                if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
-                    zap_ro_mpt(op.arg1.mfn);
-            }
-
-            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
-
-            if ( old_mfn != 0 )
-            {
-                struct page_info *page = mfn_to_page(old_mfn);
-
-                if ( paging_mode_refcounts(d) )
-                    put_page(page);
-                else
-                    switch ( rc = put_page_and_type_preemptible(page) )
-                    {
-                    case -EINTR:
-                        rc = -ERESTART;
-                        /* fallthrough */
-                    case -ERESTART:
-                        curr->arch.old_guest_table = page;
-                        break;
-                    default:
-                        BUG_ON(rc);
-                        break;
-                    }
-            }
-
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_LOCAL:
-            if ( likely(d == pg_owner) )
-                flush_tlb_local();
-            else
-                rc = -EPERM;
-            break;
-
-        case MMUEXT_INVLPG_LOCAL:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else
-                paging_invlpg(curr, op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_TLB_FLUSH_MULTI:
-        case MMUEXT_INVLPG_MULTI:
-        {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
-
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(vcpumask_to_pcpumask(d,
-                                   guest_handle_to_param(op.arg2.vcpumask,
-                                                         const_void),
-                                   mask)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
-                flush_tlb_mask(mask);
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(mask, op.arg1.linear_addr);
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_ALL:
-            if ( likely(d == pg_owner) )
-                flush_tlb_mask(d->domain_dirty_cpumask);
-            else
-                rc = -EPERM;
-            break;
-    
-        case MMUEXT_INVLPG_ALL:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(d->domain_dirty_cpumask, op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_FLUSH_CACHE:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(!cache_flush_permitted(d)) )
-                rc = -EACCES;
-            else
-                wbinvd();
-            break;
-
-        case MMUEXT_FLUSH_CACHE_GLOBAL:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( likely(cache_flush_permitted(d)) )
-            {
-                unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                cpumask_clear(mask);
-                for_each_online_cpu(cpu)
-                    if ( !cpumask_intersects(mask,
-                                             per_cpu(cpu_sibling_mask, cpu)) )
-                        __cpumask_set_cpu(cpu, mask);
-                flush_mask(mask, FLUSH_CACHE);
-            }
-            else
-                rc = -EINVAL;
-            break;
-
-        case MMUEXT_SET_LDT:
-        {
-            unsigned int ents = op.arg2.nr_ents;
-            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
-
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( paging_mode_external(d) )
-                rc = -EINVAL;
-            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
-                      (ents > 8192) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
-                rc = -EINVAL;
-            }
-            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
-                      (curr->arch.pv_vcpu.ldt_base != ptr) )
-            {
-                invalidate_shadow_ldt(curr, 0);
-                flush_tlb_local();
-                curr->arch.pv_vcpu.ldt_base = ptr;
-                curr->arch.pv_vcpu.ldt_ents = ents;
-                load_LDT(curr);
-            }
-            break;
-        }
-
-        case MMUEXT_CLEAR_PAGE: {
-            struct page_info *page;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( !page || !get_page_type(page, PGT_writable_page) )
-            {
-                if ( page )
-                    put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            /* A page is dirtied when it's being cleared. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            clear_domain_page(_mfn(page_to_mfn(page)));
-
-            put_page_and_type(page);
-            break;
-        }
-
-        case MMUEXT_COPY_PAGE:
-        {
-            struct page_info *src_page, *dst_page;
-
-            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, NULL,
-                                         P2M_ALLOC);
-            if ( unlikely(!src_page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Error copying from mfn %" PRI_mfn "\n",
-                         op.arg2.src_mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL,
-                                         P2M_ALLOC);
-            rc = (dst_page &&
-                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
-            if ( unlikely(rc) )
-            {
-                put_page(src_page);
-                if ( dst_page )
-                    put_page(dst_page);
-                gdprintk(XENLOG_WARNING,
-                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
-                break;
-            }
-
-            /* A page is dirtied when it's being copied to. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page)));
-
-            copy_domain_page(_mfn(page_to_mfn(dst_page)),
-                             _mfn(page_to_mfn(src_page)));
-
-            put_page_and_type(dst_page);
-            put_page(src_page);
-            break;
-        }
-
-        case MMUEXT_MARK_SUPER:
-        case MMUEXT_UNMARK_SUPER:
-        {
-            unsigned long mfn = op.arg1.mfn;
-
-            if ( !opt_allow_superpage )
-                rc = -EOPNOTSUPP;
-            else if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( mfn & (L1_PAGETABLE_ENTRIES - 1) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Unaligned superpage mfn %" PRI_mfn "\n", mfn);
-                rc = -EINVAL;
-            }
-            else if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
-                rc = -EINVAL;
-            else if ( op.cmd == MMUEXT_MARK_SUPER )
-                rc = mark_superpage(mfn_to_spage(mfn), d);
-            else
-                rc = unmark_superpage(mfn_to_spage(mfn));
-            break;
-        }
-
-        default:
-            rc = -ENOSYS;
-            break;
-        }
-
- done:
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(uops, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmuext_op, "hihi",
-            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmuext_op, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    perfc_add(num_mmuext_ops, i);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
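
From the guest side, do_mmuext_op() is reached through the public mmuext_op interface. As a hedged usage sketch (assuming a Linux-style HYPERVISOR_mmuext_op() hypercall wrapper and the public struct mmuext_op layout), pinning a frame as an L1 page table looks roughly like:

#include <xen/interface/xen.h>     /* struct mmuext_op, MMUEXT_*, DOMID_SELF */
#include <asm/xen/hypercall.h>     /* HYPERVISOR_mmuext_op() (Linux PV guest) */

/* Ask Xen to pin the page table living in machine frame 'l1_mfn' as an L1. */
static int pin_l1_table(unsigned long l1_mfn)
{
    struct mmuext_op op = {
        .cmd      = MMUEXT_PIN_L1_TABLE,
        .arg1.mfn = l1_mfn,
    };

    /* One op, no "done" counter, operating on the caller's own pages. */
    return HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}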
-
-long do_mmu_update(
-    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmu_update req;
-    void *va;
-    unsigned long gpfn, gmfn, mfn;
-    struct page_info *page;
-    unsigned int cmd, i = 0, done = 0, pt_dom;
-    struct vcpu *curr = current, *v = curr;
-    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
-    struct domain_mmap_cache mapcache;
-    uint32_t xsm_needed = 0;
-    uint32_t xsm_checked = 0;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(ureqs)) )
-    {
-        /* See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below. */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmu_update);
-
-    if ( unlikely(!guest_handle_okay(ureqs, count)) )
-        return -EFAULT;
-
-    if ( (pt_dom = foreigndom >> 16) != 0 )
-    {
-        /* Pagetables belong to a foreign domain (PFD). */
-        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
-            return -ESRCH;
-
-        if ( pt_owner == d )
-            rcu_unlock_domain(pt_owner);
-        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
-        {
-            rc = -EINVAL;
-            goto out;
-        }
-    }
-
-    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
-    {
-        rc = -ESRCH;
-        goto out;
-    }
-
-    domain_mmap_cache_init(&mapcache);
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
-
-        switch ( cmd )
-        {
-            /*
-             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
-             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR in)
-             * the current A/D bits.
-             */
-        case MMU_NORMAL_PT_UPDATE:
-        case MMU_PT_UPDATE_PRESERVE_AD:
-        {
-            p2m_type_t p2mt;
-
-            rc = -EOPNOTSUPP;
-            if ( unlikely(paging_mode_refcounts(pt_owner)) )
-                break;
-
-            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
-            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
-            {
-                xsm_needed |= XSM_MMU_UPDATE_READ;
-                if ( get_pte_flags(req.val) & _PAGE_RW )
-                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
-            }
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-            rc = -EINVAL;
-
-            req.ptr -= cmd;
-            gmfn = req.ptr >> PAGE_SHIFT;
-            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
-
-            if ( p2m_is_paged(p2mt) )
-            {
-                ASSERT(!page);
-                p2m_mem_paging_populate(pg_owner, gmfn);
-                rc = -ENOENT;
-                break;
-            }
-
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for normal update\n");
-                break;
-            }
-
-            mfn = page_to_mfn(page);
-            va = map_domain_page_with_cache(mfn, &mapcache);
-            va = (void *)((unsigned long)va +
-                          (unsigned long)(req.ptr & ~PAGE_MASK));
-
-            if ( page_lock(page) )
-            {
-                switch ( page->u.inuse.type_info & PGT_type_mask )
-                {
-                case PGT_l1_page_table:
-                {
-                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    p2m_type_t l1e_p2mt = p2m_ram_rw;
-                    struct page_info *target = NULL;
-                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
-                                        P2M_UNSHARE : P2M_ALLOC;
-
-                    if ( paging_mode_translate(pg_owner) )
-                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
-                                                   &l1e_p2mt, q);
-
-                    if ( p2m_is_paged(l1e_p2mt) )
-                    {
-                        if ( target )
-                            put_page(target);
-                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
-                        rc = -ENOENT;
-                        break;
-                    }
-                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
-                    {
-                        rc = -ENOENT;
-                        break;
-                    }
-                    /* If we tried to unshare and failed */
-                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
-                    {
-                        /* We could not have obtained a page ref. */
-                        ASSERT(target == NULL);
-                        /* And mem_sharing_notify has already been called. */
-                        rc = -ENOMEM;
-                        break;
-                    }
-
-                    rc = mod_l1_entry(va, l1e, mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
-                                      pg_owner);
-                    if ( target )
-                        put_page(target);
-                }
-                break;
-                case PGT_l2_page_table:
-                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l3_page_table:
-                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l4_page_table:
-                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                break;
-                case PGT_writable_page:
-                    perfc_incr(writable_mmu_updates);
-                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                        rc = 0;
-                    break;
-                }
-                page_unlock(page);
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-            }
-            else if ( get_page_type(page, PGT_writable_page) )
-            {
-                perfc_incr(writable_mmu_updates);
-                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                    rc = 0;
-                put_page_type(page);
-            }
-
-            unmap_domain_page_with_cache(va, &mapcache);
-            put_page(page);
-        }
-        break;
-
-        case MMU_MACHPHYS_UPDATE:
-            if ( unlikely(d != pt_owner) )
-            {
-                rc = -EPERM;
-                break;
-            }
-
-            if ( unlikely(paging_mode_translate(pg_owner)) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            mfn = req.ptr >> PAGE_SHIFT;
-            gpfn = req.val;
-
-            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-
-            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+            if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) &&
+                 likely(nx & (PGT_validated|PGT_partial)) )
             {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for mach->phys update\n");
-                rc = -EINVAL;
+                /*
+                 * Page-table pages must be unvalidated when count is zero. The
+                 * 'free' is safe because the refcnt is non-zero and validated
+                 * bit is clear => other ops will spin or fail.
+                 */
+                nx = x & ~(PGT_validated|PGT_partial);
+                if ( unlikely((y = cmpxchg(&page->u.inuse.type_info,
+                                           x, nx)) != x) )
+                    continue;
+                /* We cleared the 'valid bit' so we do the clean up. */
+                rc = __put_final_page_type(page, x, preemptible);
+                if ( x & PGT_partial )
+                    put_page(page);
                 break;
             }
 
-            set_gpfn_from_mfn(mfn, gpfn);
-
-            paging_mark_dirty(pg_owner, _mfn(mfn));
-
-            put_page(mfn_to_page(mfn));
-            break;
-
-        default:
-            rc = -ENOSYS;
-            break;
+            /*
+             * Record TLB information for flush later. We do not stamp page
+             * tables when running in shadow mode:
+             *  1. Pointless, since it's the shadow pt's which must be tracked.
+             *  2. Shadow mode reuses this field for shadowed page tables to
+             *     store flags info -- we don't want to conflict with that.
+             */
+            if ( !(shadow_mode_enabled(page_get_owner(page)) &&
+                   (page->count_info & PGC_page_table)) )
+                page->tlbflush_timestamp = tlbflush_current_time();
         }
 
-        if ( unlikely(rc) )
+        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
             break;
 
-        guest_handle_add_offset(ureqs, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmu_update, "hihi",
-            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmu_update, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    domain_mmap_cache_destroy(&mapcache);
-
-    perfc_add(num_page_updates, i);
-
- out:
-    if ( pt_owner != d )
-        rcu_unlock_domain(pt_owner);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
+        if ( preemptible && hypercall_preempt_check() )
+            return -EINTR;
     }
 
     return rc;
 }
 
 
-static int create_grant_pte_mapping(
-    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
+static int __get_page_type(struct page_info *page, unsigned long type,
+                           int preemptible)
 {
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
-    struct domain *d = v->domain;
-
-    adjust_guest_l1e(nl1e, d);
-
-    gmfn = pte_addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
-    if ( unlikely(!page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
-        return GNTST_general_error;
-    }
-    
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+    unsigned long nx, x, y = page->u.inuse.type_info;
+    int rc = 0, iommu_ret = 0;
 
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
-    }
+    ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
+    ASSERT(!in_irq());
 
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    for ( ; ; )
     {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
+        x  = y;
+        nx = x + 1;
+        if ( unlikely((nx & PGT_count_mask) == 0) )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Type count overflow on mfn %"PRI_mfn"\n",
+                     page_to_mfn(page));
+            return -EINVAL;
+        }
+        else if ( unlikely((x & PGT_count_mask) == 0) )
+        {
+            struct domain *d = page_get_owner(page);
 
-    ol1e = *(l1_pgentry_t *)va;
-    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    } 
+            /* Normally we should never let a page go from type count 0
+             * to type count 1 when it is shadowed. One exception:
+             * out-of-sync shadowed pages are allowed to become
+             * writeable. */
+            if ( d && shadow_mode_enabled(d)
+                 && (page->count_info & PGC_page_table)
+                 && !((page->shadow_flags & (1u<<29))
+                      && type == PGT_writable_page) )
+               shadow_remove_all_shadows(d, _mfn(page_to_mfn(page)));
 
-    page_unlock(page);
+            ASSERT(!(x & PGT_pae_xen_l2));
+            if ( (x & PGT_type_mask) != type )
+            {
+                /*
+                 * On type change we check to flush stale TLB entries. This 
+                 * may be unnecessary (e.g., page was GDT/LDT) but those 
+                 * circumstances should be very rare.
+                 */
+                cpumask_t *mask = this_cpu(scratch_cpumask);
 
-    if ( !paging_mode_refcounts(d) )
-        put_page_from_l1e(ol1e, d);
+                BUG_ON(in_irq());
+                cpumask_copy(mask, d->domain_dirty_cpumask);
 
- failed:
-    unmap_domain_page(va);
-    put_page(page);
+                /* Don't flush if the timestamp is old enough */
+                tlbflush_filter(mask, page->tlbflush_timestamp);
 
-    return rc;
-}
+                if ( unlikely(!cpumask_empty(mask)) &&
+                     /* Shadow mode: track only writable pages. */
+                     (!shadow_mode_enabled(page_get_owner(page)) ||
+                      ((nx & PGT_type_mask) == PGT_writable_page)) )
+                {
+                    perfc_incr(need_flush_tlb_flush);
+                    flush_tlb_mask(mask);
+                }
 
-static int destroy_grant_pte_mapping(
-    uint64_t addr, unsigned long frame, struct domain *d)
-{
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
+                /* We lose existing type and validity. */
+                nx &= ~(PGT_type_mask | PGT_validated);
+                nx |= type;
 
-    gmfn = addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+                /* No special validation needed for writable pages. */
+                /* Page tables and GDT/LDT need to be scanned for validity. */
+                if ( type == PGT_writable_page || type == PGT_shared_page )
+                    nx |= PGT_validated;
+            }
+        }
+        else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) )
+        {
+            /* Don't log failure if it could be a recursive-mapping attempt. */
+            if ( ((x & PGT_type_mask) == PGT_l2_page_table) &&
+                 (type == PGT_l1_page_table) )
+                return -EINVAL;
+            if ( ((x & PGT_type_mask) == PGT_l3_page_table) &&
+                 (type == PGT_l2_page_table) )
+                return -EINVAL;
+            if ( ((x & PGT_type_mask) == PGT_l4_page_table) &&
+                 (type == PGT_l3_page_table) )
+                return -EINVAL;
+            gdprintk(XENLOG_WARNING,
+                     "Bad type (saw %" PRtype_info " != exp %" PRtype_info ") "
+                     "for mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n",
+                     x, type, page_to_mfn(page),
+                     get_gpfn_from_mfn(page_to_mfn(page)));
+            return -EINVAL;
+        }
+        else if ( unlikely(!(x & PGT_validated)) )
+        {
+            if ( !(x & PGT_partial) )
+            {
+                /* Someone else is updating validation of this page. Wait... */
+                while ( (y = page->u.inuse.type_info) == x )
+                {
+                    if ( preemptible && hypercall_preempt_check() )
+                        return -EINTR;
+                    cpu_relax();
+                }
+                continue;
+            }
+            /* Type ref count was left at 1 when PGT_partial got set. */
+            ASSERT((x & PGT_count_mask) == 1);
+            nx = x & ~PGT_partial;
+        }
 
-    if ( unlikely(!page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
-        return GNTST_general_error;
-    }
-    
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
+            break;
 
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
+        if ( preemptible && hypercall_preempt_check() )
+            return -EINTR;
     }
 
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    if ( unlikely((x & PGT_type_mask) != type) )
     {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
+        /* Special pages should not be accessible from devices. */
+        struct domain *d = page_get_owner(page);
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
+        {
+            if ( (x & PGT_type_mask) == PGT_writable_page )
+                iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
+            else if ( type == PGT_writable_page )
+                iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
+                                           page_to_mfn(page),
+                                           IOMMUF_readable|IOMMUF_writable);
+        }
     }
 
-    ol1e = *(l1_pgentry_t *)va;
-    
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    if ( unlikely(!(nx & PGT_validated)) )
     {
-        page_unlock(page);
-        gdprintk(XENLOG_WARNING,
-                 "PTE entry %"PRIpte" for address %"PRIx64" doesn't match 
frame %lx\n",
-                 l1e_get_intpte(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto failed;
+        if ( !(x & PGT_partial) )
+        {
+            page->nr_validated_ptes = 0;
+            page->partial_pte = 0;
+        }
+        rc = alloc_page_type(page, type, preemptible);
     }
 
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY
-                  (l1, 
-                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
-                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
-                   0)) )
-    {
-        page_unlock(page);
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va);
-        rc = GNTST_general_error;
-        goto failed;
-    }
+    if ( (x & PGT_partial) && !(nx & PGT_partial) )
+        put_page(page);
 
-    page_unlock(page);
+    if ( !rc )
+        rc = iommu_ret;
 
- failed:
-    unmap_domain_page(va);
-    put_page(page);
     return rc;
 }
 
-
-static int create_grant_va_mapping(
-    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
+void put_page_type(struct page_info *page)
 {
-    l1_pgentry_t *pl1e, ol1e;
-    struct domain *d = v->domain;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int okay;
-    
-    adjust_guest_l1e(nl1e, d);
-
-    pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", 
va);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+    int rc = __put_page_type(page, 0);
+    ASSERT(rc == 0);
+    (void)rc;
+}
 
-    page_unlock(l1pg);
-    put_page(l1pg);
-    guest_unmap_l1e(pl1e);
+int get_page_type(struct page_info *page, unsigned long type)
+{
+    int rc = __get_page_type(page, type, 0);
+    if ( likely(rc == 0) )
+        return 1;
+    ASSERT(rc != -EINTR && rc != -ERESTART);
+    return 0;
+}
 
-    if ( okay && !paging_mode_refcounts(d) )
-        put_page_from_l1e(ol1e, d);
+int put_page_type_preemptible(struct page_info *page)
+{
+    return __put_page_type(page, 1);
+}
 
-    return okay ? GNTST_okay : GNTST_general_error;
+int get_page_type_preemptible(struct page_info *page, unsigned long type)
+{
+    ASSERT(!current->arch.old_guest_table);
+    return __get_page_type(page, type, 1);
 }
 
-static int replace_grant_va_mapping(
-    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
+int vcpu_destroy_pagetables(struct vcpu *v)
 {
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc = 0;
-    
-    pl1e = guest_map_l1e(addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", 
addr);
-        return GNTST_general_error;
-    }
+    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
+    struct page_info *page;
+    l4_pgentry_t *l4tab = NULL;
+    int rc = put_old_guest_table(v);
 
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        rc = GNTST_general_error;
-        goto out;
-    }
+    if ( rc )
+        return rc;
 
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
+    if ( is_pv_32bit_vcpu(v) )
     {
-        rc = GNTST_general_error;
-        put_page(l1pg);
-        goto out;
+        l4tab = map_domain_page(_mfn(mfn));
+        mfn = l4e_get_pfn(*l4tab);
     }
 
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    if ( mfn )
     {
-        rc = GNTST_general_error;
-        goto unlock_and_out;
+        page = mfn_to_page(mfn);
+        if ( paging_mode_refcounts(v->domain) )
+            put_page(page);
+        else
+            rc = put_page_and_type_preemptible(page);
     }
 
-    ol1e = *pl1e;
-
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    if ( l4tab )
     {
-        gdprintk(XENLOG_WARNING,
-                 "PTE entry %lx for address %lx doesn't match frame %lx\n",
-                 l1e_get_pfn(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
+        if ( !rc )
+            l4e_write(l4tab, l4e_empty());
+        unmap_domain_page(l4tab);
     }
-
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+    else if ( !rc )
     {
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
+        v->arch.guest_table = pagetable_null();
+
+        /* Drop ref to guest_table_user (from MMUEXT_NEW_USER_BASEPTR) */
+        mfn = pagetable_get_pfn(v->arch.guest_table_user);
+        if ( mfn )
+        {
+            page = mfn_to_page(mfn);
+            if ( paging_mode_refcounts(v->domain) )
+                put_page(page);
+            else
+                rc = put_page_and_type_preemptible(page);
+        }
+        if ( !rc )
+            v->arch.guest_table_user = pagetable_null();
     }
 
- unlock_and_out:
-    page_unlock(l1pg);
-    put_page(l1pg);
- out:
-    guest_unmap_l1e(pl1e);
-    return rc;
-}
+    v->arch.cr3 = 0;
 
-static int destroy_grant_va_mapping(
-    unsigned long addr, unsigned long frame, struct vcpu *v)
-{
-    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+    /*
+     * put_page_and_type_preemptible() is liable to return -EINTR. The
+     * callers of us expect -ERESTART so convert it over.
+     */
+    return rc != -EINTR ? rc : -ERESTART;
 }
 
 static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
@@ -4267,34 +1032,6 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
         return GNTST_okay;
 }
 
-static int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                                   unsigned int flags, unsigned int cache_flags)
-{
-    l1_pgentry_t pte;
-    uint32_t grant_pte_flags;
-
-    grant_pte_flags =
-        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
-    if ( cpu_has_nx )
-        grant_pte_flags |= _PAGE_NX_BIT;
-
-    pte = l1e_from_pfn(frame, grant_pte_flags);
-    if ( (flags & GNTMAP_application_map) )
-        l1e_add_flags(pte,_PAGE_USER);
-    if ( !(flags & GNTMAP_readonly) )
-        l1e_add_flags(pte,_PAGE_RW);
-
-    l1e_add_flags(pte,
-                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
-                   & _PAGE_AVAIL);
-
-    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
-
-    if ( flags & GNTMAP_contains_pte )
-        return create_grant_pte_mapping(addr, pte, current);
-    return create_grant_va_mapping(addr, pte, current);
-}
-
 int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                               unsigned int flags, unsigned int cache_flags)
 {
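
(The body of create_grant_host_mapping() falls outside this hunk, but for
orientation it is the same two-way dispatch as replace_grant_host_mapping()
below -- roughly the following sketch, with only the PV leg moving to
pv/mm.c:)

    int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                                  unsigned int flags, unsigned int cache_flags)
    {
        /* HVM/PVH guests: grant mappings are installed via the p2m. */
        if ( paging_mode_external(current->domain) )
            return create_grant_p2m_mapping(addr, frame, flags, cache_flags);

        /* PV guests: handled by the code now living in pv/mm.c. */
        return create_grant_pv_mapping(addr, frame, flags, cache_flags);
    }
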
@@ -4327,453 +1064,108 @@ static int replace_grant_p2m_mapping(
     guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
 
     put_gfn(d, gfn);
-    return GNTST_okay;
-}
-
-static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                                    uint64_t new_addr, unsigned int flags)
-{
-    struct vcpu *curr = current;
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc;
-
-    if ( flags & GNTMAP_contains_pte )
-    {
-        if ( !new_addr )
-            return destroy_grant_pte_mapping(addr, frame, curr->domain);
-
-        return GNTST_general_error;
-    }
-
-    if ( !new_addr )
-        return destroy_grant_va_mapping(addr, frame, curr);
-
-    pl1e = guest_map_l1e(new_addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Could not find L1 PTE for address %"PRIx64"\n", new_addr);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
-                                gl1mfn, curr, 0)) )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    page_unlock(l1pg);
-    put_page(l1pg);
-    guest_unmap_l1e(pl1e);
-
-    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
-    if ( rc && !paging_mode_refcounts(curr->domain) )
-        put_page_from_l1e(ol1e, curr->domain);
-
-    return rc;
-}
-
-int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
-                               uint64_t new_addr, unsigned int flags)
-{
-    if ( paging_mode_external(current->domain) )
-        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
-
-    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
-}
-
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    const struct domain *owner = dom_xen;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
-        goto fail;
-
-    if ( d->is_dying )
-        goto fail;
-
-    if ( page->count_info & ~(PGC_allocated | 1) )
-        goto fail;
-
-    if ( !(memflags & MEMF_no_refcount) )
-    {
-        if ( d->tot_pages >= d->max_pages )
-            goto fail;
-        domain_adjust_tot_pages(d, 1);
-    }
-
-    page->count_info = PGC_allocated | 1;
-    page_set_owner(page, d);
-    page_list_add_tail(page,&d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
-             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
-             page_to_mfn(page), d->domain_id,
-             owner ? owner->domain_id : DOMID_INVALID,
-             page->count_info, page->u.inuse.type_info);
-    return -1;
-}
-
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    unsigned long x, y;
-    bool_t drop_dom_ref = 0;
-    const struct domain *owner = dom_xen;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
-        goto fail;
-
-    /*
-     * We require there is just one reference (PGC_allocated). We temporarily
-     * drop this reference now so that we can safely swizzle the owner.
-     */
-    y = page->count_info;
-    do {
-        x = y;
-        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
-            goto fail;
-        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
-    } while ( y != x );
-
-    /* Swizzle the owner then reinstate the PGC_allocated reference. */
-    page_set_owner(page, NULL);
-    y = page->count_info;
-    do {
-        x = y;
-        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
-    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
-
-    /* Unlink from original owner. */
-    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
-        drop_dom_ref = 1;
-    page_list_del(page, &d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    if ( unlikely(drop_dom_ref) )
-        put_domain(d);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    gdprintk(XENLOG_WARNING, "Bad steal mfn %" PRI_mfn
-             " from d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
-             page_to_mfn(page), d->domain_id,
-             owner ? owner->domain_id : DOMID_INVALID,
-             page->count_info, page->u.inuse.type_info);
-    return -1;
-}
-
-static int __do_update_va_mapping(
-    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
-{
-    l1_pgentry_t   val = l1e_from_intpte(val64);
-    struct vcpu   *v   = current;
-    struct domain *d   = v->domain;
-    struct page_info *gl1pg;
-    l1_pgentry_t  *pl1e;
-    unsigned long  bmap_ptr, gl1mfn;
-    cpumask_t     *mask = NULL;
-    int            rc;
-
-    perfc_incr(calls_to_update_va);
-
-    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
-    if ( rc )
-        return rc;
-
-    rc = -EINVAL;
-    pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
-        goto out;
-
-    gl1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(gl1pg) )
-    {
-        put_page(gl1pg);
-        goto out;
-    }
-
-    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(gl1pg);
-        put_page(gl1pg);
-        goto out;
-    }
-
-    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
-
-    page_unlock(gl1pg);
-    put_page(gl1pg);
-
- out:
-    if ( pl1e )
-        guest_unmap_l1e(pl1e);
-
-    switch ( flags & UVMF_FLUSHTYPE_MASK )
-    {
-    case UVMF_TLB_FLUSH:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            flush_tlb_local();
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_mask(mask);
-        break;
-
-    case UVMF_INVLPG:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            paging_invlpg(v, va);
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_one_mask(mask, va);
-        break;
-    }
-
-    return rc;
-}
-
-long do_update_va_mapping(unsigned long va, u64 val64,
-                          unsigned long flags)
-{
-    return __do_update_va_mapping(va, val64, flags, current->domain);
-}
-
-long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
-                                      unsigned long flags,
-                                      domid_t domid)
-{
-    struct domain *pg_owner;
-    int rc;
-
-    if ( (pg_owner = get_pg_owner(domid)) == NULL )
-        return -ESRCH;
-
-    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
-
-    put_pg_owner(pg_owner);
-
-    return rc;
-}
-
-
-
-/*************************
- * Descriptor Tables
- */
-
-void destroy_gdt(struct vcpu *v)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
-
-    v->arch.pv_vcpu.gdt_ents = 0;
-    pl1e = gdt_ldt_ptes(v->domain, v);
-    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
-    {
-        pfn = l1e_get_pfn(pl1e[i]);
-        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
-            put_page_and_type(mfn_to_page(pfn));
-        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
-        v->arch.pv_vcpu.gdt_frames[i] = 0;
-    }
-}
-
-
-long set_gdt(struct vcpu *v, 
-             unsigned long *frames,
-             unsigned int entries)
-{
-    struct domain *d = v->domain;
-    l1_pgentry_t *pl1e;
-    /* NB. There are 512 8-byte entries per GDT page. */
-    unsigned int i, nr_pages = (entries + 511) / 512;
-
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    /* Check the pages in the new GDT. */
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        struct page_info *page;
-
-        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
-        if ( !page )
-            goto fail;
-        if ( !get_page_type(page, PGT_seg_desc_page) )
-        {
-            put_page(page);
-            goto fail;
-        }
-        frames[i] = page_to_mfn(page);
-    }
-
-    /* Tear down the old GDT. */
-    destroy_gdt(v);
-
-    /* Install the new GDT. */
-    v->arch.pv_vcpu.gdt_ents = entries;
-    pl1e = gdt_ldt_ptes(d, v);
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
-        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
-    }
-
-    return 0;
-
- fail:
-    while ( i-- > 0 )
-    {
-        put_page_and_type(mfn_to_page(frames[i]));
-    }
-    return -EINVAL;
+    return GNTST_okay;
 }
 
-
-long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
-                unsigned int entries)
+int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
+                               uint64_t new_addr, unsigned int flags)
 {
-    int nr_pages = (entries + 511) / 512;
-    unsigned long frames[16];
-    struct vcpu *curr = current;
-    long ret;
+    if ( paging_mode_external(current->domain) )
+        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
 
-    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-    
-    if ( copy_from_guest(frames, frame_list, nr_pages) )
-        return -EFAULT;
+    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
+}
+
+int donate_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
+{
+    const struct domain *owner = dom_xen;
 
-    domain_lock(curr->domain);
+    spin_lock(&d->page_alloc_lock);
 
-    if ( (ret = set_gdt(curr, frames, entries)) == 0 )
-        flush_tlb_local();
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
+        goto fail;
 
-    domain_unlock(curr->domain);
+    if ( d->is_dying )
+        goto fail;
 
-    return ret;
-}
+    if ( page->count_info & ~(PGC_allocated | 1) )
+        goto fail;
 
+    if ( !(memflags & MEMF_no_refcount) )
+    {
+        if ( d->tot_pages >= d->max_pages )
+            goto fail;
+        domain_adjust_tot_pages(d, 1);
+    }
 
-long do_update_descriptor(u64 pa, u64 desc)
-{
-    struct domain *dom = current->domain;
-    unsigned long gmfn = pa >> PAGE_SHIFT;
-    unsigned long mfn;
-    unsigned int  offset;
-    struct desc_struct *gdt_pent, d;
-    struct page_info *page;
-    long ret = -EINVAL;
+    page->count_info = PGC_allocated | 1;
+    page_set_owner(page, d);
+    page_list_add_tail(page,&d->page_list);
 
-    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
+    spin_unlock(&d->page_alloc_lock);
+    return 0;
 
-    *(u64 *)&d = desc;
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
+             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
+             page_to_mfn(page), d->domain_id,
+             owner ? owner->domain_id : DOMID_INVALID,
+             page->count_info, page->u.inuse.type_info);
+    return -1;
+}
 
-    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
-    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
-         !page ||
-         !check_descriptor(dom, &d) )
-    {
-        if ( page )
-            put_page(page);
-        return -EINVAL;
-    }
-    mfn = page_to_mfn(page);
+int steal_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
+{
+    unsigned long x, y;
+    bool_t drop_dom_ref = 0;
+    const struct domain *owner = dom_xen;
 
-    /* Check if the given frame is in use in an unsafe context. */
-    switch ( page->u.inuse.type_info & PGT_type_mask )
-    {
-    case PGT_seg_desc_page:
-        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
-            goto out;
-        break;
-    default:
-        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
-            goto out;
-        break;
-    }
+    spin_lock(&d->page_alloc_lock);
 
-    paging_mark_dirty(dom, _mfn(mfn));
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
+        goto fail;
 
-    /* All is good so make the update. */
-    gdt_pent = map_domain_page(_mfn(mfn));
-    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
-    unmap_domain_page(gdt_pent);
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
 
-    put_page_type(page);
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
 
-    ret = 0; /* success */
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = 1;
+    page_list_del(page, &d->page_list);
 
- out:
-    put_page(page);
+    spin_unlock(&d->page_alloc_lock);
+    if ( unlikely(drop_dom_ref) )
+        put_domain(d);
+    return 0;
 
-    return ret;
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    gdprintk(XENLOG_WARNING, "Bad steal mfn %" PRI_mfn
+             " from d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
+             page_to_mfn(page), d->domain_id,
+             owner ? owner->domain_id : DOMID_INVALID,
+             page->count_info, page->u.inuse.type_info);
+    return -1;
 }
 
 typedef struct e820entry e820entry_t;
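
(The owner swizzle in steal_page() above uses the usual lock-free cmpxchg
retry idiom on page->count_info; stripped of the surrounding list handling it
is just the following sketch:)

    /* Expect exactly one PGC_allocated reference; drop the count atomically. */
    unsigned long x, y = page->count_info;
    do {
        x = y;
        if ( (x & (PGC_count_mask | PGC_allocated)) != (1 | PGC_allocated) )
            goto fail;                    /* page is not in a stealable state */
        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
    } while ( y != x );                   /* another CPU raced us: retry */
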
@@ -5181,466 +1573,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return 0;
 }
 
-
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    unsigned long cr2;
-    l1_pgentry_t  pte;
-};
-
-static int ptwr_emulated_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned int rc = bytes;
-    unsigned long addr = offset;
-
-    if ( !__addr_ok(addr) ||
-         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
-    unsigned long addr,
-    paddr_t old,
-    paddr_t val,
-    unsigned int bytes,
-    unsigned int do_cmpxchg,
-    struct ptwr_emulate_ctxt *ptwr_ctxt)
-{
-    unsigned long mfn;
-    unsigned long unaligned_addr = addr;
-    struct page_info *page;
-    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int ret;
-
-    /* Only allow naturally-aligned stores within the original %cr2 page. */
-    if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                 ptwr_ctxt->cr2, addr, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
-    {
-        paddr_t      full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
-
-        /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t)-1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
-        {
-            x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
-                               &ptwr_ctxt->ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-        /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
-        /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes*8)) - 1);
-        val <<= (offset)*8;
-        val  |= full;
-        /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes*8)) - 1);
-        old <<= (offset)*8;
-        old  |= full;
-    }
-
-    pte  = ptwr_ctxt->pte;
-    mfn  = l1e_get_pfn(pte);
-    page = mfn_to_page(mfn);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
-    ASSERT(mfn_valid(_mfn(mfn)));
-    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
-    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
-    ASSERT(page_get_owner(page) == d);
-
-    /* Check the new PTE. */
-    nl1e = l1e_from_intpte(val);
-    switch ( ret = get_page_from_l1e(nl1e, d, d) )
-    {
-    default:
-        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
-        {
-            /*
-             * If this is an upper-half write to a PAE PTE then we assume that
-             * the guest has simply got the two writes the wrong way round. We
-             * zap the PRESENT bit on the assumption that the bottom half will
-             * be written immediately after we return to the guest.
-             */
-            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
-                     PRIpte"\n", l1e_get_intpte(nl1e));
-            l1e_remove_flags(nl1e, _PAGE_PRESENT);
-        }
-        else
-        {
-            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
-            return X86EMUL_UNHANDLEABLE;
-        }
-        break;
-    case 0:
-        break;
-    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-        l1e_flip_flags(nl1e, ret);
-        break;
-    }
-
-    adjust_guest_l1e(nl1e, d);
-
-    /* Checked successfully: do the update (write or cmpxchg). */
-    pl1e = map_domain_page(_mfn(mfn));
-    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
-    {
-        int okay;
-        intpte_t t = old;
-        ol1e = l1e_from_intpte(old);
-
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
-
-        if ( !okay )
-        {
-            unmap_domain_page(pl1e);
-            put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
-        }
-    }
-    else
-    {
-        ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
-            BUG();
-    }
-
-    trace_ptwr_emulation(addr, nl1e);
-
-    unmap_domain_page(pl1e);
-
-    /* Finally, drop the old PTE. */
-    put_page_from_l1e(ol1e, d);
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t val = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
-    {
-        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&val, p_data, bytes);
-
-    return ptwr_emulated_update(
-        offset, 0, val, bytes, 0,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int ptwr_emulated_cmpxchg(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_old,
-    void *p_new,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t old = 0, new = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
-    {
-        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&old, p_old, bytes);
-    memcpy(&new, p_new, bytes);
-
-    return ptwr_emulated_update(
-        offset, old, new, bytes, 1,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
-                                              : X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
-    .read       = ptwr_emulated_read,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = ptwr_emulated_write,
-    .cmpxchg    = ptwr_emulated_cmpxchg,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, 
-                       struct cpu_user_regs *regs)
-{
-    struct domain *d = v->domain;
-    struct page_info *page;
-    l1_pgentry_t      pte;
-    struct ptwr_emulate_ctxt ptwr_ctxt = {
-        .ctxt = {
-            .regs = regs,
-            .vendor = d->arch.cpuid->x86_vendor,
-            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .swint_emulate = x86_swint_emulate_none,
-        },
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
-         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
-         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
-        goto bail;
-
-    page = l1e_get_page(pte);
-    if ( !page_lock(page) )
-    {
-        put_page(page);
-        goto bail;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        put_page(page);
-        goto bail;
-    }
-
-    ptwr_ctxt.cr2 = addr;
-    ptwr_ctxt.pte = pte;
-
-    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
-
-    page_unlock(page);
-    put_page(page);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to pagetables which are marked
-         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
-         * update has succeeded on a different vcpu).  Anything else is an
-         * emulation bug, or a guest playing with the instruction stream under
-         * Xen's feet.
-         */
-        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ptwr_ctxt.ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ptwr_ctxt.ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
- bail:
-    return 0;
-}
-
-/*************************
- * fault handling for read-only MMIO pages
- */
-
-int mmio_ro_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
-
-    /* Only allow naturally-aligned stores at the original %cr2 address. */
-    if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
-         offset != mmio_ro_ctxt->cr2 )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                mmio_ro_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmio_ro_emulate_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmio_ro_emulated_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-int mmcfg_intercept_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
-
-    /*
-     * Only allow naturally-aligned stores no wider than 4 bytes to the
-     * original %cr2 address.
-     */
-    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
-         offset != mmio_ctxt->cr2 )
-    {
-        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
-                mmio_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    offset &= 0xfff;
-    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
-                                  offset, bytes, p_data) >= 0 )
-        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
-                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
-                        *(uint32_t *)p_data);
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmcfg_intercept_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmcfg_intercept_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Check if guest is trying to modify a r/o MMIO page. */
-int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
-                          struct cpu_user_regs *regs)
-{
-    l1_pgentry_t pte;
-    unsigned long mfn;
-    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
-    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
-    struct x86_emulate_ctxt ctxt = {
-        .regs = regs,
-        .vendor = v->domain->arch.cpuid->x86_vendor,
-        .addr_size = addr_size,
-        .sp_size = addr_size,
-        .swint_emulate = x86_swint_emulate_none,
-        .data = &mmio_ro_ctxt
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of MMIO pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
-        return 0;
-
-    mfn = l1e_get_pfn(pte);
-    if ( mfn_valid(_mfn(mfn)) )
-    {
-        struct page_info *page = mfn_to_page(mfn);
-        struct domain *owner = page_get_owner_and_reference(page);
-
-        if ( owner )
-            put_page(page);
-        if ( owner != dom_io )
-            return 0;
-    }
-
-    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-        return 0;
-
-    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
-        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
-    else
-        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to MMCFG space or read-only MFNs.
-         * We tolerate #PF (from hitting an adjacent page or a successful
-         * concurrent pagetable update).  Anything else is an emulation bug,
-         * or a guest playing with the instruction stream under Xen's feet.
-         */
-        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ctxt.event.type, ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 void *alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index ea94599438..665be5536c 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -1,2 +1,3 @@
 obj-y += hypercall.o
 obj-bin-y += dom0_build.init.o
+obj-y += mm.o
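
(For reference, the (ptr, val) interface described in the comment block at the
top of the new file is what a PV guest batches into mmu_update_t requests; a
minimal caller-side sketch, using the public interface names as exposed by
e.g. the Linux hypercall wrappers and with pte_maddr / new_pte_val as
placeholder variables, looks like:)

    /* Guest-side sketch: perform *ptr = val on one PTE slot.  The update
     * command sits in the low bits of ptr; the PTE's machine address
     * occupies the rest. */
    struct mmu_update req = {
        .ptr = pte_maddr | MMU_NORMAL_PT_UPDATE,
        .val = new_pte_val,
    };
    int rc = HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);
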
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
new file mode 100644
index 0000000000..b5277b5d28
--- /dev/null
+++ b/xen/arch/x86/pv/mm.c
@@ -0,0 +1,4118 @@
+/******************************************************************************
+ * arch/x86/pv/mm.c
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * A description of the x86 page table API:
+ *
+ * Domains trap to do_mmu_update with a list of update requests.
+ * This is a list of (ptr, val) pairs, where the requested operation
+ * is *ptr = val.
+ *
+ * Reference counting of pages:
+ * ----------------------------
+ * Each page has two refcounts: tot_count and type_count.
+ *
+ * TOT_COUNT is the obvious reference count. It counts all uses of a
+ * physical page frame by a domain, including uses as a page directory,
+ * a page table, or simple mappings via a PTE. This count prevents a
+ * domain from releasing a frame back to the free pool when it still holds
+ * a reference to it.
+ *
+ * TYPE_COUNT is more subtle. A frame can be put to one of three
+ * mutually-exclusive uses: it might be used as a page directory, or a
+ * page table, or it may be mapped writable by the domain [of course, a
+ * frame may not be used in any of these three ways!].
+ * So, type_count is a count of the number of times a frame is being
+ * referred to in its current incarnation. Therefore, a page can only
+ * change its type when its type count is zero.
+ *
+ * Pinning the page type:
+ * ----------------------
+ * The type of a page can be pinned/unpinned with the commands
+ * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
+ * pinning is not reference counted, so it can't be nested).
+ * This is useful to prevent a page's type count falling to zero, at which
+ * point safety checks would need to be carried out next time the count
+ * is increased again.
+ *
+ * A further note on writable page mappings:
+ * -----------------------------------------
+ * For simplicity, the count of writable mappings for a page may not
+ * correspond to reality. The 'writable count' is incremented for every
+ * PTE which maps the page with the _PAGE_RW flag set. However, for
+ * write access to be possible the page directory entry must also have
+ * its _PAGE_RW bit set. We do not check this as it complicates the
+ * reference counting considerably [consider the case of multiple
+ * directory entries referencing a single page table, some with the RW
+ * bit set, others not -- it starts getting a bit messy].
+ * In normal use, this simplification shouldn't be a problem.
+ * However, the logic can be added if required.
+ *
+ * One more note on read-only page mappings:
+ * -----------------------------------------
+ * We want domains to be able to map pages for read-only access. The
+ * main reason is that page tables and directories should be readable
+ * by a domain, but it would not be safe for them to be writable.
+ * However, domains have free access to rings 1 & 2 of the Intel
+ * privilege model. In terms of page protection, these are considered
+ * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
+ * read-only restrictions are respected in supervisor mode -- if the
+ * bit is clear then any mapped page is writable.
+ *
+ * We get round this by always setting the WP bit and disallowing
+ * updates to it. This is very unlikely to cause a problem for guest
+ * OS's, which will generally use the WP bit to simplify copy-on-write
+ * implementation (in that case, OS wants a fault when it writes to
+ * an application-supplied buffer).
+ */
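[For context, illustration only, not part of the patch: do_mmu_update consumes a batch of (ptr, val) requests queued by the guest. A minimal guest-side sketch, assuming the public struct mmu_update and the usual HYPERVISOR_mmu_update wrapper a PV guest OS provides; pte_maddr and new_pte are placeholders:

    struct mmu_update req = {
        .ptr = pte_maddr | MMU_NORMAL_PT_UPDATE, /* machine address of the PTE */
        .val = new_pte,                          /* requested new PTE contents */
    };
    unsigned int done;

    /* One request; Xen validates it via the mod_lN_entry() helpers below. */
    if ( HYPERVISOR_mmu_update(&req, 1, &done, DOMID_SELF) )
        /* request rejected, e.g. bad flags or a refcount failure */;
]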
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+#include <xen/iocap.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/trace.h>
+#include <xsm/xsm.h>
+
+#include <asm/ldt.h>
+#include <asm/p2m.h>
+#include <asm/paging.h>
+#include <asm/shadow.h>
+#include <asm/x86_emulate.h>
+
+extern s8 __read_mostly opt_mmio_relax;
+
+extern uint32_t base_disallow_mask;
+/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
+#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
+
+#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \
+                          ? base_disallow_mask & ~_PAGE_PSE \
+                          : base_disallow_mask)
+
+#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
+                             base_disallow_mask : 0xFFFFF198U)
+
+#define L4_DISALLOW_MASK (base_disallow_mask)
+
+#define l1_disallow_mask(d)                                     \
+    ((d != dom_io) &&                                           \
+     (rangeset_is_empty((d)->iomem_caps) &&                     \
+      rangeset_is_empty((d)->arch.ioport_caps) &&               \
+      !has_arch_pdevs(d) &&                                     \
+      is_pv_domain(d)) ?                                        \
+     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+
+/* Get a mapping of a PV guest's l1e for this virtual address. */
+static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
+{
+    l2_pgentry_t l2e;
+
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) )
+        return NULL;
+
+    /* Find this l1e and its enclosing l1mfn in the linear map. */
+    if ( __copy_from_user(&l2e,
+                          &__linear_l2_table[l2_linear_offset(addr)],
+                          sizeof(l2_pgentry_t)) )
+        return NULL;
+
+    /* Check flags that it will be safe to read the l1e. */
+    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
+        return NULL;
+
+    *gl1mfn = l2e_get_pfn(l2e);
+
+    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
+           l1_table_offset(addr);
+}
+
+/* Pull down the mapping we got from guest_map_l1e(). */
+static inline void guest_unmap_l1e(void *p)
+{
+    unmap_domain_page(p);
+}
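[The two helpers above are intended to be used as a pair; a minimal sketch of the pattern, illustration only and not part of the patch (addr is a placeholder guest linear address):

    unsigned long gl1mfn;
    l1_pgentry_t *pl1e = guest_map_l1e(addr, &gl1mfn);

    if ( pl1e )
    {
        /* ... inspect or update the guest's l1e through pl1e ... */
        guest_unmap_l1e(pl1e);
    }
    /* NULL means the address is bad or no (non-PSE) L2 entry maps it. */
]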
+
+/* Read a PV guest's l1e that maps this virtual address. */
+static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
+{
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) ||
+         __copy_from_user(eff_l1e,
+                          &__linear_l1_table[l1_linear_offset(addr)],
+                          sizeof(l1_pgentry_t)) )
+        *eff_l1e = l1e_empty();
+}
+
+/*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+                                          void *eff_l1e)
+{
+    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+    TOGGLE_MODE();
+    guest_get_eff_l1e(addr, eff_l1e);
+    TOGGLE_MODE();
+}
+
+const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
+    zero_page[PAGE_SIZE];
+
+static void invalidate_shadow_ldt(struct vcpu *v, int flush)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    struct page_info *page;
+
+    BUG_ON(unlikely(in_irq()));
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
+        goto out;
+
+    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
+    pl1e = gdt_ldt_ptes(v->domain, v);
+
+    for ( i = 16; i < 32; i++ )
+    {
+        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
+            continue;
+        page = l1e_get_page(pl1e[i]);
+        l1e_write(&pl1e[i], l1e_empty());
+        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
+        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
+        put_page_and_type(page);
+    }
+
+    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
+    if ( flush )
+        flush_tlb_mask(v->vcpu_dirty_cpumask);
+
+ out:
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+}
+
+
+static int alloc_segdesc_page(struct page_info *page)
+{
+    const struct domain *owner = page_get_owner(page);
+    struct desc_struct *descs = __map_domain_page(page);
+    unsigned i;
+
+    for ( i = 0; i < 512; i++ )
+        if ( unlikely(!check_descriptor(owner, &descs[i])) )
+            break;
+
+    unmap_domain_page(descs);
+
+    return i == 512 ? 0 : -EINVAL;
+}
+
+
+/* Map shadow page at offset @off. */
+int map_ldt_shadow_page(unsigned int off)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    unsigned long gmfn;
+    struct page_info *page;
+    l1_pgentry_t l1e, nl1e;
+    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
+    int okay;
+
+    BUG_ON(unlikely(in_irq()));
+
+    if ( is_pv_32bit_domain(d) )
+        gva = (u32)gva;
+    guest_get_eff_kern_l1e(v, gva, &l1e);
+    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
+        return 0;
+
+    gmfn = l1e_get_pfn(l1e);
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    if ( unlikely(!page) )
+        return 0;
+
+    okay = get_page_type(page, PGT_seg_desc_page);
+    if ( unlikely(!okay) )
+    {
+        put_page(page);
+        return 0;
+    }
+
+    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
+    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    return 1;
+}
+
+
+/*
+ * We allow root tables to map each other (a.k.a. linear page tables). It
+ * needs some special care with reference counts and access permissions:
+ *  1. The mapping entry must be read-only, or the guest may get write access
+ *     to its own PTEs.
+ *  2. We must only bump the reference counts for an *already validated*
+ *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
+ *     on a validation that is required to complete that validation.
+ *  3. We only need to increment the reference counts for the mapped page
+ *     frame if it is mapped by a different root table. This is sufficient and
+ *     also necessary to allow validation of a root table mapping itself.
+ */
+#define define_get_linear_pagetable(level)                                  \
+static int                                                                  \
+get_##level##_linear_pagetable(                                             \
+    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
+{                                                                           \
+    unsigned long x, y;                                                     \
+    struct page_info *page;                                                 \
+    unsigned long pfn;                                                      \
+                                                                            \
+    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
+    {                                                                       \
+        gdprintk(XENLOG_WARNING,                                            \
+                 "Attempt to create linear p.t. with write perms\n");       \
+        return 0;                                                           \
+    }                                                                       \
+                                                                            \
+    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
+    {                                                                       \
+        /* Make sure the mapped frame belongs to the correct domain. */     \
+        if ( unlikely(!get_page_from_pagenr(pfn, d)) )                      \
+            return 0;                                                       \
+                                                                            \
+        /*                                                                  \
+         * Ensure that the mapped frame is an already-validated page table. \
+         * If so, atomically increment the count (checking for overflow).   \
+         */                                                                 \
+        page = mfn_to_page(pfn);                                            \
+        y = page->u.inuse.type_info;                                        \
+        do {                                                                \
+            x = y;                                                          \
+            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
+                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
+                          (PGT_##level##_page_table|PGT_validated)) )       \
+            {                                                               \
+                put_page(page);                                             \
+                return 0;                                                   \
+            }                                                               \
+        }                                                                   \
+        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
+    }                                                                       \
+                                                                            \
+    return 1;                                                               \
+}
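[Reader's note, not part of the patch: each instantiation below emits a get_lN_linear_pagetable() helper, consulted only as a fallback when ordinary validation of an entry fails. For L2, mirroring get_page_from_l2e() later in this file:

    define_get_linear_pagetable(l2);   /* emits get_l2_linear_pagetable() */

    rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0);
    if ( rc == -EINVAL && get_l2_linear_pagetable(l2e, pfn, d) )
        rc = 0;                        /* read-only self-map is acceptable */
]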
+
+#ifndef NDEBUG
+struct mmio_emul_range_ctxt {
+    const struct domain *d;
+    unsigned long mfn;
+};
+
+static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
+{
+    const struct mmio_emul_range_ctxt *ctxt = arg;
+
+    if ( ctxt->mfn > e )
+        return 0;
+
+    if ( ctxt->mfn >= s )
+    {
+        static DEFINE_SPINLOCK(last_lock);
+        static const struct domain *last_d;
+        static unsigned long last_s = ~0UL, last_e;
+        bool_t print = 0;
+
+        spin_lock(&last_lock);
+        if ( last_d != ctxt->d || last_s != s || last_e != e )
+        {
+            last_d = ctxt->d;
+            last_s = s;
+            last_e = e;
+            print = 1;
+        }
+        spin_unlock(&last_lock);
+
+        if ( print )
+            printk(XENLOG_G_INFO
+                   "d%d: Forcing write emulation on MFNs %lx-%lx\n",
+                   ctxt->d->domain_id, s, e);
+    }
+
+    return 1;
+}
+#endif
+
+int
+get_page_from_l1e(
+    l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner)
+{
+    unsigned long mfn = l1e_get_pfn(l1e);
+    struct page_info *page = mfn_to_page(mfn);
+    uint32_t l1f = l1e_get_flags(l1e);
+    struct vcpu *curr = current;
+    struct domain *real_pg_owner;
+    bool_t write;
+
+    if ( !(l1f & _PAGE_PRESENT) )
+        return 0;
+
+    if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+                 l1f & l1_disallow_mask(l1e_owner));
+        return -EINVAL;
+    }
+
+    if ( !mfn_valid(_mfn(mfn)) ||
+         (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
+    {
+        int flip = 0;
+
+        /* Only needed the reference to confirm dom_io ownership. */
+        if ( mfn_valid(_mfn(mfn)) )
+            put_page(page);
+
+        /* DOMID_IO reverts to caller for privilege checks. */
+        if ( pg_owner == dom_io )
+            pg_owner = curr->domain;
+
+        if ( !iomem_access_permitted(pg_owner, mfn, mfn) )
+        {
+            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
+            {
+                gdprintk(XENLOG_WARNING,
+                         "d%d non-privileged attempt to map MMIO space %"PRI_mfn"\n",
+                         pg_owner->domain_id, mfn);
+                return -EPERM;
+            }
+            return -EINVAL;
+        }
+
+        if ( pg_owner != l1e_owner &&
+             !iomem_access_permitted(l1e_owner, mfn, mfn) )
+        {
+            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
+            {
+                gdprintk(XENLOG_WARNING,
+                         "d%d attempted to map MMIO space %"PRI_mfn" in d%d to d%d\n",
+                         curr->domain->domain_id, mfn, pg_owner->domain_id,
+                         l1e_owner->domain_id);
+                return -EPERM;
+            }
+            return -EINVAL;
+        }
+
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( !is_hardware_domain(l1e_owner) )
+                    break;
+                /* fallthrough */
+            case -1:
+                return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+#ifndef NDEBUG
+            const unsigned long *ro_map;
+            unsigned int seg, bdf;
+
+            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
+                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
+                  test_bit(bdf, ro_map)) )
+                printk(XENLOG_G_WARNING
+                       "d%d: Forcing read-only access to MFN %lx\n",
+                       l1e_owner->domain_id, mfn);
+            else
+                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
+                                       print_mmio_emul_range,
+                                       &(struct mmio_emul_range_ctxt){
+                                           .d = l1e_owner,
+                                           .mfn = mfn });
+#endif
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
+    }
+
+    if ( unlikely( (real_pg_owner != pg_owner) &&
+                   (real_pg_owner != dom_cow) ) )
+    {
+        /*
+         * Let privileged domains transfer the right to map their target
+         * domain's pages. This is used to allow stub-domain pvfb export to
+         * dom0, until pvfb supports granted mappings. At that time this
+         * minor hack can go away.
+         */
+        if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
+             xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n",
+                     pg_owner->domain_id, l1e_owner->domain_id,
+                     real_pg_owner ? real_pg_owner->domain_id : -1);
+            goto could_not_pin;
+        }
+        pg_owner = real_pg_owner;
+    }
+
+    /* Extra paranoid check for shared memory. Writable mappings
+     * disallowed (unshare first!) */
+    if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) )
+        goto could_not_pin;
+
+    /* Foreign mappings into guests in shadow external mode don't
+     * contribute to writeable mapping refcounts.  (This allows the
+     * qemu-dm helper process in dom0 to map the domain's memory without
+     * messing up the count of "real" writable mappings.) */
+    write = (l1f & _PAGE_RW) &&
+            ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner));
+    if ( write && !get_page_type(page, PGT_writable_page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page type PGT_writable_page\n");
+        goto could_not_pin;
+    }
+
+    if ( pte_flags_to_cacheattr(l1f) !=
+         ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) )
+    {
+        unsigned long x, nx, y = page->count_info;
+        unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
+        int err;
+
+        if ( is_xen_heap_page(page) )
+        {
+            if ( write )
+                put_page_type(page);
+            put_page(page);
+            gdprintk(XENLOG_WARNING,
+                     "Attempt to change cache attributes of Xen heap page\n");
+            return -EACCES;
+        }
+
+        do {
+            x  = y;
+            nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base);
+        } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
+
+        err = update_xen_mappings(mfn, cacheattr);
+        if ( unlikely(err) )
+        {
+            cacheattr = y & PGC_cacheattr_mask;
+            do {
+                x  = y;
+                nx = (x & ~PGC_cacheattr_mask) | cacheattr;
+            } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
+
+            if ( write )
+                put_page_type(page);
+            put_page(page);
+
+            gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn
+                     " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n",
+                     mfn, get_gpfn_from_mfn(mfn),
+                     l1e_get_intpte(l1e), l1e_owner->domain_id);
+            return err;
+        }
+    }
+
+    return 0;
+
+ could_not_pin:
+    gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn
+             ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
+             mfn, get_gpfn_from_mfn(mfn),
+             l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);
+    if ( real_pg_owner != NULL )
+        put_page(page);
+    return -EBUSY;
+}
+
+
+/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
+define_get_linear_pagetable(l2);
+static int
+get_page_from_l2e(
+    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+{
+    unsigned long mfn = l2e_get_pfn(l2e);
+    int rc;
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
+    {
+        rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+        return rc;
+    }
+
+    if ( !opt_allow_superpage )
+    {
+        gdprintk(XENLOG_WARNING, "PV superpages disabled in hypervisor\n");
+        return -EINVAL;
+    }
+
+    if ( mfn & (L1_PAGETABLE_ENTRIES-1) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Unaligned superpage map attempt mfn %" PRI_mfn "\n", mfn);
+        return -EINVAL;
+    }
+
+    return get_superpage(mfn, d);
+}
+
+
+define_get_linear_pagetable(l3);
+static int
+get_page_from_l3e(
+    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
+{
+    int rc;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                 l3e_get_flags(l3e) & l3_disallow_mask(d));
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_pagenr(
+        l3e_get_pfn(l3e), PGT_l2_page_table, d, partial, 1);
+    if ( unlikely(rc == -EINVAL) &&
+         !is_pv_32bit_domain(d) &&
+         get_l3_linear_pagetable(l3e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+define_get_linear_pagetable(l4);
+static int
+get_page_from_l4e(
+    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
+{
+    int rc;
+
+    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_pagenr(
+        l4e_get_pfn(l4e), PGT_l3_page_table, d, partial, 1);
+    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+#define adjust_guest_l1e(pl1e, d)                                            \
+    do {                                                                     \
+        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
+             likely(!is_pv_32bit_domain(d)) )                                \
+        {                                                                    \
+            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
+            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
+                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
+                gdprintk(XENLOG_WARNING,                                     \
+                         "Global bit is set to kernel page %lx\n",           \
+                         l1e_get_pfn((pl1e)));                               \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
+                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
+                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
+        }                                                                    \
+    } while ( 0 )
+
+#define adjust_guest_l2e(pl2e, d)                               \
+    do {                                                        \
+        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l2e_add_flags((pl2e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define adjust_guest_l3e(pl3e, d)                                   \
+    do {                                                            \
+        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
+            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
+                                         _PAGE_USER :               \
+                                         _PAGE_USER|_PAGE_RW);      \
+    } while ( 0 )
+
+#define adjust_guest_l4e(pl4e, d)                               \
+    do {                                                        \
+        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l4e_add_flags((pl4e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define unadjust_guest_l3e(pl3e, d)                                         \
+    do {                                                                    \
+        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
+             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
+            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
+    } while ( 0 )
+
+void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
+{
+    unsigned long     pfn = l1e_get_pfn(l1e);
+    struct page_info *page;
+    struct domain    *pg_owner;
+    struct vcpu      *v;
+
+    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) )
+        return;
+
+    page = mfn_to_page(pfn);
+    pg_owner = page_get_owner(page);
+
+    /*
+     * Check if this is a mapping that was established via a grant reference.
+     * If it was then we should not be here: we require that such mappings are
+     * explicitly destroyed via the grant-table interface.
+     *
+     * The upshot of this is that the guest can end up with active grants that
+     * it cannot destroy (because it no longer has a PTE to present to the
+     * grant-table interface). This can lead to subtle hard-to-catch bugs,
+     * hence a special grant PTE flag can be enabled to catch the bug early.
+     *
+     * (Note that the undestroyable active grants are not a security hole in
+     * Xen. All active grants can safely be cleaned up when the domain dies.)
+     */
+    if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
+         !l1e_owner->is_shutting_down && !l1e_owner->is_dying )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
+                 l1e_get_intpte(l1e));
+        domain_crash(l1e_owner);
+    }
+
+    /* Remember we didn't take a type-count of foreign writable mappings
+     * to paging-external domains */
+    if ( (l1e_get_flags(l1e) & _PAGE_RW) &&
+         ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
+    {
+        put_page_and_type(page);
+    }
+    else
+    {
+        /* We expect this is rare so we blow the entire shadow LDT. */
+        if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) ==
+                       PGT_seg_desc_page)) &&
+             unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) &&
+             (l1e_owner == pg_owner) )
+        {
+            for_each_vcpu ( pg_owner, v )
+                invalidate_shadow_ldt(v, 1);
+        }
+        put_page(page);
+    }
+}
+
+static void put_superpage(unsigned long mfn);
+/*
+ * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
+ * Note also that this automatically deals correctly with linear p.t.'s.
+ */
+static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+{
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
+        return 1;
+
+    if ( l2e_get_flags(l2e) & _PAGE_PSE )
+        put_superpage(l2e_get_pfn(l2e));
+    else
+        put_page_and_type(l2e_get_page(l2e));
+
+    return 0;
+}
+
+static void put_data_page(
+    struct page_info *page, int writeable)
+{
+    if ( writeable )
+        put_page_and_type(page);
+    else
+        put_page(page);
+}
+
+extern int __put_page_type(struct page_info *, int preemptible);
+
+static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                             int partial, bool_t defer)
+{
+    struct page_info *pg;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
+        return 1;
+
+    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
+    {
+        unsigned long mfn = l3e_get_pfn(l3e);
+        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
+
+        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
+        do {
+            put_data_page(mfn_to_page(mfn), writeable);
+        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
+
+        return 0;
+    }
+
+    pg = l3e_get_page(l3e);
+
+    if ( unlikely(partial > 0) )
+    {
+        ASSERT(!defer);
+        return __put_page_type(pg, 1);
+    }
+
+    if ( defer )
+    {
+        current->arch.old_guest_table = pg;
+        return 0;
+    }
+
+    return put_page_and_type_preemptible(pg);
+}
+
+static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                             int partial, bool_t defer)
+{
+    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
+         (l4e_get_pfn(l4e) != pfn) )
+    {
+        struct page_info *pg = l4e_get_page(l4e);
+
+        if ( unlikely(partial > 0) )
+        {
+            ASSERT(!defer);
+            return __put_page_type(pg, 1);
+        }
+
+        if ( defer )
+        {
+            current->arch.old_guest_table = pg;
+            return 0;
+        }
+
+        return put_page_and_type_preemptible(pg);
+    }
+    return 1;
+}
+
+static int alloc_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l1_pgentry_t  *pl1e;
+    unsigned int   i;
+    int            ret = 0;
+
+    pl1e = map_domain_page(_mfn(pfn));
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+    {
+        if ( is_guest_l1_slot(i) )
+            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
+            {
+            default:
+                goto fail;
+            case 0:
+                break;
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
+                break;
+            }
+
+        adjust_guest_l1e(pl1e[i], d);
+    }
+
+    unmap_domain_page(pl1e);
+    return 0;
+
+ fail:
+    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
+    while ( i-- > 0 )
+        if ( is_guest_l1_slot(i) )
+            put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+    return ret;
+}
+
+static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+{
+    struct page_info *page;
+    l3_pgentry_t     l3e3;
+
+    if ( !is_pv_32bit_domain(d) )
+        return 1;
+
+    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
+
+    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
+    l3e3 = pl3e[3];
+    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
+        return 0;
+    }
+
+    /*
+     * The Xen-private mappings include linear mappings. The L2 thus cannot
+     * be shared by multiple L3 tables. The test here is adequate because:
+     *  1. Cannot appear in slots != 3 because get_page_type() checks the
+     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
+     *  2. Cannot appear in another page table's L3:
+     *     a. alloc_l3_table() calls this function and this check will fail
+     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
+     */
+    page = l3e_get_page(l3e3);
+    BUG_ON(page->u.inuse.type_info & PGT_pinned);
+    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
+    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
+        return 0;
+    }
+
+    return 1;
+}
+
+static int alloc_l2_table(struct page_info *page, unsigned long type,
+                          int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l2_pgentry_t  *pl2e;
+    unsigned int   i;
+    int            rc = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        if ( preemptible && i > page->nr_validated_ptes
+             && hypercall_preempt_check() )
+        {
+            page->nr_validated_ptes = i;
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( !is_guest_l2_slot(d, type, i) ||
+             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
+            continue;
+
+        if ( rc < 0 )
+        {
+            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
+            while ( i-- > 0 )
+                if ( is_guest_l2_slot(d, type, i) )
+                    put_page_from_l2e(pl2e[i], pfn);
+            break;
+        }
+
+        adjust_guest_l2e(pl2e[i], d);
+    }
+
+    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
+    {
+        /* Xen private mappings. */
+        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
+               &compat_idle_pg_table_l2[
+                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
+               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
+    }
+
+    unmap_domain_page(pl2e);
+    return rc > 0 ? 0 : rc;
+}
+
+static int alloc_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l3_pgentry_t  *pl3e;
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    /*
+     * PAE guests allocate full pages, but aren't required to initialize
+     * more than the first four entries; when running in compatibility
+     * mode, however, the full page is visible to the MMU, and hence all
+     * 512 entries must be valid/verified, which is most easily achieved
+     * by clearing them out.
+     */
+    if ( is_pv_32bit_domain(d) )
+        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
+
+    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( is_pv_32bit_domain(d) && (i == 3) )
+        {
+            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
+                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
+                rc = -EINVAL;
+            else
+                rc = get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]),
+                                                   PGT_l2_page_table |
+                                                   PGT_pae_xen_l2,
+                                                   d, partial, 1);
+        }
+        else if ( !is_guest_l3_slot(i) ||
+                  (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc == -EINTR && i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            rc = -ERESTART;
+        }
+        if ( rc < 0 )
+            break;
+
+        adjust_guest_l3e(pl3e[i], d);
+    }
+
+    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
+        rc = -EINVAL;
+    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
+    {
+        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
+        if ( i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            current->arch.old_guest_table = page;
+        }
+        while ( i-- > 0 )
+        {
+            if ( !is_guest_l3_slot(i) )
+                continue;
+            unadjust_guest_l3e(pl3e[i], d);
+        }
+    }
+
+    unmap_domain_page(pl3e);
+    return rc > 0 ? 0 : rc;
+}
+
+#ifndef NDEBUG
+static unsigned int __read_mostly root_pgt_pv_xen_slots
+    = ROOT_PAGETABLE_PV_XEN_SLOTS;
+static l4_pgentry_t __read_mostly split_l4e;
+#else
+#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
+#endif
+
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool_t zap_ro_mpt)
+{
+    /* Xen private mappings. */
+    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
+#ifndef NDEBUG
+    if ( l4e_get_intpte(split_l4e) )
+        l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] =
+            split_l4e;
+#endif
+    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
+        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
+    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
+        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
+    if ( zap_ro_mpt || is_pv_32bit_domain(d) || paging_mode_refcounts(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+}
+
+static int alloc_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( !is_guest_l4_slot(d, i) ||
+             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc < 0 )
+        {
+            if ( rc != -EINTR )
+                gdprintk(XENLOG_WARNING,
+                         "Failure in alloc_l4_table: slot %#x\n", i);
+            if ( i )
+            {
+                page->nr_validated_ptes = i;
+                page->partial_pte = 0;
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else
+                {
+                    if ( current->arch.old_guest_table )
+                        page->nr_validated_ptes++;
+                    current->arch.old_guest_table = page;
+                }
+            }
+        }
+        if ( rc < 0 )
+        {
+            unmap_domain_page(pl4e);
+            return rc;
+        }
+
+        adjust_guest_l4e(pl4e[i], d);
+    }
+
+    if ( rc >= 0 )
+    {
+        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
+        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+    unmap_domain_page(pl4e);
+
+    return rc;
+}
+
+static void free_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l1_pgentry_t *pl1e;
+    unsigned int  i;
+
+    pl1e = map_domain_page(_mfn(pfn));
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+        if ( is_guest_l1_slot(i) )
+            put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+}
+
+static int free_l2_table(struct page_info *page, int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l2_pgentry_t *pl2e;
+    unsigned int  i = page->nr_validated_ptes - 1;
+    int err = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    ASSERT(page->nr_validated_ptes);
+    do {
+        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
+             put_page_from_l2e(pl2e[i], pfn) == 0 &&
+             preemptible && i && hypercall_preempt_check() )
+        {
+           page->nr_validated_ptes = i;
+           err = -ERESTART;
+        }
+    } while ( !err && i-- );
+
+    unmap_domain_page(pl2e);
+
+    if ( !err )
+        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
+
+    return err;
+}
+
+static int free_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l3_pgentry_t *pl3e;
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    do {
+        if ( is_guest_l3_slot(i) )
+        {
+            rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
+            if ( rc < 0 )
+                break;
+            partial = 0;
+            if ( rc > 0 )
+                continue;
+            unadjust_guest_l3e(pl3e[i], d);
+        }
+    } while ( i-- );
+
+    unmap_domain_page(pl3e);
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+    return rc > 0 ? 0 : rc;
+}
+
+static int free_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    do {
+        if ( is_guest_l4_slot(d, i) )
+            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
+        if ( rc < 0 )
+            break;
+        partial = 0;
+    } while ( i-- );
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+
+    unmap_domain_page(pl4e);
+
+    if ( rc >= 0 )
+    {
+        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+
+    return rc;
+}
+
+
+/* How to write an entry to the guest pagetables.
+ * Returns 0 for failure (pointer not valid), 1 for success. */
+static inline int update_intpte(intpte_t *p,
+                                intpte_t old,
+                                intpte_t new,
+                                unsigned long mfn,
+                                struct vcpu *v,
+                                int preserve_ad)
+{
+    int rv = 1;
+#ifndef PTE_UPDATE_WITH_CMPXCHG
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
+#endif
+    {
+        intpte_t t = old;
+        for ( ; ; )
+        {
+            intpte_t _new = new;
+            if ( preserve_ad )
+                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
+            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
+            if ( unlikely(rv == 0) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Failed to update %" PRIpte " -> %" PRIpte
+                         ": saw %" PRIpte "\n", old, _new, t);
+                break;
+            }
+
+            if ( t == old )
+                break;
+
+            /* Allowed to change in Accessed/Dirty flags only. */
+            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+            old = t;
+        }
+    }
+    return rv;
+}
+
+/* Macro that wraps the appropriate type-changes around update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu */
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
+    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
+                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
+                  (_m), (_v), (_ad))
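[Shown for reference, not part of the patch: UPDATE_ENTRY() only pastes the level prefix onto the accessor names, so for an L1 entry the expansion is roughly:

    /* UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, ad) becomes: */
    update_intpte(&l1e_get_intpte(*pl1e),
                  l1e_get_intpte(ol1e), l1e_get_intpte(nl1e),
                  gl1mfn, v, ad);
]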
+
+/*
+ * PTE flags that a guest may change without re-validating the PTE.
+ * All other bits affect translation, caching, or Xen's safety.
+ */
+#define FASTPATH_FLAG_WHITELIST                                     \
+    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
+     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
+
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
+                        unsigned long gl1mfn, int preserve_ad,
+                        struct vcpu *pt_vcpu, struct domain *pg_dom)
+{
+    l1_pgentry_t ol1e;
+    struct domain *pt_dom = pt_vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
+        return -EFAULT;
+
+    if ( unlikely(paging_mode_refcounts(pt_dom)) )
+    {
+        if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
+            return 0;
+        return -EBUSY;
+    }
+
+    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+    {
+        /* Translate foreign guest addresses. */
+        struct page_info *page = NULL;
+
+        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
+            return -EINVAL;
+        }
+
+        if ( paging_mode_translate(pg_dom) )
+        {
+            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
+            if ( !page )
+                return -EINVAL;
+            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l1e(nl1e, pt_dom);
+            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                              preserve_ad);
+            if ( page )
+                put_page(page);
+            return rc ? 0 : -EBUSY;
+        }
+
+        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        default:
+            if ( page )
+                put_page(page);
+            return rc;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
+            rc = 0;
+            break;
+        }
+        if ( page )
+            put_page(page);
+
+        adjust_guest_l1e(nl1e, pt_dom);
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                    preserve_ad)) )
+        {
+            ol1e = nl1e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l1e(ol1e, pt_dom);
+    return rc;
+}
+
+
+/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
+static int mod_l2_entry(l2_pgentry_t *pl2e,
+                        l2_pgentry_t nl2e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    l2_pgentry_t ol2e;
+    struct domain *d = vcpu->domain;
+    struct page_info *l2pg = mfn_to_page(pfn);
+    unsigned long type = l2pg->u.inuse.type_info;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl2e));
+        return -EPERM;
+    }
+
+    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
+        return -EFAULT;
+
+    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l2e(nl2e, d);
+            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
+                return 0;
+            return -EBUSY;
+        }
+
+        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
+            return rc;
+
+        adjust_guest_l2e(nl2e, d);
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol2e = nl2e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l2e(ol2e, pfn);
+    return rc;
+}
+
+/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
+static int mod_l3_entry(l3_pgentry_t *pl3e,
+                        l3_pgentry_t nl3e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    l3_pgentry_t ol3e;
+    struct domain *d = vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl3e));
+        return -EINVAL;
+    }
+
+    /*
+     * Disallow updates to final L3 slot. It contains Xen mappings, and it
+     * would be a pain to ensure they remain continuously valid throughout.
+     */
+    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+        return -EINVAL;
+
+    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
+        return -EFAULT;
+
+    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l3e(nl3e, d);
+            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l3e(nl3e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l3e(nl3e, d);
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol3e = nl3e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    if ( likely(rc == 0) )
+        if ( !create_pae_xen_mappings(d, pl3e) )
+            BUG();
+
+    put_page_from_l3e(ol3e, pfn, 0, 1);
+    return rc;
+}
+
+/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
+static int mod_l4_entry(l4_pgentry_t *pl4e,
+                        l4_pgentry_t nl4e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    struct domain *d = vcpu->domain;
+    l4_pgentry_t ol4e;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl4e));
+        return -EINVAL;
+    }
+
+    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
+        return -EFAULT;
+
+    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l4e(nl4e, d);
+            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l4e(nl4e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l4e(nl4e, d);
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol4e = nl4e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    put_page_from_l4e(ol4e, pfn, 0, 1);
+    return rc;
+}
+
+
+int alloc_page_type(struct page_info *page, unsigned long type,
+                    int preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    int rc;
+
+    /* A page table is dirtied when its type count becomes non-zero. */
+    if ( likely(owner != NULL) )
+        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        rc = alloc_l1_table(page);
+        break;
+    case PGT_l2_page_table:
+        rc = alloc_l2_table(page, type, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l4_table(page);
+        break;
+    case PGT_seg_desc_page:
+        rc = alloc_segdesc_page(page);
+        break;
+    default:
+        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n",
+               type, page->u.inuse.type_info,
+               page->count_info);
+        rc = -EINVAL;
+        BUG();
+    }
+
+    /* No need for atomic update of type_info here: noone else updates it. */
+    wmb();
+    switch ( rc )
+    {
+    case 0:
+        page->u.inuse.type_info |= PGT_validated;
+        break;
+    case -EINTR:
+        ASSERT((page->u.inuse.type_info &
+                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
+        page->u.inuse.type_info &= ~PGT_count_mask;
+        break;
+    default:
+        ASSERT(rc < 0);
+        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
+                 " (pfn %" PRI_pfn ") for type %" PRtype_info
+                 ": caf=%08lx taf=%" PRtype_info "\n",
+                 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
+                 type, page->count_info, page->u.inuse.type_info);
+        if ( page != current->arch.old_guest_table )
+            page->u.inuse.type_info = 0;
+        else
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_count_mask | PGT_validated)) == 1);
+    case -ERESTART:
+            get_page_light(page);
+            page->u.inuse.type_info |= PGT_partial;
+        }
+        break;
+    }
+
+    return rc;
+}
+
+int free_page_type(struct page_info *page, unsigned long type,
+                   int preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    unsigned long gmfn;
+    int rc;
+
+    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
+    {
+        /* A page table is dirtied when its type count becomes zero. */
+        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
+
+        if ( shadow_mode_refcounts(owner) )
+            return 0;
+
+        gmfn = mfn_to_gmfn(owner, page_to_mfn(page));
+        ASSERT(VALID_M2P(gmfn));
+        /* Page sharing not supported for shadowed domains */
+        if(!SHARED_M2P(gmfn))
+            shadow_remove_all_shadows(owner, _mfn(gmfn));
+    }
+
+    if ( !(type & PGT_partial) )
+    {
+        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
+        page->partial_pte = 0;
+    }
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        free_l1_table(page);
+        rc = 0;
+        break;
+    case PGT_l2_page_table:
+        rc = free_l2_table(page, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = free_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = free_l4_table(page);
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
+                 type, page_to_mfn(page));
+        rc = -EINVAL;
+        BUG();
+    }
+
+    return rc;
+}
+
+static int get_spage_pages(struct page_info *page, struct domain *d)
+{
+    int i;
+
+    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
+    {
+        if (!get_page_and_type(page, d, PGT_writable_page))
+        {
+            while (--i >= 0)
+                put_page_and_type(--page);
+            return 0;
+        }
+    }
+    return 1;
+}
+
+static void put_spage_pages(struct page_info *page)
+{
+    int i;
+
+    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
+    {
+        put_page_and_type(page);
+    }
+    return;
+}
+
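+/*
+ * Superpage type tracking: spage->type_info combines a reference count
+ * (SGT_count_mask) with a state in SGT_type_mask: SGT_none (unused),
+ * SGT_dynamic (referenced via ordinary mappings) or SGT_mark (explicitly
+ * marked via MMUEXT_MARK_SUPER).  All transitions below use cmpxchg()
+ * loops, so racing updates simply retry.
+ */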
+static int mark_superpage(struct spage_info *spage, struct domain *d)
+{
+    unsigned long x, nx, y = spage->type_info;
+    int pages_done = 0;
+
+    ASSERT(opt_allow_superpage);
+
+    do {
+        x = y;
+        nx = x + 1;
+        if ( (x & SGT_type_mask) == SGT_mark )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Duplicate superpage mark attempt mfn %" PRI_mfn "\n",
+                     spage_to_mfn(spage));
+            if ( pages_done )
+                put_spage_pages(spage_to_page(spage));
+            return -EINVAL;
+        }
+        if ( (x & SGT_type_mask) == SGT_dynamic )
+        {
+            if ( pages_done )
+            {
+                put_spage_pages(spage_to_page(spage));
+                pages_done = 0;
+            }
+        }
+        else if ( !pages_done )
+        {
+            if ( !get_spage_pages(spage_to_page(spage), d) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Superpage type conflict in mark attempt mfn %" PRI_mfn "\n",
+                         spage_to_mfn(spage));
+                return -EINVAL;
+            }
+            pages_done = 1;
+        }
+        nx = (nx & ~SGT_type_mask) | SGT_mark;
+
+    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
+
+    return 0;
+}
+
+static int unmark_superpage(struct spage_info *spage)
+{
+    unsigned long x, nx, y = spage->type_info;
+    unsigned long do_pages = 0;
+
+    ASSERT(opt_allow_superpage);
+
+    do {
+        x = y;
+        nx = x - 1;
+        if ( (x & SGT_type_mask) != SGT_mark )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Attempt to unmark unmarked superpage mfn %" PRI_mfn "\n",
+                     spage_to_mfn(spage));
+            return -EINVAL;
+        }
+        if ( (nx & SGT_count_mask) == 0 )
+        {
+            nx = (nx & ~SGT_type_mask) | SGT_none;
+            do_pages = 1;
+        }
+        else
+        {
+            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
+        }
+    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
+
+    if ( do_pages )
+        put_spage_pages(spage_to_page(spage));
+
+    return 0;
+}
+
+void clear_superpage_mark(struct page_info *page)
+{
+    struct spage_info *spage;
+
+    if ( !opt_allow_superpage )
+        return;
+
+    spage = page_to_spage(page);
+    if ((spage->type_info & SGT_type_mask) == SGT_mark)
+        unmark_superpage(spage);
+
+}
+
+int get_superpage(unsigned long mfn, struct domain *d)
+{
+    struct spage_info *spage;
+    unsigned long x, nx, y;
+    int pages_done = 0;
+
+    ASSERT(opt_allow_superpage);
+
+    if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
+        return -EINVAL;
+
+    spage = mfn_to_spage(mfn);
+    y = spage->type_info;
+    do {
+        x = y;
+        nx = x + 1;
+        if ( (x & SGT_type_mask) != SGT_none )
+        {
+            if ( pages_done )
+            {
+                put_spage_pages(spage_to_page(spage));
+                pages_done = 0;
+            }
+        }
+        else
+        {
+            if ( !get_spage_pages(spage_to_page(spage), d) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Type conflict on superpage mapping mfn %" PRI_mfn "\n",
+                         spage_to_mfn(spage));
+                return -EINVAL;
+            }
+            pages_done = 1;
+            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
+        }
+    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
+
+    return 0;
+}
+
+static void put_superpage(unsigned long mfn)
+{
+    struct spage_info *spage;
+    unsigned long x, nx, y;
+    unsigned long do_pages = 0;
+
+    if ( !opt_allow_superpage )
+    {
+        put_spage_pages(mfn_to_page(mfn));
+        return;
+    }
+
+    spage = mfn_to_spage(mfn);
+    y = spage->type_info;
+    do {
+        x = y;
+        nx = x - 1;
+        if ((x & SGT_type_mask) == SGT_dynamic)
+        {
+            if ((nx & SGT_count_mask) == 0)
+            {
+                nx = (nx & ~SGT_type_mask) | SGT_none;
+                do_pages = 1;
+            }
+        }
+
+    } while ((y = cmpxchg(&spage->type_info, x, nx)) != x);
+
+    if (do_pages)
+        put_spage_pages(spage_to_page(spage));
+
+    return;
+}
+
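+/*
+ * Finish dropping the reference which a previously preempted operation
+ * left in v->arch.old_guest_table.  Returns -ERESTART if this put is
+ * itself preempted, so the caller can create a continuation.
+ */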
+int put_old_guest_table(struct vcpu *v)
+{
+    int rc;
+
+    if ( !v->arch.old_guest_table )
+        return 0;
+
+    switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) )
+    {
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    }
+
+    v->arch.old_guest_table = NULL;
+
+    return rc;
+}
+
+int new_guest_cr3(unsigned long mfn)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    int rc;
+    unsigned long old_base_mfn;
+
+    if ( is_pv_32bit_domain(d) )
+    {
+        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
+        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
+
+        rc = paging_mode_refcounts(d)
+             ? -EINVAL /* Old code was broken, but what should it be? */
+             : mod_l4_entry(
+                    pl4e,
+                    l4e_from_pfn(
+                        mfn,
+                        (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
+                    gt_mfn, 0, curr);
+        unmap_domain_page(pl4e);
+        switch ( rc )
+        {
+        case 0:
+            break;
+        case -EINTR:
+        case -ERESTART:
+            return -ERESTART;
+        default:
+            gdprintk(XENLOG_WARNING,
+                     "Error while installing new compat baseptr %" PRI_mfn "\n",
+                     mfn);
+            return rc;
+        }
+
+        invalidate_shadow_ldt(curr, 0);
+        write_ptbase(curr);
+
+        return 0;
+    }
+
+    rc = put_old_guest_table(curr);
+    if ( unlikely(rc) )
+        return rc;
+
+    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
+    /*
+     * This is particularly important when getting restarted after the
+     * previous attempt got preempted in the put-old-MFN phase.
+     */
+    if ( old_base_mfn == mfn )
+    {
+        write_ptbase(curr);
+        return 0;
+    }
+
+    rc = paging_mode_refcounts(d)
+         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
+         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
+    switch ( rc )
+    {
+    case 0:
+        break;
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    default:
+        gdprintk(XENLOG_WARNING,
+                 "Error while installing new baseptr %" PRI_mfn "\n", mfn);
+        return rc;
+    }
+
+    invalidate_shadow_ldt(curr, 0);
+
+    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+        fill_ro_mpt(mfn);
+    curr->arch.guest_table = pagetable_from_pfn(mfn);
+    update_cr3(curr);
+
+    write_ptbase(curr);
+
+    if ( likely(old_base_mfn != 0) )
+    {
+        struct page_info *page = mfn_to_page(old_base_mfn);
+
+        if ( paging_mode_refcounts(d) )
+            put_page(page);
+        else
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+                rc = -ERESTART;
+                /* fallthrough */
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+    }
+
+    return rc;
+}
+
+static struct domain *get_pg_owner(domid_t domid)
+{
+    struct domain *pg_owner = NULL, *curr = current->domain;
+
+    if ( likely(domid == DOMID_SELF) )
+    {
+        pg_owner = rcu_lock_current_domain();
+        goto out;
+    }
+
+    if ( unlikely(domid == curr->domain_id) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
+        goto out;
+    }
+
+    if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Cannot mix foreign mappings with translated domains\n");
+        goto out;
+    }
+
+    switch ( domid )
+    {
+    case DOMID_IO:
+        pg_owner = rcu_lock_domain(dom_io);
+        break;
+    case DOMID_XEN:
+        pg_owner = rcu_lock_domain(dom_xen);
+        break;
+    default:
+        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
+        {
+            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
+            break;
+        }
+        break;
+    }
+
+ out:
+    return pg_owner;
+}
+
+static void put_pg_owner(struct domain *pg_owner)
+{
+    rcu_unlock_domain(pg_owner);
+}
+
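+/*
+ * Turn a guest-supplied vCPU bitmap into the union of the named vCPUs'
+ * dirty pCPU masks.  The bitmap is read in word-sized chunks:
+ * BITS_PER_LONG bits per chunk for native guests, 32 bits per chunk for
+ * 32-bit PV guests.
+ */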
+static inline int vcpumask_to_pcpumask(
+    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
+{
+    unsigned int vcpu_id, vcpu_bias, offs;
+    unsigned long vmask;
+    struct vcpu *v;
+    bool_t is_native = !is_pv_32bit_domain(d);
+
+    cpumask_clear(pmask);
+    for ( vmask = 0, offs = 0; ; ++offs)
+    {
+        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
+        if ( vcpu_bias >= d->max_vcpus )
+            return 0;
+
+        if ( unlikely(is_native ?
+                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
+                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
+                                             offs, 1)) )
+        {
+            cpumask_clear(pmask);
+            return -EFAULT;
+        }
+
+        while ( vmask )
+        {
+            vcpu_id = find_first_set_bit(vmask);
+            vmask &= ~(1UL << vcpu_id);
+            vcpu_id += vcpu_bias;
+            if ( (vcpu_id >= d->max_vcpus) )
+                return 0;
+            if ( ((v = d->vcpu[vcpu_id]) != NULL) )
+                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
+        }
+    }
+}
+
+long do_mmuext_op(
+    XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
+    unsigned int count,
+    XEN_GUEST_HANDLE_PARAM(uint) pdone,
+    unsigned int foreigndom)
+{
+    struct mmuext_op op;
+    unsigned long type;
+    unsigned int i, done = 0;
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    struct domain *pg_owner;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(uops)) )
+    {
+        /* See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below. */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmuext_op);
+
+    if ( unlikely(!guest_handle_okay(uops, count)) )
+        return -EFAULT;
+
+    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
+        return -ESRCH;
+
+    if ( !is_pv_domain(pg_owner) )
+    {
+        put_pg_owner(pg_owner);
+        return -EINVAL;
+    }
+
+    rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner);
+    if ( rc )
+    {
+        put_pg_owner(pg_owner);
+        return rc;
+    }
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( is_hvm_domain(d) )
+        {
+            switch ( op.cmd )
+            {
+            case MMUEXT_PIN_L1_TABLE:
+            case MMUEXT_PIN_L2_TABLE:
+            case MMUEXT_PIN_L3_TABLE:
+            case MMUEXT_PIN_L4_TABLE:
+            case MMUEXT_UNPIN_TABLE:
+                break;
+            default:
+                rc = -EOPNOTSUPP;
+                goto done;
+            }
+        }
+
+        rc = 0;
+
+        switch ( op.cmd )
+        {
+        case MMUEXT_PIN_L1_TABLE:
+            type = PGT_l1_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L2_TABLE:
+            type = PGT_l2_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L3_TABLE:
+            type = PGT_l3_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L4_TABLE:
+            if ( is_pv_32bit_domain(pg_owner) )
+                break;
+            type = PGT_l4_page_table;
+
+        pin_page: {
+            struct page_info *page;
+
+            /* Ignore pinning of invalid paging levels. */
+            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
+                break;
+
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            rc = get_page_type_preemptible(page, type);
+            if ( unlikely(rc) )
+            {
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else if ( rc != -ERESTART )
+                    gdprintk(XENLOG_WARNING,
+                             "Error %d while pinning mfn %" PRI_mfn "\n",
+                            rc, page_to_mfn(page));
+                if ( page != curr->arch.old_guest_table )
+                    put_page(page);
+                break;
+            }
+
+            rc = xsm_memory_pin_page(XSM_HOOK, d, pg_owner, page);
+            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
+                                                  &page->u.inuse.type_info)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page));
+                rc = -EINVAL;
+            }
+
+            if ( unlikely(rc) )
+                goto pin_drop;
+
+            /* A page is dirtied when its pin status is set. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            /* We can race domain destruction (domain_relinquish_resources). */
+            if ( unlikely(pg_owner != d) )
+            {
+                int drop_ref;
+                spin_lock(&pg_owner->page_alloc_lock);
+                drop_ref = (pg_owner->is_dying &&
+                            test_and_clear_bit(_PGT_pinned,
+                                               &page->u.inuse.type_info));
+                spin_unlock(&pg_owner->page_alloc_lock);
+                if ( drop_ref )
+                {
+        pin_drop:
+                    if ( type == PGT_l1_page_table )
+                        put_page_and_type(page);
+                    else
+                        curr->arch.old_guest_table = page;
+                }
+            }
+
+            break;
+        }
+
+        case MMUEXT_UNPIN_TABLE: {
+            struct page_info *page;
+
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
+                         op.arg1.mfn, pg_owner->domain_id);
+                rc = -EINVAL;
+                break;
+            }
+
+            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
+            {
+                put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                rc = 0;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+            put_page(page);
+
+            /* A page is dirtied when its pin status is cleared. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            break;
+        }
+
+        case MMUEXT_NEW_BASEPTR:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(d)) )
+                rc = -EINVAL;
+            else
+                rc = new_guest_cr3(op.arg1.mfn);
+            break;
+
+        case MMUEXT_NEW_USER_BASEPTR: {
+            unsigned long old_mfn;
+
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(d)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
+            /*
+             * This is particularly important when getting restarted after the
+             * previous attempt got preempted in the put-old-MFN phase.
+             */
+            if ( old_mfn == op.arg1.mfn )
+                break;
+
+            if ( op.arg1.mfn != 0 )
+            {
+                if ( paging_mode_refcounts(d) )
+                    rc = get_page_from_pagenr(op.arg1.mfn, d) ? 0 : -EINVAL;
+                else
+                    rc = get_page_and_type_from_pagenr(
+                        op.arg1.mfn, PGT_root_page_table, d, 0, 1);
+
+                if ( unlikely(rc) )
+                {
+                    if ( rc == -EINTR )
+                        rc = -ERESTART;
+                    else if ( rc != -ERESTART )
+                        gdprintk(XENLOG_WARNING,
+                                 "Error %d installing new mfn %" PRI_mfn "\n",
+                                 rc, op.arg1.mfn);
+                    break;
+                }
+                if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+                    zap_ro_mpt(op.arg1.mfn);
+            }
+
+            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
+
+            if ( old_mfn != 0 )
+            {
+                struct page_info *page = mfn_to_page(old_mfn);
+
+                if ( paging_mode_refcounts(d) )
+                    put_page(page);
+                else
+                    switch ( rc = put_page_and_type_preemptible(page) )
+                    {
+                    case -EINTR:
+                        rc = -ERESTART;
+                        /* fallthrough */
+                    case -ERESTART:
+                        curr->arch.old_guest_table = page;
+                        break;
+                    default:
+                        BUG_ON(rc);
+                        break;
+                    }
+            }
+
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_LOCAL:
+            if ( likely(d == pg_owner) )
+                flush_tlb_local();
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_LOCAL:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else
+                paging_invlpg(curr, op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_TLB_FLUSH_MULTI:
+        case MMUEXT_INVLPG_MULTI:
+        {
+            cpumask_t *mask = this_cpu(scratch_cpumask);
+
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(vcpumask_to_pcpumask(d,
+                                   guest_handle_to_param(op.arg2.vcpumask,
+                                                         const_void),
+                                   mask)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
+                flush_tlb_mask(mask);
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_ALL:
+            if ( likely(d == pg_owner) )
+                flush_tlb_mask(d->domain_dirty_cpumask);
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_ALL:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(d->domain_dirty_cpumask, op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_FLUSH_CACHE:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(!cache_flush_permitted(d)) )
+                rc = -EACCES;
+            else
+                wbinvd();
+            break;
+
+        case MMUEXT_FLUSH_CACHE_GLOBAL:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( likely(cache_flush_permitted(d)) )
+            {
+                unsigned int cpu;
+                cpumask_t *mask = this_cpu(scratch_cpumask);
+
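+                /*
+                 * Flush on one CPU from each sibling group only: threads
+                 * sharing a core also share its caches, so flushing once
+                 * per sibling group suffices.
+                 */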
+                cpumask_clear(mask);
+                for_each_online_cpu(cpu)
+                    if ( !cpumask_intersects(mask,
+                                             per_cpu(cpu_sibling_mask, cpu)) )
+                        __cpumask_set_cpu(cpu, mask);
+                flush_mask(mask, FLUSH_CACHE);
+            }
+            else
+                rc = -EINVAL;
+            break;
+
+        case MMUEXT_SET_LDT:
+        {
+            unsigned int ents = op.arg2.nr_ents;
+            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
+
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( paging_mode_external(d) )
+                rc = -EINVAL;
+            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
+                      (ents > 8192) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
+                rc = -EINVAL;
+            }
+            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
+                      (curr->arch.pv_vcpu.ldt_base != ptr) )
+            {
+                invalidate_shadow_ldt(curr, 0);
+                flush_tlb_local();
+                curr->arch.pv_vcpu.ldt_base = ptr;
+                curr->arch.pv_vcpu.ldt_ents = ents;
+                load_LDT(curr);
+            }
+            break;
+        }
+
+        case MMUEXT_CLEAR_PAGE: {
+            struct page_info *page;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( !page || !get_page_type(page, PGT_writable_page) )
+            {
+                if ( page )
+                    put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            /* A page is dirtied when it's being cleared. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            clear_domain_page(_mfn(page_to_mfn(page)));
+
+            put_page_and_type(page);
+            break;
+        }
+
+        case MMUEXT_COPY_PAGE:
+        {
+            struct page_info *src_page, *dst_page;
+
+            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, NULL,
+                                         P2M_ALLOC);
+            if ( unlikely(!src_page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Error copying from mfn %" PRI_mfn "\n",
+                         op.arg2.src_mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL,
+                                         P2M_ALLOC);
+            rc = (dst_page &&
+                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
+            if ( unlikely(rc) )
+            {
+                put_page(src_page);
+                if ( dst_page )
+                    put_page(dst_page);
+                gdprintk(XENLOG_WARNING,
+                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
+                break;
+            }
+
+            /* A page is dirtied when it's being copied to. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page)));
+
+            copy_domain_page(_mfn(page_to_mfn(dst_page)),
+                             _mfn(page_to_mfn(src_page)));
+
+            put_page_and_type(dst_page);
+            put_page(src_page);
+            break;
+        }
+
+        case MMUEXT_MARK_SUPER:
+        case MMUEXT_UNMARK_SUPER:
+        {
+            unsigned long mfn = op.arg1.mfn;
+
+            if ( !opt_allow_superpage )
+                rc = -EOPNOTSUPP;
+            else if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( mfn & (L1_PAGETABLE_ENTRIES - 1) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Unaligned superpage mfn %" PRI_mfn "\n", mfn);
+                rc = -EINVAL;
+            }
+            else if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
+                rc = -EINVAL;
+            else if ( op.cmd == MMUEXT_MARK_SUPER )
+                rc = mark_superpage(mfn_to_spage(mfn), d);
+            else
+                rc = unmark_superpage(mfn_to_spage(mfn));
+            break;
+        }
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+ done:
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(uops, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmuext_op, "hihi",
+            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmuext_op, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    perfc_add(num_mmuext_ops, i);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+long do_mmu_update(
+    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
+    unsigned int count,
+    XEN_GUEST_HANDLE_PARAM(uint) pdone,
+    unsigned int foreigndom)
+{
+    struct mmu_update req;
+    void *va;
+    unsigned long gpfn, gmfn, mfn;
+    struct page_info *page;
+    unsigned int cmd, i = 0, done = 0, pt_dom;
+    struct vcpu *curr = current, *v = curr;
+    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
+    struct domain_mmap_cache mapcache;
+    uint32_t xsm_needed = 0;
+    uint32_t xsm_checked = 0;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(ureqs)) )
+    {
+        /* See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below. */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmu_update);
+
+    if ( unlikely(!guest_handle_okay(ureqs, count)) )
+        return -EFAULT;
+
+    if ( (pt_dom = foreigndom >> 16) != 0 )
+    {
+        /* Pagetables belong to a foreign domain (PFD). */
+        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
+            return -ESRCH;
+
+        if ( pt_owner == d )
+            rcu_unlock_domain(pt_owner);
+        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+    }
+
+    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
+    {
+        rc = -ESRCH;
+        goto out;
+    }
+
+    domain_mmap_cache_init(&mapcache);
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
+
+        switch ( cmd )
+        {
+            /*
+             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR)
+             * current A/D bits.
+             */
+        case MMU_NORMAL_PT_UPDATE:
+        case MMU_PT_UPDATE_PRESERVE_AD:
+        {
+            p2m_type_t p2mt;
+
+            rc = -EOPNOTSUPP;
+            if ( unlikely(paging_mode_refcounts(pt_owner)) )
+                break;
+
+            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
+            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
+            {
+                xsm_needed |= XSM_MMU_UPDATE_READ;
+                if ( get_pte_flags(req.val) & _PAGE_RW )
+                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
+            }
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+            rc = -EINVAL;
+
+            req.ptr -= cmd;
+            gmfn = req.ptr >> PAGE_SHIFT;
+            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
+
+            if ( p2m_is_paged(p2mt) )
+            {
+                ASSERT(!page);
+                p2m_mem_paging_populate(pg_owner, gmfn);
+                rc = -ENOENT;
+                break;
+            }
+
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for normal update\n");
+                break;
+            }
+
+            mfn = page_to_mfn(page);
+            va = map_domain_page_with_cache(mfn, &mapcache);
+            va = (void *)((unsigned long)va +
+                          (unsigned long)(req.ptr & ~PAGE_MASK));
+
+            if ( page_lock(page) )
+            {
+                switch ( page->u.inuse.type_info & PGT_type_mask )
+                {
+                case PGT_l1_page_table:
+                {
+                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
+                    p2m_type_t l1e_p2mt = p2m_ram_rw;
+                    struct page_info *target = NULL;
+                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
+                                        P2M_UNSHARE : P2M_ALLOC;
+
+                    if ( paging_mode_translate(pg_owner) )
+                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
+                                                   &l1e_p2mt, q);
+
+                    if ( p2m_is_paged(l1e_p2mt) )
+                    {
+                        if ( target )
+                            put_page(target);
+                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
+                        rc = -ENOENT;
+                        break;
+                    }
+                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
+                    {
+                        rc = -ENOENT;
+                        break;
+                    }
+                    /* If we tried to unshare and failed */
+                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
+                    {
+                        /* We could not have obtained a page ref. */
+                        ASSERT(target == NULL);
+                        /* And mem_sharing_notify has already been called. */
+                        rc = -ENOMEM;
+                        break;
+                    }
+
+                    rc = mod_l1_entry(va, l1e, mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
+                                      pg_owner);
+                    if ( target )
+                        put_page(target);
+                }
+                break;
+                case PGT_l2_page_table:
+                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l3_page_table:
+                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l4_page_table:
+                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                break;
+                case PGT_writable_page:
+                    perfc_incr(writable_mmu_updates);
+                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                        rc = 0;
+                    break;
+                }
+                page_unlock(page);
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+            }
+            else if ( get_page_type(page, PGT_writable_page) )
+            {
+                perfc_incr(writable_mmu_updates);
+                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                    rc = 0;
+                put_page_type(page);
+            }
+
+            unmap_domain_page_with_cache(va, &mapcache);
+            put_page(page);
+        }
+        break;
+
+        case MMU_MACHPHYS_UPDATE:
+            if ( unlikely(d != pt_owner) )
+            {
+                rc = -EPERM;
+                break;
+            }
+
+            if ( unlikely(paging_mode_translate(pg_owner)) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            mfn = req.ptr >> PAGE_SHIFT;
+            gpfn = req.val;
+
+            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+
+            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for mach->phys update\n");
+                rc = -EINVAL;
+                break;
+            }
+
+            set_gpfn_from_mfn(mfn, gpfn);
+
+            paging_mark_dirty(pg_owner, _mfn(mfn));
+
+            put_page(mfn_to_page(mfn));
+            break;
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(ureqs, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmu_update, "hihi",
+            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmu_update, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    domain_mmap_cache_destroy(&mapcache);
+
+    perfc_add(num_page_updates, i);
+
+ out:
+    if ( pt_owner != d )
+        rcu_unlock_domain(pt_owner);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+
+static int create_grant_pte_mapping(
+    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+    struct domain *d = v->domain;
+
+    adjust_guest_l1e(nl1e, d);
+
+    gmfn = pte_addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+    if ( !paging_mode_refcounts(d) )
+        put_page_from_l1e(ol1e, d);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+
+    return rc;
+}
+
+static int destroy_grant_pte_mapping(
+    uint64_t addr, unsigned long frame, struct domain *d)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+
+    gmfn = addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        page_unlock(page);
+        gdprintk(XENLOG_WARNING,
+                 "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n",
+                 l1e_get_intpte(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY
+                  (l1,
+                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
+                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+                   0)) )
+    {
+        page_unlock(page);
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+    return rc;
+}
+
+
+static int create_grant_va_mapping(
+    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    struct domain *d = v->domain;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int okay;
+
+    adjust_guest_l1e(nl1e, d);
+
+    pl1e = guest_map_l1e(va, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", 
va);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    guest_unmap_l1e(pl1e);
+
+    if ( okay && !paging_mode_refcounts(d) )
+        put_page_from_l1e(ol1e, d);
+
+    return okay ? GNTST_okay : GNTST_general_error;
+}
+
+static int replace_grant_va_mapping(
+    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc = 0;
+
+    pl1e = guest_map_l1e(addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", 
addr);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        rc = GNTST_general_error;
+        goto out;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        rc = GNTST_general_error;
+        put_page(l1pg);
+        goto out;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    ol1e = *pl1e;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "PTE entry %lx for address %lx doesn't match frame %lx\n",
+                 l1e_get_pfn(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+ unlock_and_out:
+    page_unlock(l1pg);
+    put_page(l1pg);
+ out:
+    guest_unmap_l1e(pl1e);
+    return rc;
+}
+
+static int destroy_grant_va_mapping(
+    unsigned long addr, unsigned long frame, struct vcpu *v)
+{
+    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+}
+
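+/*
+ * Create a grant mapping for a PV guest.  With GNTMAP_contains_pte the
+ * address identifies the PTE slot to rewrite directly; otherwise it is a
+ * guest linear address whose L1 entry is looked up and updated.
+ */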
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                           unsigned int flags, unsigned int cache_flags)
+{
+    l1_pgentry_t pte;
+    uint32_t grant_pte_flags;
+
+    grant_pte_flags =
+        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
+    if ( cpu_has_nx )
+        grant_pte_flags |= _PAGE_NX_BIT;
+
+    pte = l1e_from_pfn(frame, grant_pte_flags);
+    if ( (flags & GNTMAP_application_map) )
+        l1e_add_flags(pte,_PAGE_USER);
+    if ( !(flags & GNTMAP_readonly) )
+        l1e_add_flags(pte,_PAGE_RW);
+
+    l1e_add_flags(pte,
+                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
+                   & _PAGE_AVAIL);
+
+    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
+
+    if ( flags & GNTMAP_contains_pte )
+        return create_grant_pte_mapping(addr, pte, current);
+    return create_grant_va_mapping(addr, pte, current);
+}
+
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            uint64_t new_addr, unsigned int flags)
+{
+    struct vcpu *curr = current;
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc;
+
+    if ( flags & GNTMAP_contains_pte )
+    {
+        if ( !new_addr )
+            return destroy_grant_pte_mapping(addr, frame, curr->domain);
+
+        return GNTST_general_error;
+    }
+
+    if ( !new_addr )
+        return destroy_grant_va_mapping(addr, frame, curr);
+
+    pl1e = guest_map_l1e(new_addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Could not find L1 PTE for address %"PRIx64"\n", new_addr);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+                                gl1mfn, curr, 0)) )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    guest_unmap_l1e(pl1e);
+
+    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
+    if ( rc && !paging_mode_refcounts(curr->domain) )
+        put_page_from_l1e(ol1e, curr->domain);
+
+    return rc;
+}
+
+static int __do_update_va_mapping(
+    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
+{
+    l1_pgentry_t   val = l1e_from_intpte(val64);
+    struct vcpu   *v   = current;
+    struct domain *d   = v->domain;
+    struct page_info *gl1pg;
+    l1_pgentry_t  *pl1e;
+    unsigned long  bmap_ptr, gl1mfn;
+    cpumask_t     *mask = NULL;
+    int            rc;
+
+    perfc_incr(calls_to_update_va);
+
+    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
+    if ( rc )
+        return rc;
+
+    rc = -EINVAL;
+    pl1e = guest_map_l1e(va, &gl1mfn);
+    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
+        goto out;
+
+    gl1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(gl1pg) )
+    {
+        put_page(gl1pg);
+        goto out;
+    }
+
+    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(gl1pg);
+        put_page(gl1pg);
+        goto out;
+    }
+
+    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
+
+    page_unlock(gl1pg);
+    put_page(gl1pg);
+
+ out:
+    if ( pl1e )
+        guest_unmap_l1e(pl1e);
+
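+    /*
+     * The low flag bits select the flush type (full TLB flush or single
+     * INVLPG); the remaining bits select the target: local CPU, all dirty
+     * CPUs, or a guest pointer to a vCPU bitmap (carried in bmap_ptr).
+     */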
+    switch ( flags & UVMF_FLUSHTYPE_MASK )
+    {
+    case UVMF_TLB_FLUSH:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            flush_tlb_local();
+            break;
+        case UVMF_ALL:
+            mask = d->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
+                                                                     void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_mask(mask);
+        break;
+
+    case UVMF_INVLPG:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            paging_invlpg(v, va);
+            break;
+        case UVMF_ALL:
+            mask = d->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
+                                                                     void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_one_mask(mask, va);
+        break;
+    }
+
+    return rc;
+}
+
+long do_update_va_mapping(unsigned long va, u64 val64,
+                          unsigned long flags)
+{
+    return __do_update_va_mapping(va, val64, flags, current->domain);
+}
+
+long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
+                                      unsigned long flags,
+                                      domid_t domid)
+{
+    struct domain *pg_owner;
+    int rc;
+
+    if ( (pg_owner = get_pg_owner(domid)) == NULL )
+        return -ESRCH;
+
+    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
+
+    put_pg_owner(pg_owner);
+
+    return rc;
+}
+
+
+long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
+                unsigned int entries)
+{
+    int nr_pages = (entries + 511) / 512;
+    unsigned long frames[16];
+    struct vcpu *curr = current;
+    long ret;
+
+    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    if ( copy_from_guest(frames, frame_list, nr_pages) )
+        return -EFAULT;
+
+    domain_lock(curr->domain);
+
+    if ( (ret = set_gdt(curr, frames, entries)) == 0 )
+        flush_tlb_local();
+
+    domain_unlock(curr->domain);
+
+    return ret;
+}
+
+
+long do_update_descriptor(u64 pa, u64 desc)
+{
+    struct domain *dom = current->domain;
+    unsigned long gmfn = pa >> PAGE_SHIFT;
+    unsigned long mfn;
+    unsigned int  offset;
+    struct desc_struct *gdt_pent, d;
+    struct page_info *page;
+    long ret = -EINVAL;
+
+    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
+
+    *(u64 *)&d = desc;
+
+    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
+    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
+         !page ||
+         !check_descriptor(dom, &d) )
+    {
+        if ( page )
+            put_page(page);
+        return -EINVAL;
+    }
+    mfn = page_to_mfn(page);
+
+    /* Check if the given frame is in use in an unsafe context. */
+    switch ( page->u.inuse.type_info & PGT_type_mask )
+    {
+    case PGT_seg_desc_page:
+        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
+            goto out;
+        break;
+    default:
+        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
+            goto out;
+        break;
+    }
+
+    paging_mark_dirty(dom, _mfn(mfn));
+
+    /* All is good so make the update. */
+    gdt_pent = map_domain_page(_mfn(mfn));
+    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
+    unmap_domain_page(gdt_pent);
+
+    put_page_type(page);
+
+    ret = 0; /* success */
+
+ out:
+    put_page(page);
+
+    return ret;
+}
+
+
+/*************************
+ * Descriptor Tables
+ */
+
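+/* Drop the GDT frame references and point every slot at the zero page. */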
+void destroy_gdt(struct vcpu *v)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
+
+    v->arch.pv_vcpu.gdt_ents = 0;
+    pl1e = gdt_ldt_ptes(v->domain, v);
+    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
+    {
+        pfn = l1e_get_pfn(pl1e[i]);
+        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
+            put_page_and_type(mfn_to_page(pfn));
+        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
+        v->arch.pv_vcpu.gdt_frames[i] = 0;
+    }
+}
+
+
+long set_gdt(struct vcpu *v,
+             unsigned long *frames,
+             unsigned int entries)
+{
+    struct domain *d = v->domain;
+    l1_pgentry_t *pl1e;
+    /* NB. There are 512 8-byte entries per GDT page. */
+    unsigned int i, nr_pages = (entries + 511) / 512;
+
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    /* Check the pages in the new GDT. */
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        struct page_info *page;
+
+        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
+        if ( !page )
+            goto fail;
+        if ( !get_page_type(page, PGT_seg_desc_page) )
+        {
+            put_page(page);
+            goto fail;
+        }
+        frames[i] = page_to_mfn(page);
+    }
+
+    /* Tear down the old GDT. */
+    destroy_gdt(v);
+
+    /* Install the new GDT. */
+    v->arch.pv_vcpu.gdt_ents = entries;
+    pl1e = gdt_ldt_ptes(d, v);
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
+        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
+    }
+
+    return 0;
+
+ fail:
+    while ( i-- > 0 )
+    {
+        put_page_and_type(mfn_to_page(frames[i]));
+    }
+    return -EINVAL;
+}
+
+/*************************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+    l1_pgentry_t  pte;
+};
+
+static int ptwr_emulated_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int rc = bytes;
+    unsigned long addr = offset;
+
+    if ( !__addr_ok(addr) ||
+         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(
+    unsigned long addr,
+    paddr_t old,
+    paddr_t val,
+    unsigned int bytes,
+    unsigned int do_cmpxchg,
+    struct ptwr_emulate_ctxt *ptwr_ctxt)
+{
+    unsigned long mfn;
+    unsigned long unaligned_addr = addr;
+    struct page_info *page;
+    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    int ret;
+
+    /* Only allow naturally-aligned stores within the original %cr2 page. */
+    if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                 ptwr_ctxt->cr2, addr, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Turn a sub-word access into a full-word access. */
+    if ( bytes != sizeof(paddr_t) )
+    {
+        paddr_t      full;
+        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
+
+        /* Align address; read full word. */
+        addr &= ~(sizeof(paddr_t)-1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        {
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+        /* Mask out bits provided by caller. */
+        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
+        /* Shift the caller value and OR in the missing bits. */
+        val  &= (((paddr_t)1 << (bytes*8)) - 1);
+        val <<= (offset)*8;
+        val  |= full;
+        /* Also fill in missing parts of the cmpxchg old value. */
+        old  &= (((paddr_t)1 << (bytes*8)) - 1);
+        old <<= (offset)*8;
+        old  |= full;
+    }
+
+    pte  = ptwr_ctxt->pte;
+    mfn  = l1e_get_pfn(pte);
+    page = mfn_to_page(mfn);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+    ASSERT(mfn_valid(_mfn(mfn)));
+    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+    ASSERT(page_get_owner(page) == d);
+
+    /* Check the new PTE. */
+    nl1e = l1e_from_intpte(val);
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
+    {
+    default:
+        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+        {
+            /*
+             * If this is an upper-half write to a PAE PTE then we assume that
+             * the guest has simply got the two writes the wrong way round. We
+             * zap the PRESENT bit on the assumption that the bottom half will
+             * be written immediately after we return to the guest.
+             */
+            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+                     PRIpte"\n", l1e_get_intpte(nl1e));
+            l1e_remove_flags(nl1e, _PAGE_PRESENT);
+        }
+        else
+        {
+            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
+            return X86EMUL_UNHANDLEABLE;
+        }
+        break;
+    case 0:
+        break;
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
+        break;
+    }
+
+    adjust_guest_l1e(nl1e, d);
+
+    /* Checked successfully: do the update (write or cmpxchg). */
+    pl1e = map_domain_page(_mfn(mfn));
+    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+    if ( do_cmpxchg )
+    {
+        int okay;
+        intpte_t t = old;
+        ol1e = l1e_from_intpte(old);
+
+        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
+        okay = (okay && t == old);
+
+        if ( !okay )
+        {
+            unmap_domain_page(pl1e);
+            put_page_from_l1e(nl1e, d);
+            return X86EMUL_RETRY;
+        }
+    }
+    else
+    {
+        ol1e = *pl1e;
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+            BUG();
+    }
+
+    trace_ptwr_emulation(addr, nl1e);
+
+    unmap_domain_page(pl1e);
+
+    /* Finally, drop the old PTE. */
+    put_page_from_l1e(ol1e, d);
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t val = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    {
+        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&val, p_data, bytes);
+
+    return ptwr_emulated_update(
+        offset, 0, val, bytes, 0,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int ptwr_emulated_cmpxchg(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
+    {
+        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    return ptwr_emulated_update(
+        offset, old, new, bytes, 1,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
+                                              : X86EMUL_UNHANDLEABLE;
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+    .read       = ptwr_emulated_read,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = ptwr_emulated_write,
+    .cmpxchg    = ptwr_emulated_cmpxchg,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
+                       struct cpu_user_regs *regs)
+{
+    struct domain *d = v->domain;
+    struct page_info *page;
+    l1_pgentry_t      pte;
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .ctxt = {
+            .regs = regs,
+            .vendor = d->arch.cpuid->x86_vendor,
+            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .swint_emulate = x86_swint_emulate_none,
+        },
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
+         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
+         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
+        goto bail;
+
+    page = l1e_get_page(pte);
+    if ( !page_lock(page) )
+    {
+        put_page(page);
+        goto bail;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        put_page(page);
+        goto bail;
+    }
+
+    ptwr_ctxt.cr2 = addr;
+    ptwr_ctxt.pte = pte;
+
+    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
+
+    page_unlock(page);
+    put_page(page);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which are marked
+         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
+         * update has succeeded on a different vcpu).  Anything else is an
+         * emulation bug, or a guest playing with the instruction stream under
+         * Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ptwr_ctxt.ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+ bail:
+    return 0;
+}
+
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+int mmio_ro_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
+
+    /* Only allow naturally-aligned stores at the original %cr2 address. */
+    if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
+         offset != mmio_ro_ctxt->cr2 )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                mmio_ro_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmio_ro_emulated_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+int mmcfg_intercept_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
+
+    /*
+     * Only allow naturally-aligned stores no wider than 4 bytes to the
+     * original %cr2 address.
+     */
+    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
+         offset != mmio_ctxt->cr2 )
+    {
+        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
+                mmio_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    offset &= 0xfff;
+    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
+                                  offset, bytes, p_data) >= 0 )
+        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
+                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
+                        *(uint32_t *)p_data);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmcfg_intercept_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmcfg_intercept_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+                          struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    unsigned long mfn;
+    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
+    struct x86_emulate_ctxt ctxt = {
+        .regs = regs,
+        .vendor = v->domain->arch.cpuid->x86_vendor,
+        .addr_size = addr_size,
+        .sp_size = addr_size,
+        .swint_emulate = x86_swint_emulate_none,
+        .data = &mmio_ro_ctxt
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of MMIO pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mfn = l1e_get_pfn(pte);
+    if ( mfn_valid(_mfn(mfn)) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return 0;
+    }
+
+    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        return 0;
+
+    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
+        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
+    else
+        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page or a successful
+         * concurrent pagetable update).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index e1b3391efc..9580dc32dc 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -17,6 +17,10 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                              unsigned int flags, unsigned int cache_flags);
 int replace_grant_host_mapping(
     uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags);
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                           unsigned int flags, unsigned int cache_flags);
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            uint64_t new_addr, unsigned int flags);
 
 #define gnttab_create_shared_page(d, t, i)                               \
     do {                                                                 \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 8e55593154..8e2bf91070 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -319,6 +319,8 @@ static inline void *__page_to_virt(const struct page_info *pg)
                     (PAGE_SIZE / (sizeof(*pg) & -sizeof(*pg))));
 }
 
+int alloc_page_type(struct page_info *page, unsigned long type,
+                    int preemptible);
 int free_page_type(struct page_info *page, unsigned long type,
                    int preemptible);
 
@@ -364,6 +366,13 @@ int  put_old_guest_table(struct vcpu *);
 int  get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
+int get_page_and_type_from_pagenr(unsigned long page_nr,
+                                  unsigned long type,
+                                  struct domain *d,
+                                  int partial,
+                                  int preemptible);
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d);
+void get_page_light(struct page_info *page);
 
 static inline void put_page_and_type(struct page_info *page)
 {
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index c92cba41a0..8929a7e01c 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -161,6 +161,7 @@ int map_pages_to_xen(
 /* Alter the permissions of a range of Xen virtual address space. */
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags);
 int destroy_xen_mappings(unsigned long v, unsigned long e);
+int update_xen_mappings(unsigned long mfn, unsigned int cacheattr);
 /*
  * Create only non-leaf page table entries for the
  * page range in Xen virtual address space.
-- 
2.11.0

