Re: [Xen-devel] [PATCH 6 of 8 [RFC]] libxc: introduce xc_domain_move_memory
Hi,
This looks like a promising start. Two thoughts:
1. You currently move memory into a buffer, free it, allocate new memory
and restore the contents. Copying directly from old to new would be
significantly faster, and you could do it for _most_ batches:
- copy old batch 0 to the backup buffer; free old batch 0;
- allocate new batch 1; copy batch 1 directly; free old batch 1;
...
- allocate new batch n; copy batch n directly; free old batch n;
- allocate new batch 0; copy batch 0 from the backup buffer.
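
Something like this untested sketch, where alloc_batch(), copy_batch(),
free_batch() and the backup helpers are made-up names standing in for
the populate/map/memcpy/decrease-reservation steps you already have:

    /* Rotate through the batches, reusing one batch-sized backup buffer.
     * Hypothetical helpers; only the ordering is the point here. */
    copy_to_backup(0);            /* save old batch 0's contents       */
    free_batch(0);                /* makes room for one new batch      */
    for ( b = 1; b < nr_batches; b++ )
    {
        alloc_batch(b);           /* allocate new pages for batch b    */
        copy_batch(b);            /* copy old batch b straight across  */
        free_batch(b);            /* frees room for batch b + 1        */
    }
    alloc_batch(0);
    copy_from_backup(0);          /* batch 0 is restored last          */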
2. Clearing all the _PAGE_PRESENT bits with mmu-update
hypercalls must be overkill. It ought to be possible to drop
those pages' typecounts to 0 by unpinning them and then resetting all
the vcpus. Then you should be able to just update the contents
with normal writes and re-pin afterwards.
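
Roughly, as an untested sketch reusing calls that already appear in the
patch (the per-table loop and error handling are omitted):

    /* Drop the typecount of each pinned table by unpinning it...     */
    struct mmuext_op op;
    op.cmd = MMUEXT_UNPIN_TABLE;
    op.arg1.mfn = table_mfn;                /* each pinned table MFN  */
    xc_mmuext_op(xch, &op, 1, domid);
    /* ...and reset the vcpus so none of them keeps a reference.      */
    xc_vcpu_setcontext(xch, domid, vcpu, NULL);
    /* Now the tables can be updated with normal writes through a
     * writable foreign mapping; re-pin (MMUEXT_PIN_Lx_TABLE) and
     * restore the vcpu contexts afterwards.                          */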
Cheers,
Tim.
At 04:49 +0200 on 09 Apr (1365482951), Dario Faggioli wrote:
> as a mechanism of deallocating and reallocating (immediately!) _all_
> the memory of a domain. Notice it relies on the guest being suspended
> already, before the function is invoked.
>
> Of course, it is quite likely that the memory ends up in different
> places from where it was before the call but, for instance, whether
> that is actually a different NUMA node (or anything else) does not
> depend in any way on this function.
>
> In fact, here the guest pages are just freed and immediately
> re-allocated (you can see it as a very quick, back-to-back save-restore
> cycle).
>
> If the current domain configuration says, for instance, that new
> allocations should go to a specific NUMA node, then the whole domain
> is, as a matter of fact, moved there, but again, this is not
> something this function does explicitly.
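>
> For illustration, a rough usage sketch (untested; the nodemap handling
> below is an assumption about the xc_nodemap_t interface):
>
>     /* Steer future allocations to NUMA node 1, then move the memory.
>      * Assumes xch is an open handle and the domain is suspended.    */
>     xc_nodemap_t nodemap = xc_nodemap_alloc(xch);
>     if ( nodemap )
>     {
>         nodemap[0] = 1 << 1;           /* bit 1 == node 1 (assumed) */
>         if ( !xc_domain_node_setaffinity(xch, domid, nodemap) )
>             rc = xc_domain_move_memory(xch, domid);
>         free(nodemap);
>     }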
>
> The way we do this is, very briefly, as follows:
> 1. drop all the references to all the pages of a domain,
> 2. backup the content of a batch of pages,
> 3. deallocate that batch,
> 4. allocate a new set of pages for the batch,
> 5. copy the backed up content in the new pages,
> 6. if there are more pages, go back to 2, otherwise
> 7. update the page tables, the vcpu contexts, the P2M, etc.
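>
> In libxc terms, the core of each batch (steps 2-5) boils down to
> something like this (simplified from the patch below, error handling
> omitted):
>
>     /* Per-batch core: i pages, old MFNs in old_mfns.                */
>     old_p = xc_map_foreign_pages(xch, domid, PROT_READ, old_mfns, i);
>     memcpy(backup, old_p, PAGE_SIZE * i);                 /* step 2  */
>     munmap(old_p, PAGE_SIZE * i);
>     xc_domain_decrease_reservation(xch, domid, i, 0, old_mfns);      /* 3 */
>     xc_domain_populate_physmap_exact(xch, domid, i, 0, 0, new_mfns); /* 4 */
>     new_p = xc_map_foreign_pages(xch, domid, PROT_WRITE, new_mfns, i);
>     memcpy(new_p, backup, PAGE_SIZE * i);                 /* step 5  */
>     munmap(new_p, PAGE_SIZE * i);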
>
> The above raises a number of quite complex issues and _not_ all
> of them are dealt with or solved in this series (RFC means
> something after all, doesn't it? ;-P).
>
> XXX Open issues are:
> - HVM ("easy" to add, but it's not in this patch. See the
> cover letter for the series);
> - PAE guests, as they need special attention for some of
> the page tables (should be trivial to add);
> - grant tables/granted pages: how to move them?
> - TMEM: how to "move" it?
> - shared/paged pages: what to do with them?
> - guest pages mapped in Xen, for instance:
> * vcpu info pages: moved, but how to update the mapping?
> * EOI page: moved, but how to update the mapping?
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
>
> diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
> --- a/tools/libxc/Makefile
> +++ b/tools/libxc/Makefile
> @@ -48,6 +48,11 @@ else
> GUEST_SRCS-y += xc_nomigrate.c
> endif
>
> +# XXX: Well, for sure there are some x86-isms in the current code.
> +# Making it more ARM friendly should not be a big deal though;
> +# will do for next release.
> +GUEST_SRCS-$(CONFIG_X86) += xc_domain_movemem.c
> +
> vpath %.c ../../xen/common/libelf
> CFLAGS += -I../../xen/common/libelf
>
> diff --git a/tools/libxc/xc_domain_movemem.c b/tools/libxc/xc_domain_movemem.c
> new file mode 100644
> --- /dev/null
> +++ b/tools/libxc/xc_domain_movemem.c
> @@ -0,0 +1,766 @@
> +/******************************************************************************
> + * xc_domain_movemem.c
> + *
> + * Deallocate and reallocate all the memory of a domain.
> + *
> + * Copyright (c) 2013, Dario Faggioli.
> + * Copyright (c) 2012, Citrix Systems, Inc.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation;
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <inttypes.h>
> +#include <time.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <sys/time.h>
> +#include <xc_core.h>
> +
> +#include "xc_private.h"
> +#include "xc_dom.h"
> +#include "xg_private.h"
> +#include "xg_save_restore.h"
> +
> +/* Needed by the translation macros in xg_private.h */
> +static struct domain_info_context _dinfo;
> +static struct domain_info_context *dinfo = &_dinfo;
> +
> +#define MAX_BATCH_SIZE 1024
> +#define MAX_PIN_BATCH 1024
> +
> +#define MFN_IS_IN_PSEUDOPHYS_MAP(_mfn, _max_mfn, _minfo, _m2p) \
> + (((_mfn) < (_max_mfn)) && ((mfn_to_pfn(_mfn, _m2p) < (_minfo).p2m_size) && \
> + (pfn_to_mfn(mfn_to_pfn(_mfn, _m2p), (_minfo).p2m_table, \
> + (_minfo).guest_width) == (_mfn))))
> +
> +/*
> + * This is to determine which entries in this page table hold reserved
> + * hypervisor mappings. This depends on the current page table type as
> + * well as the number of paging levels (see also xc_domain_save.c).
> + *
> + * XXX: export this function so that it can be used both here and from
> + * canonicalize_pagetable(), in xc_domain_save.c.
> + */
> +static int is_xen_mapping(struct xc_domain_meminfo *minfo, unsigned long type,
> + unsigned long hvirt_start, unsigned long m2p_mfn0,
> + const void *spage, int pte)
> +{
> + int xen_start, xen_end, pte_last;
> +
> + xen_start = xen_end = pte_last = PAGE_SIZE / 8;
> +
> + if ( (minfo->pt_levels == 3) && (type == XEN_DOMCTL_PFINFO_L3TAB) )
> + xen_start = L3_PAGETABLE_ENTRIES_PAE;
> +
> + /*
> + * In PAE only the L2 mapping the top 1GB contains Xen mappings.
> + * We can spot this by looking for the guest's mapping of the m2p.
> + * Guests must ensure that this check will fail for other L2s.
> + */
> + if ( (minfo->pt_levels == 3) && (type == XEN_DOMCTL_PFINFO_L2TAB) )
> + {
> + int hstart;
> + uint64_t he;
> +
> + hstart = (hvirt_start >> L2_PAGETABLE_SHIFT_PAE) & 0x1ff;
> + he = ((const uint64_t *) spage)[hstart];
> +
> + if ( ((he >> PAGE_SHIFT) & MFN_MASK_X86) == m2p_mfn0 )
> + {
> + /* hvirt starts with xen stuff... */
> + xen_start = hstart;
> + }
> + else if ( hvirt_start != 0xf5800000 )
> + {
> + /* old L2s from before hole was shrunk... */
> + hstart = (0xf5800000 >> L2_PAGETABLE_SHIFT_PAE) & 0x1ff;
> + he = ((const uint64_t *) spage)[hstart];
> + if ( ((he >> PAGE_SHIFT) & MFN_MASK_X86) == m2p_mfn0 )
> + xen_start = hstart;
> + }
> + }
> +
> + if ( (minfo->pt_levels == 4) && (type == XEN_DOMCTL_PFINFO_L4TAB) )
> + {
> + /*
> + * XXX SMH: should compute these from hvirt_start (which we have)
> + * and hvirt_end (which we don't)
> + */
> + xen_start = 256;
> + xen_end = 272;
> + }
> +
> + return pte >= xen_start && pte < xen_end;
> +}
> +
> +/*
> + * This function will basically deallocate _all_ the memory of a domain and
> + * reallocate it immediately. It relies on the guest being suspended
> + * already, before the function is even invoked.
> + *
> + * Of course, it is quite likely that the memory ends up in different places
> + * from where it was before calling this but, for instance, whether that is
> + * actually a different NUMA node (or anything else) does not depend in any
> + * way on this function. In fact, here the guest pages are
> + * just freed and immediately re-allocated (you can see it as a very quick,
> + * back-to-back domain_save--domain_restore). If the current domain
> + * configuration says, for instance, that new allocations should go to a
> + * different NUMA node, then the whole domain is moved there, but again,
> + * this is not something this function does explicitly.
> + *
> + * If actually interested in doing something like that (i.e., moving the
> + * domain to a different NUMA node), calling xc_domain_node_setaffinity()
> + * right before this should achieve it.
> + */
> +int xc_domain_move_memory(xc_interface *xch, uint32_t domid/*, int hvm*/)
> +{
> + unsigned int i, j;
> + int rc = 1;
> +
> + xc_dominfo_t info;
> + struct xc_domain_meminfo minfo;
> +
> + struct mmuext_op pin[MAX_PIN_BATCH];
> + unsigned int nr_pins;
> +
> + struct xc_mmu *mmu = NULL;
> + unsigned int xen_pt_levels, dom_guest_width;
> + unsigned long max_mfn, hvirt_start, m2p_mfn0;
> + vcpu_guest_context_any_t ctxt;
> +
> + void *live_p2m_frame_list_list = NULL;
> + void *live_p2m_frame_list = NULL;
> +
> + /*
> + * XXX: grant tables & granted pages need to be considered, e.g.,
> + * using xc_is_page_granted_vX() in xc_offline_page.c to
> + * recognise them, etc.
> + int gnt_num;
> + grant_entry_v1_t *gnttab_v1 = NULL;
> + grant_entry_v2_t *gnttab_v2 = NULL;
> + */
> +
> + void *old_p, *new_p, *backup = NULL;
> + unsigned long mfn, pfn;
> + uint64_t fll;
> +
> + xen_pfn_t *new_mfns= NULL, *old_mfns = NULL, *batch_pfns = NULL;
> + int pte_num = PAGE_SIZE / 8, cleared_pte = 0;
> + xen_pfn_t *m2p_table, *orig_m2p = NULL;
> + shared_info_any_t *live_shinfo = NULL;
> +
> + unsigned long n = 0, n_skip = 0;
> +
> + int debug = 0; /* XXX will become a parameter */
> +
> + if ( !get_platform_info(xch, domid, &max_mfn, &hvirt_start,
> + &xen_pt_levels, &dom_guest_width) )
> + {
> + ERROR("Failed getting platform info");
> + return 1;
> + }
> +
> + /* We expect the domain to be suspended already */
> + if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
> + {
> + PERROR("Failed getting domain info");
> + return 1;
> + }
> + if ( !info.shutdown || info.shutdown_reason != SHUTDOWN_suspend)
> + {
> + PERROR("Domain appears not to be suspended");
> + return 1;
> + }
> +
> + DBGPRINTF("Establishing the mappings for M2P and P2M");
> + memset(&minfo, 0, sizeof(minfo));
> + if ( !(m2p_table = xc_map_m2p(xch, max_mfn, PROT_READ, &m2p_mfn0)) )
> + {
> + PERROR("Failed to map the M2P table");
> + return 1;
> + }
> + if ( xc_map_domain_meminfo(xch, domid, &minfo) )
> + {
> + PERROR("Failed to map domain's memory information");
> + goto out;
> + }
> + dinfo->guest_width = minfo.guest_width;
> + dinfo->p2m_size = minfo.p2m_size;
> +
> + /*
> + * XXX
> + DBGPRINTF("Mapping the grant tables");
> + gnttab_v2 = xc_gnttab_map_table_v2(xch, domid, &gnt_num);
> + if (!gnttab_v2)
> + {
> + PERROR("Failed to map V1 grant table... Trying V1");
> + gnttab_v1 = xc_gnttab_map_table_v1(xch, domid, &gnt_num);
> + if (!gnttab_v1)
> + {
> + PERROR("Failed to map grant table");
> + goto out;
> + }
> + }
> + DBGPRINTF("Grant table mapped. %d grants found", gnt_num);
> + */
> +
> + mmu = xc_alloc_mmu_updates(xch, (domid+1)<<16|domid);
> + if ( mmu == NULL )
> + {
> + PERROR("Failed to allocate memory for MMU updates");
> + goto out;
> + }
> +
> + /* Alloc support data structures */
> + new_mfns = calloc(MAX_BATCH_SIZE, sizeof(xen_pfn_t));
> + old_mfns = calloc(MAX_BATCH_SIZE, sizeof(xen_pfn_t));
> + batch_pfns = calloc(MAX_BATCH_SIZE, sizeof(xen_pfn_t));
> +
> + backup = malloc(PAGE_SIZE * MAX_BATCH_SIZE);
> +
> + orig_m2p = calloc(max_mfn, sizeof(xen_pfn_t));
> +
> + if ( !new_mfns || !old_mfns || !batch_pfns || !backup || !orig_m2p )
> + {
> + ERROR("Failed to allocate copying and/or backup data structures");
> + goto out;
> + }
> +
> + DBGPRINTF("Saving the original M2P");
> + memcpy(orig_m2p, m2p_table, max_mfn * sizeof(xen_pfn_t));
> +
> + DBGPRINTF("Starting deallocating and reallocating all memory for domain
> %d"
> + "\n\tnr_pages=%lu, nr_shared_pages=%lu, nr_paged_pages=%lu"
> + "\n\tnr_online_vcpus=%u, max_vcpu_id=%u",
> + domid, info.nr_pages, info.nr_shared_pages,
> info.nr_paged_pages,
> + info.nr_online_vcpus, info.max_vcpu_id);
> +
> + /* Beware: no going back from this point!! */
> +
> + /*
> + * As a part of the process of dropping all the references to the existing
> + * pages in memory, so that we can free (and then re-allocate) them, we
> + * need to unpin them.
> + *
> + * We do that in batches of 1024 PFNs at each step, to amortize the cost
> + * of xc_mmuext_op() calls.
> + */
> + nr_pins = 0;
> + for ( i = 0; i < minfo.p2m_size; i++ )
> + {
> + if ( (minfo.pfn_type[i] & XEN_DOMCTL_PFINFO_LPINTAB) == 0 )
> + continue;
> +
> + pin[nr_pins].cmd = MMUEXT_UNPIN_TABLE;
> + pin[nr_pins].arg1.mfn = minfo.p2m_table[i];
> + nr_pins++;
> +
> + if ( nr_pins == MAX_PIN_BATCH )
> + {
> + if ( xc_mmuext_op(xch, pin, nr_pins, domid) < 0 )
> + {
> + PERROR("Failed to unpin a batch of %d MFNs", nr_pins);
> + goto out;
> + }
> + else
> + DBGPRINTF("Unpinned a batch of %d MFNs", nr_pins);
> + nr_pins = 0;
> + }
> + }
> + if ( (nr_pins != 0) && (xc_mmuext_op(xch, pin, nr_pins, domid) < 0) )
> + {
> + PERROR("Failed to unpin a batch of %d MFNs", nr_pins);
> + goto out;
> + }
> + else
> + DBGPRINTF("Unpinned a batch of %d MFNs", nr_pins);
> +
> + /*
> + * After unpinning, we also need to remove the _PAGE_PRESENT bit from
> + * the domain's PTEs, for the pages that we want to deallocate, or they
> + * just could not go away.
> + */
> + for (i = 0; i < minfo.p2m_size; i++)
> + {
> + void *content;
> + xen_pfn_t table_type, table_mfn = pfn_to_mfn(i, minfo.p2m_table,
> + minfo.guest_width);
> +
> + if ( table_mfn == INVALID_P2M_ENTRY ||
> + minfo.pfn_type[i] == XEN_DOMCTL_PFINFO_XTAB )
> + {
> + DBGPRINTF("Broken P2M entry at PFN 0x%x", i);
> + continue;
> + }
> +
> + table_type = minfo.pfn_type[i] & XEN_DOMCTL_PFINFO_LTABTYPE_MASK;
> + if ( table_type < XEN_DOMCTL_PFINFO_L1TAB ||
> + table_type > XEN_DOMCTL_PFINFO_L4TAB )
> + continue;
> +
> + content = xc_map_foreign_range(xch, domid, PAGE_SIZE,
> + PROT_READ, table_mfn);
> + if ( !content )
> + {
> + PERROR("Failed to map the table at MFN 0x%lx", table_mfn);
> + goto out;
> + }
> +
> + /* Go through each PTE of each table and clear the _PAGE_PRESENT bit */
> + for ( j = 0; j < pte_num; j++ )
> + {
> + uint64_t pte = ((uint64_t *)content)[j];
> +
> + if ( !pte || is_xen_mapping(&minfo, table_type, hvirt_start,
> + m2p_mfn0, content, j) )
> + continue;
> +
> + if ( debug )
> + DBGPRINTF("Entry %d: PTE=0x%lx, MFN=0x%lx, PFN=0x%lx", j,
> pte,
> + (uint64_t)((pte & MADDR_MASK_X86)>>PAGE_SHIFT),
> + m2p_table[(unsigned long)((pte & MADDR_MASK_X86)
> + >>PAGE_SHIFT)]);
> +
> + pfn = m2p_table[(pte & MADDR_MASK_X86)>>PAGE_SHIFT];
> + pte &= ~_PAGE_PRESENT;
> +
> + if ( xc_add_mmu_update(xch, mmu, table_mfn << PAGE_SHIFT |
> + (j * (sizeof(uint64_t))) |
> + MMU_PT_UPDATE_PRESERVE_AD, pte) )
> + PERROR("Failed to add some PTE update operation");
> + else
> + cleared_pte++;
> + }
> +
> + if (content)
> + munmap(content, PAGE_SIZE);
> + }
> + if ( cleared_pte && xc_flush_mmu_updates(xch, mmu) )
> + {
> + PERROR("Failed flushing some PTE update operations");
> + goto out;
> + }
> + else
> + DBGPRINTF("Cleared presence for %d PTEs", cleared_pte);
> +
> + /* Scan all the P2M ... */
> + while ( n < minfo.p2m_size )
> + {
> + /* ... But all operations are done in batches */
> + for ( i = 0; (i < MAX_BATCH_SIZE) && (n < minfo.p2m_size); n++ )
> + {
> + xen_pfn_t mfn = pfn_to_mfn(n, minfo.p2m_table, minfo.guest_width);
> + xen_pfn_t mfn_type = minfo.pfn_type[n] & XEN_DOMCTL_PFINFO_LTAB_MASK;
> +
> + if ( mfn == INVALID_P2M_ENTRY || !is_mapped(mfn) )
> + {
> + if ( debug )
> + DBGPRINTF("Skipping invalid or unmapped MFN 0x%lx", mfn);
> + n_skip++;
> + continue;
> + }
> + if ( mfn_type == XEN_DOMCTL_PFINFO_BROKEN ||
> + mfn_type == XEN_DOMCTL_PFINFO_XTAB ||
> + mfn_type == XEN_DOMCTL_PFINFO_XALLOC )
> + {
> + if ( debug )
> + DBGPRINTF("Skippong broken or alloc only MFN 0x%lx",
> mfn);
> + n_skip++;
> + continue;
> + }
> +
> + /*
> + if ( gnttab_v1 ?
> + xc_is_page_granted_v1(xch, mfn, gnttab_v1, gnt_num) :
> + xc_is_page_granted_v2(xch, mfn, gnttab_v2, gnt_num) )
> + {
> + n_skip++;
> + continue;
> + }
> + */
> +
> + old_mfns[i] = mfn;
> + batch_pfns[i] = n;
> + i++;
> + }
> +
> + /* Was the batch empty? */
> + if ( i == 0 )
> + continue;
> +
> + /*
> + * And now the core of the whole thing: map the PFNs in the batch,
> + * back them up, allocate new pages for them, and copy them there.
> + * We do it in this order, passing through a local backup buffer,
> + * because we don't want to risk hitting the max_mem limit for
> + * the domain (which would be possible, depending on MAX_BATCH_SIZE,
> + * if we tried to do it as allocate->copy->deallocate).
> + *
> + * With MAX_BATCH_SIZE of 1024 and 4K pages, this means we are moving
> + * 4MB of guest memory for each batch.
> + */
> +
> + /* Map and backup */
> + old_p = xc_map_foreign_pages(xch, domid, PROT_READ, old_mfns, i);
> + if ( !old_p )
> + {
> + PERROR("Failed mapping the current MFNs\n");
> + goto out;
> + }
> + memcpy(backup, old_p, PAGE_SIZE * i);
> + munmap(old_p, PAGE_SIZE * i);
> +
> + /* Deallocation and re-allocation */
> + if ( xc_domain_decrease_reservation(xch, domid, i, 0, old_mfns) != i ||
> + xc_domain_populate_physmap_exact(xch, domid, i, 0, 0, new_mfns) )
> + {
> + PERROR("Failed making space or allocating the new MFNs\n");
> + munmap(backup, PAGE_SIZE * i);
> + goto out;
> + }
> +
> + /* Map of new pages, copy content and unmap */
> + new_p = xc_map_foreign_pages(xch, domid, PROT_WRITE, new_mfns, i);
> + if ( !new_p )
> + {
> + PERROR("Failed mapping the new MFNs\n");
> + munmap(backup, PAGE_SIZE * i);
> + goto out;
> + }
> + memcpy(new_p, backup, PAGE_SIZE * i);
> + munmap(new_p, PAGE_SIZE * i);
> + munmap(backup, PAGE_SIZE * i);
> +
> + /*
> + * Since we already have the new MFNs, we can update both the M2P
> + * and the P2M right here, within this same loop.
> + */
> + for ( j = 0; j < i; j++ )
> + {
> + minfo.p2m_table[batch_pfns[j]] = new_mfns[j];
> + if ( xc_add_mmu_update(xch, mmu,
> + (((uint64_t)new_mfns[j]) << PAGE_SHIFT) |
> + MMU_MACHPHYS_UPDATE, batch_pfns[j]) )
> + {
> + PERROR("Failed updating M2P\n");
> + goto out;
> + }
> + }
> + if ( xc_flush_mmu_updates(xch, mmu) )
> + {
> + PERROR("Failed updating M2P\n");
> + goto out;
> + }
> +
> + DBGPRINTF("Batch %lu/%ld done (%lu pages skipped)",
> + n / MAX_BATCH_SIZE, minfo.p2m_size / MAX_BATCH_SIZE, n_skip);
> + }
> +
> + /*
> + * Finally (oh, well...) update the PTEs of the domain again, putting
> + * the new MFNs there, and making the entries _PAGE_PRESENT again.
> + *
> + * This is a kind of uncanonicalization, like what happens in save-restore,
> + * although a very special one, and we rely on the snapshot of the M2P
> + * we made before starting the whole deallocation/reallocation process.
> + */
> + for ( i = 0; i < minfo.p2m_size; i++ )
> + {
> + void *content;
> + xen_pfn_t table_type, table_mfn = pfn_to_mfn(i, minfo.p2m_table,
> + minfo.guest_width);
> +
> + table_type = minfo.pfn_type[i] & XEN_DOMCTL_PFINFO_LTABTYPE_MASK;
> + if ( table_type < XEN_DOMCTL_PFINFO_L1TAB ||
> + table_type > XEN_DOMCTL_PFINFO_L4TAB )
> + continue;
> +
> + /* We of course only care about tables */
> + content = xc_map_foreign_range(xch, domid, PAGE_SIZE,
> + PROT_WRITE, table_mfn);
> + if ( !content )
> + {
> + PERROR("Failed to map the table at MFN 0x%lx", table_mfn);
> + continue;
> + }
> +
> + for ( j = 0; j < PAGE_SIZE / 8; j++ )
> + {
> + uint64_t pte = ((uint64_t *)content)[j];
> +
> + if ( !pte || is_xen_mapping(&minfo, table_type, hvirt_start,
> + m2p_mfn0, content, j) )
> + continue;
> +
> + /*
> + * Basically, we look up the PFN in the snapshotted M2P and we
> + * pick up the new MFN from the P2M (since we updated it "live"
> + * during the re-allocation phase above).
> + */
> + mfn = (pte >> PAGE_SHIFT) & MFN_MASK_X86;
> + pfn = orig_m2p[mfn];
> +
> + if ( debug )
> + DBGPRINTF("Table[PTE]: 0x%lx[%d] ==> orig_m2p[0x%lx]=0x%lx, "
> + "p2m[0x%lx]=0x%lx // pte: 0x%lx --> 0x%lx",
> + table_mfn, j, mfn, pfn, pfn, minfo.p2m_table[pfn],
> + pte, (uint64_t)((pte & ~MADDR_MASK_X86)|
> + (minfo.p2m_table[pfn]<<PAGE_SHIFT)|
> + _PAGE_PRESENT));
> +
> + mfn = minfo.p2m_table[pfn];
> + pte &= ~MADDR_MASK_X86;
> + pte |= (uint64_t)mfn << PAGE_SHIFT;
> + pte |= _PAGE_PRESENT;
> +
> + ((uint64_t *)content)[j] = pte;
> +
> + if ( !MFN_IS_IN_PSEUDOPHYS_MAP(mfn, max_mfn, minfo, m2p_table) )
> + {
> + ERROR("Failed updating entry %d in table at MFN 0x%lx", j,
> table_mfn);
> + continue; // XXX
> + }
> + }
> +
> + if ( content )
> + munmap(content, PAGE_SIZE);
> + }
> +
> + DBGPRINTF("Re-pinning page table MFNs");
> +
> + /* Pin the table types again */
> + nr_pins = 0;
> + for ( i = 0; i < minfo.p2m_size; i++ )
> + {
> + if ( (minfo.pfn_type[i] & XEN_DOMCTL_PFINFO_LPINTAB) == 0 )
> + continue;
> +
> + switch ( minfo.pfn_type[i] & XEN_DOMCTL_PFINFO_LTABTYPE_MASK )
> + {
> + case XEN_DOMCTL_PFINFO_L1TAB:
> + pin[nr_pins].cmd = MMUEXT_PIN_L1_TABLE;
> + break;
> +
> + case XEN_DOMCTL_PFINFO_L2TAB:
> + pin[nr_pins].cmd = MMUEXT_PIN_L2_TABLE;
> + break;
> +
> + case XEN_DOMCTL_PFINFO_L3TAB:
> + pin[nr_pins].cmd = MMUEXT_PIN_L3_TABLE;
> + break;
> +
> + case XEN_DOMCTL_PFINFO_L4TAB:
> + pin[nr_pins].cmd = MMUEXT_PIN_L4_TABLE;
> + break;
> + default:
> + continue;
> + }
> + pin[nr_pins].arg1.mfn = minfo.p2m_table[i];
> + nr_pins++;
> +
> + if ( nr_pins == MAX_PIN_BATCH )
> + {
> + if ( xc_mmuext_op(xch, pin, nr_pins, domid) < 0 )
> + {
> + PERROR("Failed to pin a batch of %d MFNs", nr_pins);
> + goto out;
> + }
> + else
> + DBGPRINTF("Re-pinned a batch of %d MFNs", nr_pins);
> + nr_pins = 0;
> + }
> + }
> + if ( (nr_pins != 0) && (xc_mmuext_op(xch, pin, nr_pins, domid) < 0) )
> + {
> + PERROR("Failed to pin batch of %d page tables", nr_pins);
> + goto out;
> + }
> + else
> + DBGPRINTF("Re-pinned a batch of %d MFNs", nr_pins);
> +
> + /*
> + * Now, take care of the vCPUs' contexts. It all happens as above:
> + * we use the original M2P and the new domain's P2M to update all
> + * the various references.
> + */
> + for ( i = 0; i <= info.max_vcpu_id; i++ )
> + {
> + xc_vcpuinfo_t vinfo;
> +
> + DBGPRINTF("Adjusting context for VCPU%d", i);
> +
> + if ( xc_vcpu_getinfo(xch, domid, i, &vinfo) )
> + {
> + PERROR("Failed getting info for VCPU%d", i);
> + goto out;
> + }
> + if ( !vinfo.online )
> + {
> + DBGPRINTF("VCPU%d seems offline", i);
> + continue;
> + }
> +
> + if ( xc_vcpu_getcontext(xch, domid, i, &ctxt) )
> + {
> + PERROR("No context for VCPU%d", i);
> + goto out;
> + }
> +
> + if ( i == 0 )
> + {
> + //start_info_any_t *start_info;
> +
> + /*
> + * Update the start info frame number. It is the 3rd argument
> + * to the HYPERVISOR_sched_op hypercall when op is
> + * SCHEDOP_shutdown and reason is SHUTDOWN_suspend, so we find
> + * it in EDX.
> + */
> + mfn = GET_FIELD(&ctxt, user_regs.edx);
> + mfn = minfo.p2m_table[mfn_to_pfn(mfn, orig_m2p)];
> + SET_FIELD(&ctxt, user_regs.edx, mfn);
> +
> + /*
> + * XXX: I checked, and store_mfn and console_mfn seemed OK, at
> + * least from a 'mapping' point of view, but more testing is
> + * needed.
> + start_info = xc_map_foreign_range(xch, domid, PAGE_SIZE,
> + PROT_READ | PROT_WRITE, mfn);
> + munmap(start_info, PAGE_SIZE);
> + */
> + }
> +
> + /* MFNs of the GDT frames */
> + for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt_ents); j++ )
> + {
> + mfn = GET_FIELD(&ctxt, gdt_frames[j]);
> + mfn = minfo.p2m_table[mfn_to_pfn(mfn, orig_m2p)];
> + SET_FIELD(&ctxt, gdt_frames[j], mfn);
> + }
> +
> + /* CR3 XXX: PAE needs special attention here, I think */
> + mfn = UNFOLD_CR3(GET_FIELD(&ctxt, ctrlreg[3]));
> + mfn = minfo.p2m_table[mfn_to_pfn(mfn, orig_m2p)];
> + SET_FIELD(&ctxt, ctrlreg[3], FOLD_CR3(mfn));
> +
> + /* Guest pagetable (x86/64) in CR1 */
> + if ( (minfo.pt_levels == 4) && ctxt.x64.ctrlreg[1] )
> + {
> + /*
> + * XXX: the save-restore code fiddles with the least-significant
> + * bit ('valid PFN'). That should not be needed here.
> + */
> + mfn = UNFOLD_CR3(ctxt.x64.ctrlreg[1]);
> + mfn = minfo.p2m_table[mfn_to_pfn(mfn, orig_m2p)];
> + ctxt.x64.ctrlreg[1] = FOLD_CR3(mfn);
> + }
> +
> + /*
> + * XXX: Xen refuses to set a new context for an existing vCPU if
> + * things like CR3 or the GDT have changed, even if the domain
> + * is suspended. Going through re-initializing the vCPU (via
> + * the call below with a NULL ctxt) makes it possible,
> + * but is that sensible? And even if it is, is the following
> + * _setcontext call enough?
> + */
> + if ( xc_vcpu_setcontext(xch, domid, i, NULL) )
> + {
> + PERROR("Failed re-initialising VCPU%d", i);
> + goto out;
> + }
> + if ( xc_vcpu_setcontext(xch, domid, i, &ctxt) )
> + {
> + PERROR("Failed when updating context for VCPU%d", i);
> + goto out;
> + }
> + }
> +
> + /*
> + * Finally (and this time for real), we take care of the pages mapping
> + * the P2M, and of the P2M entries themselves.
> + */
> +
> + live_shinfo = xc_map_foreign_range(xch, domid, PAGE_SIZE,
> + PROT_READ|PROT_WRITE, info.shared_info_frame);
> + if ( !live_shinfo )
> + {
> + PERROR("Failed mapping live_shinfo");
> + goto out;
> + }
> +
> + fll = GET_FIELD(live_shinfo, arch.pfn_to_mfn_frame_list_list);
> + fll = minfo.p2m_table[mfn_to_pfn(fll, orig_m2p)];
> + live_p2m_frame_list_list = xc_map_foreign_range(xch, domid, PAGE_SIZE,
> + PROT_READ|PROT_WRITE, fll);
> + if ( !live_p2m_frame_list_list )
> + {
> + PERROR("Couldn't map live_p2m_frame_list_list");
> + goto out;
> + }
> + SET_FIELD(live_shinfo, arch.pfn_to_mfn_frame_list_list, fll);
> +
> + /* First, update the frames containing the list of the P2M frames */
> + for ( i = 0; i < P2M_FLL_ENTRIES; i++ )
> + {
> + mfn = ((uint64_t *)live_p2m_frame_list_list)[i];
> + mfn = minfo.p2m_table[mfn_to_pfn(mfn, orig_m2p)];
> + ((uint64_t *)live_p2m_frame_list_list)[i] = mfn;
> + }
> +
> + live_p2m_frame_list =
> + xc_map_foreign_pages(xch, domid, PROT_READ|PROT_WRITE,
> + live_p2m_frame_list_list,
> + P2M_FLL_ENTRIES);
> + if ( !live_p2m_frame_list )
> + {
> + PERROR("Couldn't map live_p2m_frame_list");
> + goto out;
> + }
> +
> + /* And then update the actual entries of it */
> + for ( i = 0; i < P2M_FL_ENTRIES; i++ )
> + {
> + mfn = ((uint64_t *)live_p2m_frame_list)[i];
> + mfn = minfo.p2m_table[mfn_to_pfn(mfn, orig_m2p)];
> + ((uint64_t *)live_p2m_frame_list)[i] = mfn;
> + }
> +
> + rc = 0;
> +
> + out:
> + if ( live_p2m_frame_list_list )
> + munmap(live_p2m_frame_list_list, PAGE_SIZE);
> + if ( live_p2m_frame_list )
> + munmap(live_p2m_frame_list, P2M_FLL_ENTRIES * PAGE_SIZE);
> + if ( live_shinfo )
> + munmap(live_shinfo, PAGE_SIZE);
> +
> + free(mmu);
> + free(new_mfns);
> + free(old_mfns);
> + free(batch_pfns);
> + free(backup);
> + free(orig_m2p);
> +
> + /*
> + if (gnttab_v1)
> + munmap(gnttab_v1, gnt_num / (PAGE_SIZE/sizeof(grant_entry_v1_t)));
> + if (gnttab_v2)
> + munmap(gnttab_v2, gnt_num / (PAGE_SIZE/sizeof(grant_entry_v2_t)));
> + */
> +
> + xc_unmap_domain_meminfo(xch, &minfo);
> + munmap(m2p_table, M2P_SIZE(max_mfn));
> +
> + return !!rc;
> +}
> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> --- a/tools/libxc/xenguest.h
> +++ b/tools/libxc/xenguest.h
> @@ -272,6 +272,15 @@ int xc_query_page_offline_status(xc_inte
>
> int xc_exchange_page(xc_interface *xch, int domid, xen_pfn_t mfn);
>
> +/**
> + * This function deallocates all the guest's memory and immediately
> + * allocates it again, with the net effect of moving it somewhere
> + * else with respect to where it was when the function was invoked.
> + *
> + * @param xch a handle to an open hypervisor interface.
> + * @param domid the domain id one wants to move the memory of.
> + */
> +int xc_domain_move_memory(xc_interface *xch, uint32_t domid/*, int hvm*/);
>
> /**
> * Memory related information, such as PFN types, the P2M table,
> diff --git a/tools/libxc/xg_private.h b/tools/libxc/xg_private.h
> --- a/tools/libxc/xg_private.h
> +++ b/tools/libxc/xg_private.h
> @@ -145,6 +145,11 @@ static inline xen_pfn_t pfn_to_mfn(xen_p
> (((uint32_t *)p2m)[(pfn)]))));
> }
>
> +static inline xen_pfn_t mfn_to_pfn(xen_pfn_t mfn, xen_pfn_t *m2p)
> +{
> + return m2p[mfn];
> +}
> +
> /* Number of xen_pfn_t in a page */
> #define FPP (PAGE_SIZE/(dinfo->guest_width))
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel