
Re: How does shadow page table work during migration?

  • To: Kevin Negy <kevinnegy@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 19 Feb 2021 20:17:18 +0000
  • Delivery-date: Fri, 19 Feb 2021 20:17:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 19/02/2021 16:10, Kevin Negy wrote:
> Hello,
> I'm trying to understand how the shadow page table works in Xen,
> specifically during live migration. My understanding is that after
> shadow paging is enabled (sh_enable_log_dirty() in
> xen/arch/x86/mm/shadow/common.c), a shadow page table is created,
> which is a complete copy of the current guest page table. Then the CR3
> register is switched to use this shadow page table as the active table
> while the guest page table is stored elsewhere. The guest page table
> itself (and not the individual entries in the page table) is marked as
> read only so that any guest memory access that requires the page table
> will result in a page fault. These page faults happen and are trapped
> to the Xen hypervisor. Xen will then update the shadow page table to
> match what the guest sees on its page tables.
> Is this understanding correct?
> If so, here is where I get confused. During the migration pre-copy
> phase, each pre-copy iteration reads the dirty bitmap
> (paging_log_dirty_op() in xen/arch/x86/mm/paging.c) and cleans it.
> This process seems to destroy all the shadow page tables of the domain
> with the call to shadow_blow_tables() in sh_clean_dirty_bitmap().
> How is the dirty bitmap related to shadow page tables? Why destroy the
> entire shadow page table if it is the only legitimate page table in
> CR3 for the domain?


Different types of domains use shadow pagetables in different ways, and
the interaction with migration is also type-dependent.

HVM guests use shadow (or HAP) as a fixed property from when they are
created.  Migrating an HVM domain does not dynamically affect whether
shadow is active.  PV guests do nothing by default, but do turn shadow
on dynamically for migration purposes.

Whenever shadow is active, guests do not have write access to their
pagetables.  All updates are emulated if necessary, and "the shadow
pagetables" are managed entirely by Xen behind the scenes.

Next is the shadow memory pool.  Guests can have an unbounded quantity
of pagetables, and certain pagetable structures take more memory
allocations to shadow correctly than the quantity of RAM expended by the
guest constructing the structure in the first place.

Obviously, Xen can't be in a position where it is forced to expend more
memory for shadow pagetables than the RAM allocated to the guest in the
first place.  What we do is have a fixed-size memory pool (configurable
when you create the domain - see the shadow_memory vm parameter) and
recycle shadows on a least-recently-used basis.

In practice, this means that Xen never has all of the guest pagetables
shadowed at once.  When a guest moves off the pagetables which are
currently shadowed, a pagefault occurs and Xen shadows the new address
by recycling a pagetable which hasn't been used for a while.  The
shadow_blow_tables() call is "please recycle everything" which is used
to throw away all shadow pagetables, which in turn will cause the
shadows to be recreated from scratch as the guest continues to run.
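The pool-and-recycle behaviour above can be sketched roughly as follows.
This is an illustrative toy, not Xen's actual allocator - the struct,
names and eviction policy are simplified stand-ins:

```c
#include <stddef.h>

/* Illustrative fixed-size shadow pool with least-recently-used
 * recycling.  A sketch only; Xen's real pool tracks many shadow types
 * and reference counts. */
#define POOL_SIZE 4

struct shadow {
    unsigned long gfn;       /* which guest pagetable this shadows (0 = free) */
    unsigned long last_used; /* logical timestamp for LRU */
};

static struct shadow pool[POOL_SIZE];
static unsigned long now;

/* Find an existing shadow for gfn, else recycle the least recently
 * used slot - analogous to re-shadowing on a pagefault. */
static struct shadow *get_shadow(unsigned long gfn)
{
    struct shadow *victim = &pool[0];

    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].gfn == gfn) {
            pool[i].last_used = ++now;
            return &pool[i];
        }
        if (pool[i].last_used < victim->last_used)
            victim = &pool[i];
    }

    victim->gfn = gfn;           /* recycle: old shadow is thrown away */
    victim->last_used = ++now;
    return victim;
}

/* shadow_blow_tables() analogue: "please recycle everything". */
static void blow_tables(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i] = (struct shadow){ 0, 0 };
}
```

After blow_tables(), every subsequent guest pagetable access misses
and gets re-shadowed from scratch, which is exactly the effect
described above.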

Next, to the logdirty bitmap.  The logdirty bitmap itself is fairly easy
- it is one bit per 4k page (of guest physical address space) indicating
whether that page has been written to, since the last time we checked.
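In code terms, that "one bit per 4k page" is just the usual
gfn-to-bit arithmetic.  A minimal sketch (the names and the fixed
bitmap size are mine, not Xen's):

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical logdirty bitmap: one bit per 4k guest frame.
 * Fixed-size here for illustration; Xen sizes it to the guest. */
#define PAGE_SHIFT 12
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long logdirty[1024];

static void mark_dirty(uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;    /* guest frame number */
    logdirty[gfn / BITS_PER_LONG] |= 1UL << (gfn % BITS_PER_LONG);
}

static int test_dirty(uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;
    return !!(logdirty[gfn / BITS_PER_LONG] & (1UL << (gfn % BITS_PER_LONG)));
}
```

Reading and cleaning the bitmap (what paging_log_dirty_op() does each
pre-copy iteration) then amounts to copying it out and zeroing it.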

What is complicated is tracking writes, and to understand why, it is
easier to consider the HVM HAP (i.e. non-shadow) case first.  Here,
we have a Xen-maintained single set of EPT or NPT pagetables, which map
the guest physical address space.

When we turn on logdirty, we pause the VM temporarily, and mark all
guest RAM as read-only.  (Actually, we have a lazy-propagation mechanism
of this read-only-ness so we don't spend seconds of wallclock time with
large VMs paused while we make this change.)  Then, as the guest
continues to execute, it exits to Xen when a write hits a read-only
mapping.  Xen responds by marking this frame in the logdirty bitmap,
then remapping it as read-write, then letting the guest continue.
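That HAP fault-handling loop can be modelled in a few lines.  This is
a toy simulation of the control flow described above - the arrays
stand in for EPT/NPT permissions and the logdirty bitmap, and none of
the names are Xen's:

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_FRAMES 8

static bool writable[NR_FRAMES];  /* stand-in for EPT/NPT write permission */
static bool dirty[NR_FRAMES];     /* the logdirty bitmap, one flag per frame */

/* "Turn on logdirty": mark every guest frame read-only, clear the bitmap. */
static void logdirty_enable(void)
{
    for (size_t i = 0; i < NR_FRAMES; i++) {
        writable[i] = false;
        dirty[i] = false;
    }
}

/* A guest write to frame gfn: if the mapping is read-only it exits to
 * Xen, which sets the dirty bit and remaps read-write before the
 * guest retries.  Subsequent writes to the same frame are free. */
static void guest_write(size_t gfn)
{
    if (!writable[gfn]) {         /* VM exit on write to read-only mapping */
        dirty[gfn] = true;        /* record it in the logdirty bitmap */
        writable[gfn] = true;     /* remap read-write */
    }
    /* ... the write itself proceeds ... */
}
```

Note that each frame faults at most once per logdirty round; cleaning
the bitmap re-arms the read-only mappings for the next iteration.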

Shadow pagetables are more complicated.  With HAP, hardware helps us
maintain the guest virtual and guest physical address spaces in
logically separate ways, which eventually become combined in the TLBs. 
With Shadow, Xen has to do the combination of address spaces itself -
the shadow pagetables map guest virtual to host physical address.

Suddenly, "mark all guest RAM as read-only" isn't trivial.  The logical
operation you need is: for the shadows we have, uncombine the two
logical address spaces, and for the subset which map guest RAM, change
from read-write to read-only, then recombine.  The uncombine part is
actually racy, and involves reversing a one-way mapping, so is
exceedingly expensive.

It is *far* easier to just throw everything away and re-shadow from
scratch, when we want to start tracking writes.

Anyway - I hope this is informative.  It is accurate to the best of my
knowledge, but it is also written off the top of my head.  In some copious
free time, I should see about putting some Sphinx docs together for it.



