
Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support



On Wed, Mar 29, 2017 at 11:50:52PM +0100, Andrew Cooper wrote:
> On 27/03/2017 10:06, Joshua Otto wrote:
> > Hi,
> >
> > We're a team of three fourth-year undergraduate software engineering
> > students at the University of Waterloo in Canada.  In late 2015 we posted
> > on the list [1] to ask for a project to undertake for our program's
> > capstone design project, and Andrew Cooper pointed us in the direction of
> > the live migration implementation as an area that could use some
> > attention.  We were particularly interested in post-copy live migration
> > (as evaluated by [2] and discussed on the list at [3]), and have been
> > working on an implementation of this on-and-off since then.
> >
> > We now have a working implementation of this scheme, and are submitting it
> > for comment.  The changes are also available as the 'postcopy' branch of
> > the GitHub repository at [4]
> >
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc stream
> >   helper process that the iterative migration precopy loop should be
> >   terminated and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty
> >   pfns and write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager for
> >   the migrating domain, 'evicts' all of the pfns indicated by the sender as
> >   outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the
> >   remaining outstanding pages to the receiver in the background.  The
> >   receiver monitors both the stream for incoming page data and the paging
> >   ring event channel for page faults triggered by the guest.  Page faults
> >   are forwarded on the back-channel migration stream to the migration
> >   sender, which prioritizes these pages for transmission.
> >
> > By leveraging the existing paging API, we are able to implement the postcopy
> > scheme without any hypervisor modifications - all of our changes are
> > confined to the userspace toolstack.  However, we inherit from the paging
> > API the requirement that the domains be HVM and that the host have HAP/EPT
> > support.
> 
> Wow.  Considering that the paging API has had no in-tree consumers (and
> its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.

Well, there's tools/xenpaging, which was a helpful reference when
putting this together.  The user-space pager actually has rotted a bit
(I'm fairly certain the VM event ring protocol has changed subtly under
its feet), so I also needed to consult tools/xen-access to get things
right.

> 
> >
> > We haven't yet had the opportunity to perform a quantitative evaluation of
> > the performance trade-offs between the traditional pre-copy and our
> > post-copy strategies, but intend to.  Informally, we've been testing our
> > implementation by migrating a domain running the x86 memtest program
> > (which is obviously a tremendously write-heavy workload), and have
> > observed a substantial reduction in total time required for migration
> > completion (at the expense of a visually obvious 'slowdown' in the
> > execution of the program).
> 
> Do you have any numbers, even for this informal testing?

We have a much more ambitious test matrix planned, but sure, here's an
early encouraging set of measurements - for a domain with 2GB of memory
and a 256MB writable working set (the application driving the writes
being fio submitting writes against a ramdisk), we measured these times:

                    Pre-copy + Stop-and-copy |  1 precopy iteration +
                             (s)             |       postcopy (s)
                   --------------------------+-------------------------
 Precopy Duration:           66.97           |         44.44
 Suspend Duration:            6.807          |          3.23
Postcopy Duration:            N/A            |          4.83

However...

That 3.23s suspend for the hybrid migration seems too high, doesn't it?

There's currently a serious performance bug that we're still trying to
work out in the case of pure-postcopy migrations, with no leading
precopy.  Attempting a pure postcopy migration when running the
experiment above yields:

                     Pure postcopy (s)
                   ----------------------
 Precopy Duration:           0
 Suspend Duration:          21.93
Postcopy Duration:          44.22

Although the postcopy scheme clearly works, it takes 21.93s (!) to
unpause the guest at the destination.  The eviction of the unmigrated
pages accounts for only a second or two of that (still bad, because of the
lack of batching support, but not this bad) - the holdup is somewhere in
the domain creation sequence between domcreate_stream_done() and
domcreate_complete().

I suspect that this is the result of a bad interaction between QEMU's
startup sequence (its foreign memory mapping behaviour in particular)
and the postcopy paging.  Specifically: the paging ring has room for
only 8 requests at a time.  When QEMU attempts to map a large range, the
range gets postcopy-faulted over synchronously in batches of 8 pages at
a time, and each such batch implies a synchronous copy of its pages
over the network (plus a wait on the 100us xenforeignmemory_map() retry
timer) before the next batch can begin.
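
To make the suspected cost concrete, here is a rough back-of-envelope
model in Python.  It is only a sketch: the function name and the
assumption that each batch costs exactly one network round trip plus one
retry interval are illustrative guesses, not measurements of the
toolstack.

```python
import math


def estimate_map_stall_s(n_pages, ring_slots=8, round_trip_s=0.0,
                         retry_s=100e-6):
    """Rough estimate of the stall when mapping n_pages unmigrated pages.

    Hypothetical model: faults are serviced strictly one batch at a time,
    and every batch of up to ring_slots pages costs one network round trip
    (round_trip_s) plus one retry-timer interval (retry_s).
    """
    if n_pages < 0 or ring_slots <= 0:
        raise ValueError("n_pages must be >= 0 and ring_slots must be > 0")
    # Number of sequential batches needed to fault the whole range over.
    batches = math.ceil(n_pages / ring_slots)
    return batches * (round_trip_s + retry_s)
```

Under this model the stall grows linearly with the number of batches.
If a mapping were to cover all 2GB of a guest in 4KiB pages, that would
be 524288 pages, i.e. 65536 sequential batches, so even a small
per-batch cost would be multiplied many thousands of times.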

If I am able to confirm that this is the case, a sensible solution would
seem to be supporting paging range-population requests (i.e. a new
paging ring request type for a _range_ of gfns).  In the meantime, you
should expect to observe this effect as well in experiments.  It appears
to be largely (but not completely) mitigated by performing a single
pre-copy iteration first.
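
As a sketch of what a range-population request might carry, the Python
below coalesces faulting gfns into contiguous (first_gfn, count) runs.
The function name and the tuple layout are hypothetical and are not part
of the existing paging ring protocol; they only illustrate how a large
mapping could collapse into a handful of requests.

```python
def coalesce_gfns(gfns):
    """Collapse an iterable of gfns into sorted (first_gfn, count) runs.

    Hypothetical helper: duplicates are dropped, and each returned tuple
    describes one contiguous range of guest frame numbers.
    """
    runs = []
    for gfn in sorted(set(gfns)):
        if runs and gfn == runs[-1][0] + runs[-1][1]:
            # gfn directly follows the last run, so extend that run by one.
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # Gap (or first element): start a new single-page run.
            runs.append((gfn, 1))
    return runs
```

With this shape, a fully contiguous mapping of any size becomes a single
run, instead of one ring request per page.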

> 
> > We've also noticed that, when performing a postcopy without any leading
> > precopy iterations, the time required at the destination to 'evict' all of
> > the outstanding pages is substantial - possibly because there is no
> > batching mechanism by which pages can be evicted - so this area in
> > particular might require further attention.
> >
> > We're really interested in any feedback you might have!
> 
> Do you have a design document for this?  The spec modifications and code
> comments are great, but there is no substitute (as far as understanding
> goes) for a description in terms of the algorithm and design choices.

As I replied to Wei, not yet, but we'd happily prepare one for v2.

Thanks!

Josh
