
Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support



On 27/03/2017 10:06, Joshua Otto wrote:
> Hi,
>
> We're a team of three fourth-year undergraduate software engineering students
> at the University of Waterloo in Canada.  In late 2015 we posted on the list
> [1] to ask for a project to undertake for our program's capstone design
> project, and Andrew Cooper pointed us in the direction of the live migration
> implementation as an area that could use some attention.  We were particularly
> interested in post-copy live migration (as evaluated by [2] and discussed on
> the list at [3]), and have been working on an implementation of this
> on-and-off since then.
>
> We now have a working implementation of this scheme, and are submitting it for
> comment.  The changes are also available as the 'postcopy' branch of the
> GitHub repository at [4].
>
> As a brief overview of our approach:
> - We introduce a mechanism by which libxl can indicate to the libxc stream
>   helper process that the iterative migration precopy loop should be
>   terminated and postcopy should begin.
> - At this point, we suspend the domain, collect the final set of dirty pfns,
>   and write these pfns (and _not_ their contents) into the stream.
> - At the destination, the xc restore logic registers itself as a pager for the
>   migrating domain, 'evicts' all of the pfns indicated by the sender as
>   outstanding, and then resumes the domain at the destination.
> - As the domain executes, the migration sender continues to push the remaining
>   outstanding pages to the receiver in the background.  The receiver
>   monitors both the stream for incoming page data and the paging ring event
>   channel for page faults triggered by the guest.  Page faults are forwarded
>   on the back-channel migration stream to the migration sender, which
>   prioritizes these pages for transmission.
>
> By leveraging the existing paging API, we are able to implement the postcopy
> scheme without any hypervisor modifications - all of our changes are confined
> to the userspace toolstack.  However, we inherit from the paging API the
> requirement that the domains be HVM and that the host have HAP/EPT support.

Wow.  Considering that the paging API has had no in-tree consumers (and
its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.
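
For readers following the overview above, the sender-side prioritisation in
the postcopy phase can be modelled roughly as below.  This is only an
illustrative Python sketch, not code from the series: the class name
PostcopySender and its methods are hypothetical, and the real logic lives in
the libxc stream helper and works on batches of pfns rather than one at a time.

```python
from collections import deque


class PostcopySender:
    """Hypothetical model of the sender during the postcopy phase."""

    def __init__(self, outstanding_pfns):
        # pfns that were still dirty when the domain was suspended,
        # in the order the background push would send them.
        self.background = deque(outstanding_pfns)
        self.outstanding = set(outstanding_pfns)
        # pfns the receiver asked for after a guest page fault.
        self.faulted = deque()

    def on_fault_request(self, pfn):
        """Record a fault forwarded by the receiver on the back-channel."""
        if pfn in self.outstanding:
            self.faulted.append(pfn)

    def next_pfn(self):
        """Return the next pfn to transmit, or None once nothing is left."""
        # Faulted pfns first, then the background order; entries that
        # were already sent are skipped.
        for queue in (self.faulted, self.background):
            while queue:
                pfn = queue.popleft()
                if pfn in self.outstanding:
                    self.outstanding.discard(pfn)
                    return pfn
        return None


sender = PostcopySender([1, 2, 3, 4, 5])
first = sender.next_pfn()      # background push starts with pfn 1
sender.on_fault_request(4)     # guest faults on pfn 4 at the destination
second = sender.next_pfn()     # pfn 4 jumps ahead of pfns 2 and 3
```

A request for a pfn that has already been sent is simply dropped, so a
duplicate or late fault notification does not cause a page to be sent twice.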

>
> We haven't yet had the opportunity to perform a quantitative evaluation of the
> performance trade-offs between the traditional pre-copy and our post-copy
> strategies, but intend to.  Informally, we've been testing our implementation
> by migrating a domain running the x86 memtest program (which is obviously a
> tremendously write-heavy workload), and have observed a substantial reduction
> in total time required for migration completion (at the expense of a visually
> obvious 'slowdown' in the execution of the program).

Do you have any numbers, even for this informal testing?

>   We've also noticed that,
> when performing a postcopy without any leading precopy iterations, the time
> required at the destination to 'evict' all of the outstanding pages is
> substantial - possibly because there is no batching mechanism by which pages
> can be evicted - so this area in particular might require further attention.
>
> We're really interested in any feedback you might have!

Do you have a design document for this?  The spec modifications and code
comments are great, but there is no substitute (as far as understanding
goes) for a description in terms of the algorithm and design choices.

~Andrew


 

