[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl



On 27/03/17 10:06, Joshua Otto wrote:
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such as interactive user input or the passage of
>   wall-clock time
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
>
> To permit users to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy as such a callback in
>   libxl and plumb it through the IPC machinery to libxc, effectively
>   maintaing the old policy for now
>
> Signed-off-by: Joshua Otto <jtotto@xxxxxxxxxxxx>

This patch should be split into two.  One modifying libxc to use struct
precopy_stats, and a second to wire up the RPC call.

> ---
>  tools/libxc/include/xenguest.h     |  23 ++++-
>  tools/libxc/xc_nomigrate.c         |   3 +-
>  tools/libxc/xc_sr_common.h         |   7 +-
>  tools/libxc/xc_sr_save.c           | 194 
> ++++++++++++++++++++++++++-----------
>  tools/libxl/libxl_dom_save.c       |  20 ++++
>  tools/libxl/libxl_save_callout.c   |   3 +-
>  tools/libxl/libxl_save_helper.c    |   7 +-
>  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
>  8 files changed, 189 insertions(+), 72 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index aa8cc8b..30ffb6f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,14 @@
>   */
>  struct xenevtchn_handle;
>  
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned iteration;
> +    unsigned total_written;
> +    long dirty_count; /* -1 if unknown */

total_written and dirty_count are liable to be equal, so having them as
different widths of integer clearly can't be correct.

> +};
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
> @@ -46,6 +54,17 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
>  
> +    /* Called after every batch of page data sent during the precopy phase 
> of a
> +     * live migration to ask the caller what to do next based on the current
> +     * state of the precopy migration.
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> +                                        * tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit 
> the
> +                                        * remaining dirty pages. */
> +    int (*precopy_policy)(struct precopy_stats stats, void *data);

Structures shouldn't be passed by value like this, as the compiler has
to do a lot of memcpy() work to make it happen.  You should pass by
const pointer, as (as far as I can tell), they are strictly read-only to
the implementation of this hook?

> +
>      /* Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
> @@ -100,8 +119,8 @@ typedef enum {
>   *        doesn't use checkpointing
>   * @return 0 on success, -1 on failure
>   */
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
> max_iters,
> -                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> +                   uint32_t flags /* XCFLAGS_xxx */,
>                     struct save_callbacks* callbacks, int hvm,
>                     xc_migration_stream_t stream_type, int recv_fd);

It would be cleaner for existing callers, and to extend in the future,
to encapsulate all of these parameters in a struct domain_save_params
and pass it by pointer to here.

That way, we'd avoid the situation we currently have where some
information is passed in bitfields in a single parameter, whereas other
booleans are passed as integers.

The hvm parameter specifically is useless, and can be removed by
rearranging the sanity checks until after the xc_domain_getinfo() call.

>  
> diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> index 15c838f..2af64e4 100644
> --- a/tools/libxc/xc_nomigrate.c
> +++ b/tools/libxc/xc_nomigrate.c
> @@ -20,8 +20,7 @@
>  #include <xenctrl.h>
>  #include <xenguest.h>
>  
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
> max_iters,
> -                   uint32_t max_factor, uint32_t flags,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
> flags,
>                     struct save_callbacks* callbacks, int hvm,
>                     xc_migration_stream_t stream_type, int recv_fd)
>  {
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index b1aa88e..a9160bd 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,11 @@ struct xc_sr_context
>              /* Further debugging information in the stream. */
>              bool debug;
>  
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>              unsigned long p2m_size;
>  
> +            struct precopy_stats stats;
> +            int policy_decision;
> +
>              xen_pfn_t *batch_pfns;
>              unsigned nr_batch_pfns;
>              unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 797aec5..eb95334 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
>  }
>  
>  /*
> + * Test if the batch is full.
> + */
> +static bool batch_full(struct xc_sr_context *ctx)

const struct xc_sr_context *ctx

This is a predicate, after all.

> +{
> +    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> +}
> +
> +/*
> + * Test if the batch is empty.
> + */
> +static bool batch_empty(struct xc_sr_context *ctx)
> +{
> +    return ctx->save.nr_batch_pfns == 0;
> +}
> +
> +/*
>   * Flush a batch of pfns into the stream.
>   */
>  static int flush_batch(struct xc_sr_context *ctx)
>  {
>      int rc = 0;
>  
> -    if ( ctx->save.nr_batch_pfns == 0 )
> +    if ( batch_empty(ctx) )
>          return rc;
>  
>      rc = write_batch(ctx);
> @@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
>  }
>  
>  /*
> - * Add a single pfn to the batch, flushing the batch if full.
> + * Add a single pfn to the batch.
>   */
> -static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> +static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
>  {
> -    int rc = 0;
> -
> -    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> -        rc = flush_batch(ctx);
> -
> -    if ( rc == 0 )
> -        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> -
> -    return rc;
> +    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> +    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
>  }
>  
>  /*
> @@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
>   * Send a subset of pages in the guests p2m, according to the dirty bitmap.
>   * Used for each subsequent iteration of the live migration loop.
>   *
> + * During the precopy stage of a live migration, test the user-supplied
> + * policy function after each batch of pages and cut off the operation
> + * early if indicated.  Unless aborting, the dirty pages remaining in this 
> round
> + * are transferred into the deferred_pages bitmap.

Is this actually a sensible thing to do?  On iteration 0, this is going
to be a phenomenal number of RPC calls, which are all going to make the
same decision.

> + *
>   * Bitmap is bounded by p2m_size.
>   */
>  static int send_dirty_pages(struct xc_sr_context *ctx,
> -                            unsigned long entries)
> +                            unsigned long entries, bool precopy)

Shouldn't this precopy boolean be some kind of state variable in ctx ?

>  {
>      xc_interface *xch = ctx->xch;
>      xen_pfn_t p;
> @@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
> -    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    assert(batch_empty(ctx));
> +    for ( p = 0, written = 0; p < ctx->save.p2m_size; )

This looks suspicious without an increment.  Conceptually, it might be
better as a do {} while ( decision == XGS_POLICY_CONTINUE_PRECOPY ); loop?

>      {
> -        if ( !test_bit(p, dirty_bitmap) )
> -            continue;
> +        if ( ctx->save.live && precopy )
> +        {
> +            ctx->save.policy_decision = precopy_policy(ctx->save.stats, 
> data);

Newline here please.

> +            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> +            {

Please but a log message here indicating that abort has been requested. 
Otherwise, the migration will give up with a failure and no obvious
indication why.

> +                return -1;
> +            }
> +            else if ( ctx->save.policy_decision != 
> XGS_POLICY_CONTINUE_PRECOPY )
> +            {
> +                /* Any outstanding dirty pages are now deferred until the 
> next
> +                 * phase of the migration. */

/*
 * The comment style for multiline comments
 * is like this.
 */

> +                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> +                          ctx->save.p2m_size);
> +                if ( entries > written )
> +                    ctx->save.nr_deferred_pages += entries - written;
> +
> +                goto done;
> +            }
> +        }
>  
> -        rc = add_to_batch(ctx, p);
> +        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> +        {
> +            if ( test_and_clear_bit(p, dirty_bitmap) )
> +            {
> +                add_to_batch(ctx, p);
> +                ++written;
> +                ++ctx->save.stats.total_written;
> +            }
> +        }
> +
> +        rc = flush_batch(ctx);
>          if ( rc )
>              return rc;
>  
> -        /* Update progress every 4MB worth of memory sent. */
> -        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> -            xc_report_progress_step(xch, written, entries);
> -
> -        ++written;
> +        /* Update progress after every batch (4MB) worth of memory sent. */
> +        xc_report_progress_step(xch, written, entries);
>      }
>  
> -    rc = flush_batch(ctx);
> -    if ( rc )
> -        return rc;
> -
>      if ( written > entries )
>          DPRINTF("Bitmap contained more entries than expected...");
>  
>      xc_report_progress_step(xch, entries, entries);
>  
> + done:
>      return ctx->save.ops.check_vm_state(ctx);
>  }
>  
> @@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>   * Send all pages in the guests p2m.  Used as the first iteration of the live
>   * migration loop, and for a non-live save.
>   */
> -static int send_all_pages(struct xc_sr_context *ctx)
> +static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
>  {
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
>      bitmap_set(dirty_bitmap, ctx->save.p2m_size);
>  
> -    return send_dirty_pages(ctx, ctx->save.p2m_size);
> +    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
>  }
>  
>  static int enable_logdirty(struct xc_sr_context *ctx)
> @@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context 
> *ctx,
>      xc_interface *xch = ctx->xch;
>      char *new_str = NULL;
>  
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
>      {
>          PERROR("Unable to allocate new progress string");
>          return -1;
> @@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
>      xc_interface *xch = ctx->xch;
>      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>      char *progress_str = NULL;
> -    unsigned x;
>      int rc;
>  
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
>      rc = update_progress_string(ctx, &progress_str, 0);
>      if ( rc )
>          goto out;
>  
> -    rc = send_all_pages(ctx);
> +#define CONSULT_POLICY                                                       
>  \

:(

The reason this code is readable and (hopefully) easy to follow, is due
in large part to a lack of macros like this trying to hide what is
actually going on.

> +    do {                                                                     
>  \
> +        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                 
>  \
> +        {                                                                    
>  \
> +            rc = -1;                                                         
>  \
> +            goto out;                                                        
>  \
> +        }                                                                    
>  \
> +        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY ) 
>  \
> +        {                                                                    
>  \
> +            rc = 0;                                                          
>  \
> +            goto out;                                                        
>  \
> +        }                                                                    
>  \
> +    } while (0)
> +
> +    ctx->save.stats = (struct precopy_stats)
> +        {
> +            .iteration     = 0,
> +            .total_written = 0,
> +            .dirty_count   = -1
> +        };
> +    rc = send_all_pages(ctx, /* precopy */ true);
>      if ( rc )
>          goto out;
>  
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> +    /* send_all_pages() has updated the stats */
> +    CONSULT_POLICY;
> +
> +    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )

Again, without an exit condition, this looks suspicious.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.