Xen project Mailing List

On Wed, Jan 25, 2012 at 3:52 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:

>

> diff -r 0446591bee86 -r 822536df4aec tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c Mon Jan 23 14:44:00 2012 -0800
> +++ b/tools/libxl/xl_cmdimpl.c Mon Jan 23 14:44:02 2012 -0800
> @@ -1814,7 +1814,7 @@
> * If we have daemonized then do not return to the caller -- this has
> * already happened in the parent.
> */
> - if ( !need_daemon )
> + if ( daemonize && !need_daemon )

What is this change for?

Sorry my bad. I should have sent it out as a separate patch. Well, if you
look at the control flow in the create_domain function , you would realize
that this is a bug.

create_domain()
{
int daemonize = dom_info->daemonize;
....
int need_daemon = daemonize;

restore_domain() or create_new()

if (!daemonize && !monitor) goto out;

if (need_daemon) {
    fork()
    daemon()..
    need_daemon = 0;
}
...
out:
         /* If we have daemonized then do not return to the caller -- this has
   * already happened in the parent.
   */
        if ( !need_daemon )
         exit()

}

So, even if one explicitly sets daemonize to 0, the function will never return to the
caller. I needed it to return to the caller (remus_receiver() ), to take further actions.

> exit(ret);
>
> return ret;
> @@ -5853,6 +5853,175 @@
> return ret;
> }
>
> +int main_remus_receive(int argc, char **argv)

Can this not be done as a flag to migrate-receive? Likewise is there
common code between the remus migrate and regular migration?

It can but it would basically terminate after the create_domain() call in the
migrate_receive function. I would have to have something like
if (remus) {
   dont wait for permission to start domain
   try renaming domain
   unpause domain
   return
}

and a bunch of if (remus) flags peppered around the migrate receive function.

Having it in a separate function allows me to add support for creating a TCP
channel that receives the checkpoints, add hooks or call external libraries/binaries
that allows the user to check for split-brain before resuming the domain.

Is there any reason that remus would not benefit from the xl migration
protocol preamble?

No specific reason.

> +{
> + int rc;
> + char *migration_domname;
> + struct domain_create dom_info;
> +
> + signal(SIGPIPE, SIG_IGN);
> + memset(&dom_info, 0, sizeof(dom_info));
> + dom_info.debug = 1;
> + dom_info.no_incr_generationid = 1;
> + dom_info.restore_file = "incoming checkpoint stream";
> + dom_info.migrate_fd = 0; /* stdin - will change in future*/
> + dom_info.migration_domname_r = &migration_domname;
> +
> + rc = create_domain(&dom_info);

I'm totally failing to see where the Remus'ness of this create_domain
comes from.

dom_info.daemonize = 0, .monitor = 0 and .paused = 0;

As I said earlier, this part is probably the only duplication between migrate-receive
and remus-receive. With Xend, there was no separate remus receiver, to control
what happens on the backup host. Now that I have an opportunity to incorporate
remus into the toolstack from start, I thought I will keep the remus control flow as separate as possible
and not have to worry about breaking live migration (nor complicate its code).

> + if (rc < 0) {
> + fprintf(stderr, "migration target (Remus): Domain creation failed"
> + " (code %d) domid %u.\n", rc, domid);
> + exit(-rc);
> + }
> +
> + /* If we are here, it means that the sender (primary) has crashed.
> + * If domain renaming fails, lets just continue (as we need the domain
> + * to be up & dom names may not matter much, as long as its reachable
> + * over network).
> + *
> + * If domain unpausing fails, destroy domain ? Or is it better to have
> + * a consistent copy of the domain (memory, cpu state, disk)
> + * on atleast one physical host ? Right now, lets just leave the domain

at least

> + * as is and let the Administrator decide (or troubleshoot).
> + */
> + fprintf(stderr, "migration target: Remus Failover for domain %u\n", domid);
> + if (migration_domname) {
> + rc = libxl_domain_rename(ctx, domid, migration_domname,
> + common_domname);
> + if (rc)
> + fprintf(stderr, "migration target (Remus): "
> + "Failed to rename domain from %s to %s:%d\n",
> + migration_domname, common_domname, rc);
> +
> + rc = libxl_domain_unpause(ctx, domid);
> + if (rc)
> + fprintf(stderr, "migration target (Remus): "
> + "Failed to unpause domain %s (id: %u):%d\n",
> + common_domname, domid, rc);
> + }
> +
> + return -rc;
> +}
> +
> +int main_remus(int argc, char **argv)
> +{
> + int opt, rc;
> + const char *ssh_command = "ssh";
> + char *host = NULL, *rune = NULL, *domain = NULL;
> +
> + int sendpipe[2], recvpipe[2];
> + int send_fd = -1, recv_fd = -1;
> + pid_t child = -1;
> +
> + uint8_t *config_data;
> + int config_len;
> +
> + libxl_domain_remus_info r_info;
> +
> + memset(&r_info, 0, sizeof(libxl_domain_remus_info));
> + /* Defaults */
> + r_info.interval = 200;
> + r_info.blackhole = 0;
> + r_info.compression = 1;
> +
> + while ((opt = def_getopt(argc, argv, "bui:s:", "remus", 2)) != -1) {

main_migrate handles a bunch of other options which might also be
interesting in the Remus case? Better would be to add all this as an
option to the std migrate.

Negative. It would look totally unintuitive to have a checkpoint interval option (or blackhole
replication option) in a live migration command. To an end user, remus is just a HA system
and live migration is for moving VMs around. What is the point in trying to educate him/her
that remus is basically "continuous live migration" ?

Imagine a command like
"xl migrate --R --i 100 --N Guest Backup"
and help options like
"-R enable remus HA
-i remus checkpoint interval
-N disable network buffering in Remus"

To you and me, as devs, it may not matter. But to common users who are mostly going to
use just live migration, this would be utterly confusing.

> + switch (opt) {
> + case 0: case 2:
> + return opt;
> +
> + case 'i':
> + r_info.interval = atoi(optarg);
> + break;
> + case 'b':
> + r_info.blackhole = 1;
> + break;
> + case 'u':
> + r_info.compression = 0;
> + break;
> + case 's':
> + ssh_command = optarg;
> + break;
> + }
> + }
> +
> + domain = argv[optind];
> + host = argv[optind + 1];
> +
> + if (r_info.blackhole) {
> + find_domain(domain);
> + send_fd = open("/dev/null", O_RDWR, 0644);
> + if (send_fd < 0) {
> + perror("failed to open /dev/null");
> + exit(-1);
> + }
> + } else {

The following duplicates an awful lot of main_migrate which begs for
them to either be the same underlying command or at least for some
refactoring.

I agree.

> +
> + if (!ssh_command[0]) {
> + rune = host;
> + } else {
> + if (asprintf(&rune, "exec %s %s xl remus-receive",
> + ssh_command, host) < 0)
> + return 1;
> + }
> +
> + save_domain_core_begin(domain, NULL, &config_data, &config_len);
> +
> + if (!config_len) {
> + fprintf(stderr, "No config file stored for running domain and "
> + "none supplied - cannot start remus.\n");
> + exit(1);
> + }
> +
> + MUST( libxl_pipe(ctx, sendpipe) );
> + MUST( libxl_pipe(ctx, recvpipe) );
> +
> + child = libxl_fork(ctx);
> + if (child==-1) exit(1);
> +
> + /* TODO: change this to plain TCP socket based channel
> + * instead of SSH.

There are good reasons for using ssh though. However the user can supply
another command which is not encrypted if they want to.

Whatever happens here the same arguments would apply to non-remus
migration so the changes should be done for both (or better the code
should be made more common.

Ian.

Re: [Xen-devel] [PATCH 3 of 3] tools/libxl: xl remus/remus-receive commands