
Re: [Xen-devel] [PATCH 3 of 3] tools/libxl: xl remus/remus-receive commands



On Thu, 2012-01-26 at 00:09 +0000, Shriram Rajagopalan wrote:
> On Wed, Jan 25, 2012 at 3:52 AM, Ian Campbell 
> 
> Sorry, my bad. I should have sent it out as a separate patch. Well, if
> you look at the control flow in the create_domain function, you will
> see that this is a bug.
> 
> create_domain()
> {
>     int daemonize = dom_info->daemonize;
>     ....
>     int need_daemon = daemonize;
> 
>     restore_domain() or create_new()
> 
>     if (!daemonize && !monitor) goto out;
> 
>     if (need_daemon) {
>         fork()
>         daemon()
>         need_daemon = 0;
>     }
>     ...
> out:
>     /* If we have daemonized then do not return to the caller -- this has
>      * already happened in the parent.
>      */
>     if (!need_daemon)
>         exit()
> }
> 
> 
> So, even if one explicitly sets daemonize to 0, the function will never
> return to the caller. I needed it to return to the caller
> (remus_receiver()) to take further actions.

Makes sense.

Although the default for remus-receive should be to daemonize, otherwise
when the domain is resumed you won't get any automatic reboot/shutdown
handling etc.

>         >          exit(ret);
>         >
>         >      return ret;
>         > @@ -5853,6 +5853,175 @@
>         >      return ret;
>         >  }
>         >
>         > +int main_remus_receive(int argc, char **argv)
>         
>         
>         Can this not be done as a flag to migrate-receive? Likewise is
>         there common code between the remus migrate and regular
>         migration?
>         
> 
> It can but it would basically terminate after the create_domain() call
> in the migrate_receive function. I would have to have something like
>  if (remus) {
>    don't wait for permission to start domain
>    try renaming domain
>    unpause domain
>    return
>  }

I think actually it becomes 
        if (!remus)
                wait for permission to start domain
        try renaming domain
        unpause domain

However I take your point that there isn't that much in common after
create_domain once you remove the post migration interlock protocol.

> and a bunch of if (remus) flags peppered around the migrate receive
> function.
> 
> Having it in a separate function allows me to add support for creating
> a TCP channel that receives the checkpoints,

If this is useful for Remus then I can't see any reason it wouldn't also
be useful for regular migration.

> and to add hooks or call external libraries/binaries that allow the
> user to check for split-brain before resuming the domain.

Hooks are OK, if they are not used by standard migration they can be
NULL. They should of course be properly documented...

>         > +{
>         > +    int rc;
>         > +    char *migration_domname;
>         > +    struct domain_create dom_info;
>         > +
>         > +    signal(SIGPIPE, SIG_IGN);
>         > +    memset(&dom_info, 0, sizeof(dom_info));
>         > +    dom_info.debug = 1;

BTW, this should be an option.

>         > +    dom_info.no_incr_generationid = 1;
>         > +    dom_info.restore_file = "incoming checkpoint stream";
>         > +    dom_info.migrate_fd = 0; /* stdin - will change in
>         future*/
>         > +    dom_info.migration_domname_r = &migration_domname;
>         > +
>         > +    rc = create_domain(&dom_info);
>         
>         
>         I'm totally failing to see where the Remus'ness of this
>         create_domain comes from.

> dom_info.daemonize = 0, .monitor = 0 and .paused = 0;

And how do those options combine to == Remus?

migrate doesn't currently have a "paused" option but it is not beyond
the realms of possibility that such a thing might be desirable in the
future.

> As I said earlier, this part is probably the only duplication between
> migrate-receive and remus-receive. With xend, there was no separate
> remus receiver to control what happens on the backup host. Now that I
> have an opportunity to incorporate remus into the toolstack from the
> start, I thought I would keep the remus control flow as separate as
> possible and not have to worry about breaking live migration (or
> complicating its code).

I think this is the wrong approach and motivation. We want to combine
the two as much as possible to reduce the amount of Remus specific code
which requires maintenance. I don't think supporting Remus via the
migration code paths will unduly complicate things.

[...]

>         > +    while ((opt = def_getopt(argc, argv, "bui:s:", "remus",
>         2)) != -1) {
>         
>         
>         main_migrate handles a bunch of other options which might also
>         be
>         interesting in the Remus case? Better would be to add all this
>         as an option to the std migrate.
> 
> 
> Negative. It would look totally unintuitive to have a checkpoint
> interval option (or blackhole replication option) in a live migration
> command. To an end user, remus is just an HA system and live migration
> is for moving VMs around. What is the point in trying to educate
> him/her that remus is basically "continuous live migration"?

You've convinced me on the user interface point.

However I think in terms of implementation we should aim for as much
common code as possible.

Likewise, even if xl remus and xl migrate are separate commands they
should support the same common options where appropriate (e.g. -F, -d,
-e, possibly -C, etc.).

I'm undecided on whether *-receive (which is not user visible) should be
separate or not -- this is the sort of thing which could in principle be
negotiated as part of the protocol. Based on the above it's not clear
how much code sharing there would actually be. I expect we can revisit
this once the bits which can and should be shared actually are.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

