Re: [Xen-devel] [PATCH RFC] remus: implement remus replicated checkpointing disk
On Tue, Feb 25, 2014 at 6:53 PM, Lai Jiangshan <laijs@xxxxxxxxxxxxxx> wrote:

> This patch implements remus replicated checkpointing disk.

Hi,

Sorry for the delayed response, and thanks a lot for this initiative. Apart from the inline feedback, there are a few things to consider first before going down this route.

1. The drbd kernel module required for Remus is still out of tree, currently hosted on a wiki page. The drbd folks didn't want to include the changes in their code, unfortunately, as they were offering the same functionality to one of their paid customers. This is what they told me back in 2011 or so. To streamline the storage replication module installation, is there a chance of hosting the code in xen.org's repos? That way, we could script the download and installation process, like the qemu stuff.

2. The tapdisk based replication is unfortunately outdated. Please correct me if I have got this wrong: haven't we decided to get rid of blktap2 and go with the qemu disk models? In that case, the tapdisk remus code has to be ported into some qemu disk variant.

Without a resolution to the above two, my stance is that we shouldn't pollute xl with functionality that requires out-of-band modules that may prove pretty painful to install for the majority of folks out there.

Based on the experience of the last 3 years, most average users of Remus tend to skip disk replication altogether. They install the distro's default drbd, use the disk replication provided with it and then complain that Remus crashes or fails. Some have ventured into tapdisk replication, but that unfortunately seemed to get difficult as xend/blktap2 started getting deprecated.

---

So I think this part will also require some autoconf based stuff. In particular, if DRBD & tapdisk are not present, then this whole thing gets disabled, just like libxl_netbuffer and libxl_nonetbuffer.

Given that both of these (netbuffer and disk) are associated with Remus and both are required for Remus to work "correctly", we might as well have noremus.c and remus.c. Of course, it can be modularized a bit to have netbuffer but no disk replication or vice versa, as long as the person installing or compiling this stuff is made to state explicitly that they do not want Remus, but only a subset of its functionality for some other purpose.
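To make the stub idea concrete, here is a minimal sketch of compile-time selection between a real and a "no disk" variant, in the spirit of the netbuffer/nonetbuffer split mentioned above. Everything in it is an assumption for illustration: HAVE_REMUS_DISK stands in for whatever macro a configure check would define, and remus_disk_supported, remus_disk_setup and remus_start are hypothetical names, not libxl symbols. It uses a single file with #ifdef only to stay self-contained; two source files chosen by the build system would be closer to the existing arrangement.

```c
/*
 * Sketch only: every name below is hypothetical, not an existing libxl
 * symbol. HAVE_REMUS_DISK stands in for a macro that a configure check
 * would define when disk replication support is found at build time.
 */
#include <stdbool.h>
#include <stdio.h>

#define REMUS_OK            0
#define REMUS_ERR_NOSUPPORT (-1)

#ifdef HAVE_REMUS_DISK
/* Real variant: built only when the configure check succeeds. */
bool remus_disk_supported(void) { return true; }

int remus_disk_setup(const char *disk_name)
{
    printf("setting up replication for %s\n", disk_name);
    return REMUS_OK;
}
#else
/* Stub variant: built when the configure check fails. */
bool remus_disk_supported(void) { return false; }

int remus_disk_setup(const char *disk_name)
{
    (void)disk_name;
    fprintf(stderr, "Remus disk replication was not compiled in\n");
    return REMUS_ERR_NOSUPPORT;
}
#endif

/* Caller: run without disk replication only on an explicit opt-out. */
int remus_start(const char *disk_name, bool user_disabled_disk)
{
    if (user_disabled_disk)
        return REMUS_OK;            /* explicit opt-out, e.g. netbuffer only */
    if (!remus_disk_supported())
        return REMUS_ERR_NOSUPPORT; /* no silent fallback */
    return remus_disk_setup(disk_name);
}
```

With the macro undefined, remus_start("xvda", false) fails with REMUS_ERR_NOSUPPORT, while remus_start("xvda", true) succeeds because the caller asked for a subset explicitly.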
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> /* The domain was suspended successfully. Start a new network

Bug. This should go before the resume call. Also, I would suggest changing the comment to something more meaningful, e.g., "commit disk changes.."

> return 1;

Now might be a good time to use the restore callbacks offered by the toolstack to get an explicit ack from the backup that it has received the memory checkpoint too, before the network buffer is released. I think I put in a comment related to that somewhere.

> if (remus_state->netbuf_state) {

I would also suggest renaming these to something else, not associated with suspend but associated with checkpoints: start_disk_sync, finish_disk_sync, start_new_epoch or something along those lines.
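One possible reading of the ordering asked for above, as a small sketch: the disk step happens before the resume, and buffered network output is released only after the backup has acknowledged the checkpoint. The step names reuse the suggested start_disk_sync, finish_disk_sync and start_new_epoch, but how they map onto the epoch, and the names run_epoch, step_index and struct epoch_log, are assumptions made purely for illustration. This is not the libxl checkpoint code.

```c
/*
 * Sketch only: all names and the exact step order are assumptions made
 * for illustration; this is not the libxl checkpoint code.
 */
#include <stdbool.h>
#include <string.h>

#define MAX_STEPS 16

struct epoch_log {
    const char *steps[MAX_STEPS];
    int count;
};

/* Append one step name to the log, silently dropping overflow. */
static void record(struct epoch_log *log, const char *step)
{
    if (log->count < MAX_STEPS)
        log->steps[log->count++] = step;
}

/*
 * Run one checkpoint epoch. backup_acked models the explicit
 * acknowledgement from the backup. Returns true only if the buffered
 * network output was released.
 */
bool run_epoch(struct epoch_log *log, bool backup_acked)
{
    record(log, "suspend");
    record(log, "start_disk_sync");   /* disk step placed before resume */
    record(log, "resume");

    if (!backup_acked) {
        record(log, "hold_netbuf");   /* keep output buffered */
        return false;
    }
    record(log, "finish_disk_sync");
    record(log, "release_netbuf");    /* only after the ack */
    record(log, "start_new_epoch");
    return true;
}

/* Index of a step in the log, or -1 if it never happened. */
int step_index(const struct epoch_log *log, const char *step)
{
    for (int i = 0; i < log->count; i++)
        if (strcmp(log->steps[i], step) == 0)
            return i;
    return -1;
}
```

Logging the steps rather than performing them keeps the two constraints checkable: start_disk_sync always precedes resume, and release_netbuf never appears in an epoch without an ack.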
> + /* setup & teardown */
> +} libxl__remus_disk_type;

We don't need to run any scripts for DRBD (or tapdisk for that matter). DRBD scripts will get activated when the domain boots and that's the end of it. On the backup side, they get activated during the initial phase of Remus, which is the same as live migration. Since xl already supports live migration with drbd based disks, we don't need any script related code at all.

With regard to tapdisk-remus (at least with blktap2), you can't boot the domain fully unless you start Remus too. This in turn forces the backup to start the tapdisk-remus receiving end. Once again in this case, in Xend, the live migration infrastructure did all the script setup work.

> + GCNEW(drbd);

I don't know if this is needed at all, given that we don't have disk script setup issues.

> +int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel