[Xen-devel] [RFC Patch v2 00/45] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
Virtual machine (VM) replication is a well-known technique for providing
application-agnostic, software-implemented hardware fault tolerance
("non-stop service"). Currently, Remus provides this function, but it buffers
all output packets, and the resulting latency is unacceptable.

At Xen Summit 2012, we introduced a new VM replication solution: COLO
(COarse-grain LOck-stepping virtual machine). The presentation is available at
the following URL:
http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service

Here is the summary of the solution:
From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, then the secondary VM is a valid replica of the primary VM, and can
successfully take over when a hardware failure of the primary VM is detected.

This patchset is an RFC, and implements the framework and disk replication of
COLO:
1. Both primary VM and secondary VM are running
2. Do checkpoint
3. Disk replication (uses blktap2)

This patchset is based on remus-v18 and uses migration v1. Only HVM guests are
supported for now.

TODO list:
1. Use migration v2 to implement COLO
3. nic replication
4. support pvm

Patch 1-3  : bugfix
Patch 4-11 : update some APIs which will be used by COLO
Patch 12-15: temporarily update remus to reuse the remus device code
Patch 16-23: COLO framework related code
Patch 24   : Hack patch, just for testing
Patch 25-34: bugfix for blktap2
Patch 35-38: move some of block-remus's code to block-replication.c. This
             code will be reused by COLO.
Patch 39   : implement block-colo
Patch 40-43: update libxl to support blktap2
Patch 44   : implement disk replication
Patch 45   : hypervisor bugfix. We found this bug before rebasing COLO onto
             the newest Xen, but we can no longer trigger it.
Patch 46   : A patch for qemu-xen

Changelog from v1 to v2:
1. rebase to newest remus
2. add disk replication support

Hong Tao (1):
  copy the correct page to memory

Lai Jiangshan (1):
  colo: dynamic allocate aio_requests to avoid -EBUSY error

Wen Congyang (43):
  csum the correct page
  don't zero out ioreq page
  Refactor domain_suspend_callback_common()
  Update libxl__domain_resume() for colo
  Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  Introduce a new internal API libxl__domain_unpause()
  Update libxl__domain_unpause() to support qemu-xen
  support to resume uncooperative HVM guests
  update datecopier to support sending data only
  introduce a new API to aync read data from fd
  move remus related codes to libxl_remus.c
  rename remus device to checkpoint device
  adjust the indentation
  don't touch remus in checkpoint_device
  Update libxl_save_msgs_gen.pl to support return data from xl to xc
  Allow slave sends data to master
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/get_dirty_pfn/resume/checkpoint code
  xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode
  COLO: xc related codes
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  HACK: do checkpoint per 20ms
  fix memory leak in block-remus
  pass uuid to the callback td_open
  return the correct dev path
  blktap2: use correct way to get remus_image
  don't call client_flush() when switching to unprotected mode
  remus: fix bug in tdremus_close()
  blktap2: use correct way to get free event id
  blktap2: don't return negative event id
  blktap2: use correct way to define array.
  blktap2: connect to backup asynchronously
  switch to unprotected mode before closing
  blktap2: move async connect related codes to block-replication.c
  blktap2: move ramdisk related codes to block-replication.c
  block-colo: implement colo disk replication
  pass correct file to qemu if we use blktap2
  support blktap remus in xl
  support blktap colo in xl:
  update libxl__device_disk_from_xs_be() to support blktap device
  libxl/colo: setup and control disk replication for blktap2 backends
  x86/hvm: Always set pending event injection when loading VMC[BS] state.

 docs/man/xl.pod.1 | 11 +-
 tools/blktap2/drivers/Makefile | 5 +-
 tools/blktap2/drivers/block-aio.c | 41 +-
 tools/blktap2/drivers/block-cache.c | 4 +-
 tools/blktap2/drivers/block-colo.c | 1151 ++++++++++++++++++
 tools/blktap2/drivers/block-log.c | 4 +-
 tools/blktap2/drivers/block-qcow.c | 5 +-
 tools/blktap2/drivers/block-ram.c | 5 +-
 tools/blktap2/drivers/block-remus.c | 1266 +++++---------------
 tools/blktap2/drivers/block-replication.c | 1116 +++++++++++++++++
 tools/blktap2/drivers/block-replication.h | 217 ++++
 tools/blktap2/drivers/block-vhd.c | 5 +-
 tools/blktap2/drivers/scheduler.c | 33 +-
 tools/blktap2/drivers/tapdisk-control.c | 17 +-
 tools/blktap2/drivers/tapdisk-disktype.c | 21 +-
 tools/blktap2/drivers/tapdisk-disktype.h | 3 +-
 tools/blktap2/drivers/tapdisk-interface.c | 21 +-
 tools/blktap2/drivers/tapdisk-interface.h | 1 +
 tools/blktap2/drivers/tapdisk-vbd.c | 9 +
 tools/blktap2/drivers/tapdisk-vbd.h | 1 +
 tools/blktap2/drivers/tapdisk.h | 3 +-
 tools/libxc/xc_domain_restore.c | 74 +-
 tools/libxc/xc_domain_save.c | 66 +-
 tools/libxc/xc_resume.c | 20 +-
 tools/libxc/xenguest.h | 40 +
 tools/libxl/Makefile | 5 +-
 tools/libxl/libxl.c | 148 ++-
 tools/libxl/libxl.h | 3 +-
 tools/libxl/libxl_aoutils.c | 81 +-
 tools/libxl/libxl_blktap2.c | 35 +
 ...xl_remus_device.c => libxl_checkpoint_device.c} | 221 ++--
 tools/libxl/libxl_colo.h | 48 +
 tools/libxl/libxl_colo_restore.c | 878 ++++++++++++++
 tools/libxl/libxl_colo_save.c | 628 ++++++++++
 tools/libxl/libxl_colo_save_disk_blktap2.c | 216 ++++
 tools/libxl/libxl_create.c | 138 ++-
 tools/libxl/libxl_device.c | 6 +-
 tools/libxl/libxl_dm.c | 20 +-
 tools/libxl/libxl_dom.c | 565 ++++-----
 tools/libxl/libxl_internal.h | 309 +++--
 tools/libxl/libxl_netbuffer.c | 127 +-
 tools/libxl/libxl_noblktap2.c | 35 +
 tools/libxl/libxl_nonetbuffer.c | 14 +-
 tools/libxl/libxl_qmp.c | 10 +
 tools/libxl/libxl_remus.c | 335 ++++++
 tools/libxl/libxl_remus.h | 27 +
 tools/libxl/libxl_remus_disk_drbd.c | 67 +-
 tools/libxl/libxl_save_callout.c | 37 +-
 tools/libxl/libxl_save_helper.c | 17 +
 tools/libxl/libxl_save_msgs_gen.pl | 74 +-
 tools/libxl/libxl_types.idl | 14 +-
 tools/libxl/libxl_utils.c | 23 +
 tools/libxl/libxl_utils.h | 1 +
 tools/libxl/libxlu_disk_l.l | 2 +
 tools/libxl/xl_cmdimpl.c | 54 +-
 tools/libxl/xl_cmdtable.c | 3 +-
 xen/arch/x86/hvm/svm/svm.c | 16 +-
 xen/arch/x86/hvm/vmx/vmx.c | 25 +-
 58 files changed, 6558 insertions(+), 1763 deletions(-)
 create mode 100644 tools/blktap2/drivers/block-colo.c
 create mode 100644 tools/blktap2/drivers/block-replication.c
 create mode 100644 tools/blktap2/drivers/block-replication.h
 rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (41%)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c
 create mode 100644 tools/libxl/libxl_colo_save_disk_blktap2.c
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus.h

--
1.9.3

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
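
As a rough illustration of the validity criterion stated in the summary of
this cover letter (the secondary VM is a valid replica as long as the client
observes identical responses from both VMs), here is a minimal C sketch. It is
not code from this patchset: struct response, responses_match() and
colo_decide() are hypothetical names. The choice to request a checkpoint on a
mismatch is an assumption made for illustration only, since this series leaves
NIC replication on the TODO list and checkpoints on a fixed 20ms interval via
the hack patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: one response as the client would observe it. */
struct response {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical actions; not names used by the patchset. */
enum colo_action {
    COLO_RELEASE,    /* responses identical: the replica is still valid */
    COLO_CHECKPOINT  /* responses differ: assumed reaction is a resync  */
};

/* True when both VMs produced byte-identical output. */
static bool responses_match(const struct response *pri,
                            const struct response *sec)
{
    if (pri->len != sec->len)
        return false;
    /* Avoid calling memcmp() on zero-length buffers. */
    return pri->len == 0 || memcmp(pri->data, sec->data, pri->len) == 0;
}

/* Decide what to do with one pair of responses (illustrative only). */
static enum colo_action colo_decide(const struct response *pri,
                                    const struct response *sec)
{
    return responses_match(pri, sec) ? COLO_RELEASE : COLO_CHECKPOINT;
}
```

A byte-wise comparison is the strictest reading of "identical responses"; the
summary speaks of identity "according to the service semantics", so a real
comparison could be looser than the one sketched here.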