
[Xen-devel] [PATCH v4 1/5] remus: don't call stream_continue() when doing failover



stream_continue() is used during migration to read the emulator
xenstore data and emulator context from the stream. For Remus
failover, these records have already been read during the
checkpoint cycle, so we only need to complete the stream.

Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
 tools/libxl/libxl_stream_read.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)
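
The branch taken at the end of libxl__xc_domain_restore_done() after
this patch can be sketched as below. This is a simplified standalone
model, not the libxl code: the enum, its labels, and the function
sketch_restore_done() are hypothetical names used only for
illustration, and folding rc and retval into a single failure case
is an assumption based on the comment added in the first hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for illustration; not part of libxl. */
enum sketch_action {
    SKETCH_FAIL,      /* internal error: the guest is not resumed */
    SKETCH_NOTHING,   /* stream no longer in use: do no further work */
    SKETCH_COMPLETE,  /* Remus failover: stream_complete(egc, stream, 0) */
    SKETCH_CONTINUE   /* plain migration: stream_continue(egc, stream) */
};

/*
 * Model of the decision only. The parameters stand in for rc, retval,
 * libxl__stream_read_inuse(stream) and
 * dcs->restore_params.checkpointed_stream respectively.
 */
enum sketch_action sketch_restore_done(int rc, int retval,
                                       bool stream_inuse,
                                       int checkpointed_stream)
{
    /* Assumption: any nonzero rc or retval is treated as an error. */
    if (rc || retval)
        return SKETCH_FAIL;

    /* If the stream is not still alive, we must not continue any work. */
    if (!stream_inuse)
        return SKETCH_NOTHING;

    /* Checkpointed stream: records were consumed per checkpoint already. */
    return checkpointed_stream ? SKETCH_COMPLETE : SKETCH_CONTINUE;
}

/* Usage example: the three cases described in the new comment block. */
void sketch_self_check(void)
{
    assert(sketch_restore_done(0, 0, true, 1) == SKETCH_COMPLETE);
    assert(sketch_restore_done(0, 0, true, 0) == SKETCH_CONTINUE);
    assert(sketch_restore_done(0, 1, true, 1) == SKETCH_FAIL);
}
```

In the failover case the partially buffered records are dropped rather
than applied, which is why the sketch never returns SKETCH_CONTINUE for
a checkpointed stream.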

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..24305f4 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -101,6 +101,19 @@
  *    - stream_write_emulator_done()
  *    - stream_continue()
  *
+ * 4) Failover for remus
+ *    - we buffer all records until a CHECKPOINT_END record is received
+ *    - we will use the records when a CHECKPOINT_END record is received
+ *    - if we find some internal error, the rc or retval is not 0 in
+ *      libxl__xc_domain_restore_done(). In this case, we don't resume the
+ *      guest
+ *    - if we need to do failover from primary, the rc and retval are 0
+ *      in libxl__xc_domain_restore_done(). In this case, the buffered state
+ *      will be dropped, because we don't receive a CHECKPOINT_END record,
+ *      and it is an inconsistent state. In libxl__xc_domain_restore_done(),
+ *      we just complete the stream and stream->completion_callback() will
+ *      be called to resume the guest
+ *
  * Depending on the contents of the stream, there are likely to be several
  * parallel tasks being managed.  check_all_finished() is used to join all
  * tasks in both success and error cases.
@@ -758,6 +771,9 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     libxl__stream_read_state *stream = &dcs->srs;
     STATE_AO_GC(dcs->ao);
 
+    /* convenience aliases */
+    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
+
     if (rc)
         goto err;
 
@@ -777,11 +793,20 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
      * If the stream is not still alive, we must not continue any work.
      */
     if (libxl__stream_read_inuse(stream)) {
-        /*
-         * Libxc has indicated that it is done with the stream.  Resume reading
-         * libxl records from it.
-         */
-        stream_continue(egc, stream);
+        if (checkpointed_stream) {
+            /*
+             * Failover from primary. Domain state is currently at a
+             * consistent checkpoint, so complete the stream and call
+             * stream->completion_callback() to resume the guest.
+             */
+            stream_complete(egc, stream, 0);
+        } else {
+            /*
+             * Libxc has indicated that it is done with the stream.
+             * Resume reading libxl records from it.
+             */
+            stream_continue(egc, stream);
+        }
     }
 }
 
-- 
2.5.0



