
[Xen-changelog] [xen master] xl: Always use "fast" migration resume protocol



commit c04c825bdf1e946260cba325eeed993004051050
Author:     Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
AuthorDate: Mon Jan 13 18:15:37 2014 +0000
Commit:     Ian Campbell <ian.campbell@xxxxxxxxxx>
CommitDate: Wed Jan 15 13:33:12 2014 +0000

    xl: Always use "fast" migration resume protocol
    
    As Ian Campbell writes in http://bugs.xenproject.org/xen/bug/30:
    
      There are two mechanisms by which a suspend can be aborted and the
      original domain resumed.
    
      The older method is that the toolstack resets a bunch of state (see
      tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then
      restarts the domain. The domain will see HYPERVISOR_suspend return 0
      and will continue without any realisation that it is actually
      running in the original domain and not in a new one. This method is
      supposed to be implemented by libxl_domain_resume(suspend_cancel=0)
      but it is not.
    
      The other method is newer and in this case the toolstack arranges
      that HYPERVISOR_suspend returns SUSPEND_CANCEL and restarts it. The
      domain will observe this and realise that it has been restarted in
      the same domain and will behave accordingly. This method is
      implemented, correctly AFAIK, by
      libxl_domain_resume(suspend_cancel=1).
    
    Attempting to use the old method without doing all of the work simply
    causes the guest to crash.  Implementing the work required for the old
    method, or checking that domains actually support the new method, is
    not feasible at this stage of the 4.4 release.
    
    So, always use the new method, without regard to the declarations of
    support by the guest.  This is a strict improvement: guests which do
    in fact support the new method will work, whereas ones which don't are
    no worse off.
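
The guest-side difference between the two protocols can be sketched as
follows. This is an illustrative model only, not code from Xen or from any
guest kernel: the enum names, the function guest_resume_action(), and the
numeric value 1 for the cancel indication are assumptions made for this
sketch. It shows only that the guest branches on the value returned from
the suspend hypercall.

```c
#include <stdio.h>

/* Hypothetical names for this sketch; not taken from Xen or Linux headers. */
enum suspend_ret {
    SUSPEND_RET_NEW_DOMAIN = 0, /* old protocol: guest is told nothing special */
    SUSPEND_RET_CANCELLED  = 1  /* new protocol: assumed value of the cancel indication */
};

enum resume_action {
    ACTION_REINIT_FOR_NEW_DOMAIN, /* rebuild state as if restored elsewhere */
    ACTION_CONTINUE_IN_PLACE      /* suspend was cancelled; keep existing state */
};

/*
 * Decide what the guest does after the suspend hypercall returns.
 * With a return of 0 the guest assumes it is in a fresh domain, so the
 * toolstack must already have reset the state the guest expects to find.
 * With the cancel indication the guest knows it is still in the original
 * domain and can carry on without that toolstack-side reset.
 */
static enum resume_action guest_resume_action(int hypercall_ret)
{
    if (hypercall_ret == SUSPEND_RET_CANCELLED)
        return ACTION_CONTINUE_IN_PLACE;
    return ACTION_REINIT_FOR_NEW_DOMAIN;
}

/* Human-readable description of an action, for logging in a demo. */
static const char *resume_action_name(enum resume_action a)
{
    return a == ACTION_CONTINUE_IN_PLACE ? "continue in place"
                                         : "reinitialise for new domain";
}
```

In this model, passing suspend_cancel=1 to libxl_domain_resume corresponds
to the guest seeing SUSPEND_RET_CANCELLED, which is why a guest that
implements the cancel path resumes cleanly without the toolstack doing the
old method's state reset.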
    
    There are two call sites of libxl_domain_resume that need fixing, both
    in the migration error path.
    
    With this change I observe a correct and successful resumption of a
    Debian wheezy guest with a Linux 3.4.70 kernel after a migration
    attempt which I arranged to fail by nobbling the block hotplug script.
    
    Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
    Acked-by: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
    CC: konrad.wilk@xxxxxxxxxx
    CC: David Vrabel <david.vrabel@xxxxxxxxxx>
    CC: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
---
 tools/libxl/xl_cmdimpl.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c30f495..d93e01b 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3734,7 +3734,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
         if (common_domname) {
             libxl_domain_rename(ctx, domid, away_domname, common_domname);
         }
-        rc = libxl_domain_resume(ctx, domid, 0, 0);
+        rc = libxl_domain_resume(ctx, domid, 1, 0);
         if (!rc) fprintf(stderr, "migration sender: Resumed OK.\n");
 
         fprintf(stderr, "Migration failed due to problems at target.\n");
@@ -3756,7 +3756,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
     close(send_fd);
     migration_child_report(recv_fd);
     fprintf(stderr, "Migration failed, resuming at sender.\n");
-    libxl_domain_resume(ctx, domid, 0, 0);
+    libxl_domain_resume(ctx, domid, 1, 0);
     exit(-ERROR_FAIL);
 
  failed_badly:
--
generated by git-patchbot for /home/xen/git/xen.git#master

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxx
http://lists.xensource.com/xen-changelog


 

