
[Xen-devel] [RFC] save image file format CHANGE (minor, but feedback appreciated)



(This is a continuation of
http://lists.xensource.com/archives/html/xen-devel/2009-06/msg00795.html )

I'm working on save/restore/migrate for tmem.  Due to the
way that tmem works, tmem will sometimes have to save/migrate
large amounts of (unmapped) data... perhaps gigabytes.  As a
result, in the case of live migration, saving tmem
data cannot wait until the domain has been suspended.

It appears that the "next negative number as marker"
mechanism only works for data that trails the last
iteration of (mapped) pages, and thus works only after
the domain has been suspended.  (True?)

I thought about rewriting the format, or waiting for
someone else to rewrite it, but save/restore/migrate
is the last major functionality missing from tmem,
so I decided to work within the current format as
best I could.

As a result, I have extended the format somewhat to
allow a "negative number as marker" to PREcede
the pages of data.  Since the first data item in
the save file (or migration data stream) is an
"unsigned long" representing the number of pages
(in the p2m table), a small negative number, read
as a 32-bit unsigned value, represents nearly 4G
pages, or 16TB of data with 4KB pages.  So my change
essentially lowers the maximum number of pages by a
handful, to just under 16TB worth of data.  This is
true for both the 32-bit tools and the 64-bit tools.
Hopefully, the fragile save/restore/migrate system
will be completely rewritten before Xen needs to
support more than 16TB per domain.  Other than this
limit, I think the extension is backwards compatible.
It's ugly... but really not much worse than the
existing format.

Patch fragments below... feedback appreciated.  It's made
uglier by the fact that it needs to handle both ILP32
and I32/LP64.  (Ignore the DPRINTFs.)  Basically,
grab the "first int"... if it matches the marker,
do tmem stuff.  If not, and if I32/LP64, grab the second
int and reconstruct the unsigned long from the two halves.
Otherwise, assign the "first int" to the unsigned long.

Thanks,
Dan

diff -r 5333e6497af6 tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c   Mon Jul 20 15:45:50 2009 +0100
+++ b/tools/libxc/xc_domain_restore.c   Thu Jul 30 15:25:38 2009 -0600
@@ -367,15 +367,52 @@ int xc_domain_restore(int xc_handle, int
     /* Buffer for holding HVM context */
     uint8_t *hvm_buf = NULL;
 
+    int first_int = 0;
+
     /* For info only */
     nr_pfns = 0;
 
-    if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
+
+    if ( read_exact(io_fd, &first_int, sizeof(int)) )
     {
         ERROR("read: p2m_size");
         goto out;
     }
-    DPRINTF("xc_domain_restore start: p2m_size = %lx\n", p2m_size);
+    if ( first_int == -5 )
+    {
+        DPRINTF("xc_domain_restore start tmem\n");
+DPRINTF("xc_tmem_restore called: xc=%d, dom=%d, io_fd=%d\n", xc_handle,dom,io_fd);
+        if ( xc_tmem_restore(xc_handle, dom, io_fd) )
+        {
+DPRINTF("xc_tmem_restore failed\n");
+            ERROR("error reading/restoring tmem");
+            goto out;
+        }
+DPRINTF("xc_tmem_restore succeeded\n");
+        if ( read_exact(io_fd, &p2m_size, sizeof(long)) )
+        {
+            ERROR("read: p2m_size");
+            goto out;
+        }
+    }
+    else
+#ifdef __x86_64__
+    {
+        int next_int = 0;
+
+        if ( read_exact(io_fd, &next_int, sizeof(int)) )
+        {
+            ERROR("read: p2m_size");
+            goto out;
+        }
+        p2m_size = ((unsigned long)(unsigned int)next_int << (sizeof(int) * 8)) | (unsigned int)first_int;
+    }
+#else
+        p2m_size = first_int;
+#endif
+
+    DPRINTF("xc_domain_restore start memory: p2m_size = %lx\n", p2m_size);
+
 
     if ( !get_platform_info(xc_handle, dom,
                             &max_mfn, &hvirt_start, &pt_levels, &guest_width) )
@@ -533,6 +570,16 @@ int xc_domain_restore(int xc_handle, int
             }
 
             xc_set_hvm_param(xc_handle, dom, HVM_PARAM_VM86_TSS, vm86_tss);
+            continue;
+        }
+
+        if ( j == -6 )
+        {
+            if ( xc_tmem_restore_extra(xc_handle, dom, io_fd) )
+            {
+                ERROR("error reading/restoring tmem extra");
+                goto out;
+            }
             continue;
         }
 
diff -r 5333e6497af6 tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c      Mon Jul 20 15:45:50 2009 +0100
+++ b/tools/libxc/xc_domain_save.c      Thu Jul 30 15:25:38 2009 -0600
@@ -758,6 +758,7 @@ int xc_domain_save(int xc_handle, int io
     int live  = (flags & XCFLAGS_LIVE);
     int debug = (flags & XCFLAGS_DEBUG);
     int race = 0, sent_last_iter, skip_this_iter;
+    int tmem_saved = 0;
 
     /* The new domain's shared-info frame number. */
     unsigned long shared_info_frame;
@@ -880,6 +881,13 @@ int xc_domain_save(int xc_handle, int io
             ERROR("Domain appears not to have suspended");
             goto out;
         }
+    }
+
+    tmem_saved = xc_tmem_save(xc_handle, dom, io_fd, live, -5);
+    if ( tmem_saved == -1 )
+    {
+        ERROR("Error when writing to state file (tmem)");
+        goto out;
     }
 
     last_iter = !live;
@@ -1600,10 +1608,22 @@ int xc_domain_save(int xc_handle, int io
         goto out;
     }
 
+    if ( tmem_saved > 0 && live )
+    {
+        if ( xc_tmem_save_extra(xc_handle, dom, io_fd, -6) == -1 )
+        {
+            ERROR("Error when writing to state file (tmem)");
+            goto out;
+        }
+    }
+
     /* Success! */
     rc = 0;
 
  out:
+
+    if ( tmem_saved != 0 && live )
+        xc_tmem_save_done(xc_handle, dom);
 
     if ( live )
     {

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
