[Xen-devel] [RFC] save image file format CHANGE (minor, but feedback appreciated)
(This is a continuation of http://lists.xensource.com/archives/html/xen-devel/2009-06/msg00795.html )

I'm working on save/restore/migrate for tmem. Because of the way tmem works, it will sometimes have to save/migrate large amounts of (unmapped) data... perhaps gigabytes. As a result, in the case of live migration, "save"ing tmem data cannot wait until the domain has been suspended. It appears that the "next negative number as marker" mechanism only works for data that trails the last iteration of (mapped) pages, and thus works only after the domain has been suspended. (True?)

I thought about rewriting the format, or waiting for someone else to rewrite it, but save/restore/migrate is the last major functionality missing from tmem, so I decided to deal with the current format as best as possible.

As a result, I have extended the format somewhat to allow a "negative number as marker" to PREcede the pages of data. Since the first data item in the save file (or migration data stream) is an "unsigned long" representing the number of pages (in the p2m table), a small negative number in its first 32 bits represents nearly 4G pages, or 16TB of data with 4KB pages. So my change essentially reduces the maximum number of pages to a handful less than 16TB worth of data. This is true for both the 32-bit tools and the 64-bit tools, since only the first int is compared against the marker. Hopefully, the fragile save/restore/migrate system will be completely rewritten before Xen needs to support more than 16TB per domain. Other than this limit, I think the extension is backwards compatible.

It's ugly... but really not much worse than the existing format. Patch fragments below... feedback appreciated. It's made uglier by the fact that it needs to handle both ILP32 and I32/LP64. (Ignore the DPRINTFs.)

Basically, grab the "first int"... if it matches the marker, do tmem stuff and then read the real unsigned long. If it doesn't match and the tools are I32/LP64, grab the second int and reconstruct the unsigned long from the two halves. Otherwise, assign the "first int" to the unsigned long.
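To make the decode rule above concrete, here is a minimal standalone sketch of it. This is not the libxc code: struct header, decode_header() and max_pages_below_marker() are hypothetical names used only for illustration, the stream is modelled as an in-memory buffer rather than io_fd, and the low 32-bit word is assumed to come first (as on x86). The sketch deliberately goes through unsigned 32-bit types when rebuilding the value, because shifting a 32-bit int by 32 bits, or OR-ing in an int whose top bit is set, are both hazards in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TMEM_MARKER (-5)   /* marker value used in the patch */

/* Hypothetical result type for this sketch; not part of libxc. */
struct header {
    int      has_tmem;   /* 1 if the stream starts with the tmem marker */
    uint64_t p2m_size;   /* page count; meaningful only if has_tmem == 0 */
    size_t   consumed;   /* number of bytes of buf that were used */
};

/*
 * Decode the start of a save stream held in buf.
 * long_bytes is sizeof(unsigned long) in the tools that wrote the
 * stream: 4 for ILP32, 8 for LP64.
 * Assumption: the low 32-bit word is stored first, and each word is
 * in the reader's byte order.
 */
static struct header decode_header(const unsigned char *buf, size_t long_bytes)
{
    struct header h = { 0, 0, 0 };
    int32_t first;
    uint32_t lo, hi = 0;

    assert(long_bytes == 4 || long_bytes == 8);

    memcpy(&first, buf, sizeof(first));
    h.consumed = sizeof(first);

    if (first == TMEM_MARKER) {
        /* tmem data follows; the real p2m_size comes after it. */
        h.has_tmem = 1;
        return h;
    }

    /* Go through uint32_t so a set top bit is not sign-extended. */
    lo = (uint32_t)first;
    if (long_bytes == 8) {
        memcpy(&hi, buf + sizeof(first), sizeof(hi));
        h.consumed += sizeof(hi);
    }
    /* Widen before shifting: shifting a 32-bit value by 32 is undefined. */
    h.p2m_size = ((uint64_t)hi << 32) | lo;
    return h;
}

/*
 * Largest page count whose first 32 bits stay below the marker.
 * With 4KB pages this is a handful of pages short of 16TB.
 */
static uint64_t max_pages_below_marker(void)
{
    return (uint64_t)(uint32_t)TMEM_MARKER - 1;
}
```

Calling decode_header() on a buffer that starts with the int -5 reports has_tmem = 1 after consuming four bytes; any other first int is treated as the low half (LP64) or the whole (ILP32) of p2m_size.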
Thanks,
Dan

diff -r 5333e6497af6 tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c  Mon Jul 20 15:45:50 2009 +0100
+++ b/tools/libxc/xc_domain_restore.c  Thu Jul 30 15:25:38 2009 -0600
@@ -367,15 +367,52 @@ int xc_domain_restore(int xc_handle, int
     /* Buffer for holding HVM context */
     uint8_t *hvm_buf = NULL;
 
+    int first_int = 0;
+
     /* For info only */
     nr_pfns = 0;
 
-    if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
+
+    if ( read_exact(io_fd, &first_int, sizeof(int)) )
     {
         ERROR("read: p2m_size");
         goto out;
     }
-    DPRINTF("xc_domain_restore start: p2m_size = %lx\n", p2m_size);
 
+    if ( first_int == -5 )
+    {
+        DPRINTF("xc_domain_restore start tmem\n");
+DPRINTF("xc_tmem_restore called: xc=%d, dom=%d, io_fd=%d\n", xc_handle,dom,io_fd);
+        if ( xc_tmem_restore(xc_handle, dom, io_fd) )
+        {
+DPRINTF("xc_tmem_restore failed\n");
+            ERROR("error reading/restoring tmem");
+            goto out;
+        }
+DPRINTF("xc_tmem_restore succeeded\n");
+        if ( read_exact(io_fd, &p2m_size, sizeof(long)) )
+        {
+            ERROR("read: p2m_size");
+            goto out;
+        }
+    }
+    else
+#ifdef __X86_64__
+    {
+        int next_int = 0;
+
+        if ( read_exact(io_fd, &next_int, sizeof(int)) )
+        {
+            ERROR("read: p2m_size");
+            goto out;
+        }
+        p2m_size = (next_int << (sizeof(int) * 8)) | first_int;
+    }
+#else
+    p2m_size = first_int;
+#endif
+
+    DPRINTF("xc_domain_restore start memory: p2m_size = %lx\n", p2m_size);
+
     if ( !get_platform_info(xc_handle, dom,
                             &max_mfn, &hvirt_start, &pt_levels, &guest_width) )
@@ -533,6 +570,16 @@ int xc_domain_restore(int xc_handle, int
                 }
 
                 xc_set_hvm_param(xc_handle, dom, HVM_PARAM_VM86_TSS, vm86_tss);
+                continue;
+            }
+
+            if ( j == -6 )
+            {
+                if ( xc_tmem_restore_extra(xc_handle, dom, io_fd) )
+                {
+                    ERROR("error reading/restoring tmem extra");
+                    goto out;
+                }
                 continue;
             }
 
diff -r 5333e6497af6 tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c     Mon Jul 20 15:45:50 2009 +0100
+++ b/tools/libxc/xc_domain_save.c     Thu Jul 30 15:25:38 2009 -0600
@@ -758,6 +758,7 @@ int xc_domain_save(int xc_handle, int io
     int live  = (flags & XCFLAGS_LIVE);
     int debug = (flags & XCFLAGS_DEBUG);
     int race = 0, sent_last_iter, skip_this_iter;
+    int tmem_saved = 0;
 
     /* The new domain's shared-info frame number. */
     unsigned long shared_info_frame;
@@ -880,6 +881,13 @@ int xc_domain_save(int xc_handle, int io
             ERROR("Domain appears not to have suspended");
             goto out;
         }
+    }
+
+    tmem_saved = xc_tmem_save(xc_handle, dom, io_fd, live, -5);
+    if ( tmem_saved == -1 )
+    {
+        ERROR("Error when writing to state file (tmem)");
+        goto out;
     }
 
     last_iter = !live;
@@ -1600,10 +1608,22 @@ int xc_domain_save(int xc_handle, int io
         goto out;
     }
 
+    if ( tmem_saved > 0 && live )
+    {
+        if ( xc_tmem_save_extra(xc_handle, dom, io_fd, -6) == -1 )
+        {
+            ERROR("Error when writing to state file (tmem)");
+            goto out;
+        }
+    }
+
     /* Success! */
     rc = 0;
 
  out:
+
+    if ( tmem_saved != 0 && live )
+        xc_tmem_save_done(xc_handle, dom);
 
     if ( live )
     {

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel