
Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]

On Mon, Nov 4, 2013 at 10:40 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> On Mon, Nov 4, 2013 at 6:12 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
>     Perhaps it would be helpful if I provided a pre-patch to make that
>     change for you.
> Yes, that would be pretty helpful. Thanks!

See below.  I have compiled this but not tested it.  It should be safe
but I can't rule out having perpetrated some howler of a bug.  Please
let me know if it doesn't work.

Ian Jackson writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> Shriram writes:
> > Fair enough. My question is what is the overhead of setting up, firing
> > and tearing down a timeout event using the event gen framework, if I
> > wish to checkpoint the VM, say every 20ms?
> The ultimate cost of going back into the event loop to wait for a
> timeout will depend on what else the process is doing.  If the process
> is doing nothing else, it's about two calls to gettimeofday and one to
> poll.  Plus a bit of in-process computation, but that's going to be
> swamped by system call overhead.
> Having said that, libxl is not performance-optimised.  Indeed the
> callback mechanism involves context switching, and IPC, between the
> save/restore helper and libxl proper.  Probably not too much to be
> doing every 20ms for a single domain, but if you have a lot of these
> it's going to end up taking a lot of dom0 cpu etc.
> I assume you're not doing this for HVM domains, which involve saving
> the qemu state each time too.
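As a rough illustration of the "two calls to gettimeofday and one to poll" estimate quoted above, here is a minimal sketch of one idle pass through an event loop waiting on a single timeout. It is not libxl's event code; wait_for_timeout_ms and tv_to_usec are hypothetical names made up for this example, and the real loop also has to handle file descriptors and other registered events.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: convert a struct timeval to microseconds. */
static long long tv_to_usec(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000000LL + tv->tv_usec;
}

/*
 * Hypothetical stand-in for one idle pass through an event loop that has
 * a single timeout registered. Returns the elapsed time in milliseconds.
 */
static long wait_for_timeout_ms(int interval_ms)
{
    struct timeval before, after;

    assert(interval_ms >= 0);

    /* System call 1: read the clock to work out how long to sleep. */
    gettimeofday(&before, NULL);
    long long deadline = tv_to_usec(&before) + interval_ms * 1000LL;

    /* System call 2: block in poll() with no fds, so only the timeout
     * (or a signal) can end the wait. Round up to whole milliseconds. */
    long long remaining_ms = (deadline - tv_to_usec(&before) + 999) / 1000;
    poll(NULL, 0, (int)remaining_ms);

    /* System call 3: read the clock again to see which timeouts are due. */
    gettimeofday(&after, NULL);
    return (long)((tv_to_usec(&after) - tv_to_usec(&before)) / 1000);
}
```

Calling wait_for_timeout_ms(20) from a test program should return roughly 20, and the cost per checkpoint interval is dominated by those three system calls, assuming the process has nothing else to do.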

I guess another way to look at this is that changing this one timeout
from a synchronous to an asynchronous version is not going to make any
noticeable difference to the performance of the whole thing.  You're
already using all of the asynchronous save/restore helper machinery
and the libxl event loop.

So if the performance of your V3 patches is acceptable, this will be
fine too.
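The shape of the change in the patch that follows can be sketched in isolation: a typesafe synchronous core, plus a thin void wrapper that recovers the outer state from the embedded helper state and reports the result through a completion call. Everything here is a simplified stand-in invented for illustration (struct helper_state, struct suspend_state, CONTAINER_OF_MEMBER, suspend_common, async_callback_done, suspend_callback); the real libxl types, the CONTAINER_OF macro and libxl__xc_domain_saverestore_async_callback_done have different definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl__save_helper_state. */
struct helper_state {
    int completed;   /* set once the callback result has been reported */
    int result;      /* value passed back to the helper */
};

/* Hypothetical stand-in for libxl__domain_suspend_state. */
struct suspend_state {
    int domid;
    struct helper_state shs;   /* embedded, as in the patch */
};

/* Simplified container-of: recover the outer struct from a member pointer. */
#define CONTAINER_OF_MEMBER(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Synchronous core: takes the typesafe state, returns nonzero on success. */
static int suspend_common(struct suspend_state *dss)
{
    return dss->domid > 0;   /* pretend any valid domid suspends fine */
}

/* Completion call: the asynchronous reply to the helper, sent exactly once. */
static void async_callback_done(struct helper_state *shs, int ok)
{
    assert(!shs->completed);
    shs->completed = 1;
    shs->result = ok;
}

/* Asynchronous wrapper: returns void and reports its result via the
 * completion call. A later version could return early here and call
 * async_callback_done() from a timeout or other event handler instead. */
static void suspend_callback(void *data)
{
    struct helper_state *shs = data;
    struct suspend_state *dss =
        CONTAINER_OF_MEMBER(shs, struct suspend_state, shs);

    int ok = suspend_common(dss);
    async_callback_done(shs, ok);
}
```

With a struct suspend_state whose domid is 7, calling suspend_callback(&dss.shs) leaves dss.shs.completed and dss.shs.result both set to 1. The point of the void signature is that the wrapper is free to defer the completion call, which a synchronous int-returning callback cannot do.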


From 46b08918302a8c1d2e470b7af7045557f73afde9 Mon Sep 17 00:00:00 2001
From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Date: Mon, 4 Nov 2013 16:27:53 +0000
Subject: [PATCH] libxl: make libxl__domain_suspend_callback be asynchronous

Mark the suspend callback as asynchronous in the helper stub generator
(libxl_save_msgs_gen.pl).  Remus is going to want to provide an
asynchronous version of this function.

libxl__domain_suspend_common_callback, the common synchronous core,
which used to be provided directly as the callback function for the
helper machinery, becomes libxl__domain_suspend_callback_common.  It
can now take a typesafe parameter.

For now, provide two very similar asynchronous wrappers for it.  Each
is a simple function which contains only boilerplate, calls the common
synchronous core, and returns the asynchronous response.

Essentially, we have just moved (in the case of suspend callbacks) the
call site of libxl__srm_callout_callback_complete.  It was in the
switch statement in the autogenerated _libxl_save_msgs_callout.c, and
is now in the handwritten libxl_dom.c.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Cc: Shriram Rajagopalan <rshriram@xxxxxxxxx>
Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
---
 tools/libxl/libxl_dom.c            |   25 +++++++++++++++++++------
 tools/libxl/libxl_internal.h       |    2 +-
 tools/libxl/libxl_save_msgs_gen.pl |    2 +-
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 1812bdc..b5cde42 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1003,10 +1003,8 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
     return 0;
 }

-int libxl__domain_suspend_common_callback(void *user)
+int libxl__domain_suspend_callback_common(libxl__domain_suspend_state *dss)
 {
-    libxl__save_helper_state *shs = user;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     unsigned long hvm_s_state = 0, hvm_pvdrv = 0;
     int ret;
@@ -1225,12 +1223,27 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
     return 0;
 }

+static void libxl__domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    int ok = libxl__domain_suspend_callback_common(dss);
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+}
+
 /*----- remus callbacks -----*/

-static int libxl__remus_domain_suspend_callback(void *data)
+static void libxl__remus_domain_suspend_callback(void *data)
 {
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
     /* REMUS TODO: Issue disk and network checkpoint reqs. */
-    return libxl__domain_suspend_common_callback(data);
+    int ok = libxl__domain_suspend_callback_common(dss);
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
 }

 static int libxl__remus_domain_resume_callback(void *data)
@@ -1354,7 +1367,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
     } else
-        callbacks->suspend = libxl__domain_suspend_common_callback;
+        callbacks->suspend = libxl__domain_suspend_callback;

     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
     dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4f92522..79eb8f8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2551,7 +2551,7 @@ _hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
 void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
                            libxl__save_helper_state *shs, int return_value);

-_hidden int libxl__domain_suspend_common_callback(void *data);
+_hidden int libxl__domain_suspend_callback_common(libxl__domain_suspend_state*);
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
 _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index ee126c7..3c6bd57 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -23,7 +23,7 @@ our @msgs = (
                                                  STRING doing_what),
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
-    [  3, 'scxW',   "suspend", [] ],
+    [  3, 'scxA',   "suspend", [] ],
     [  4, 'scxW',   "postcopy", [] ],
     [  5, 'scxA',   "checkpoint", [] ],
     [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid

The patch seems harmless enough. How do you want to go about this?
Do you want to post/commit this patch yourself? I have to modify my patches
accordingly either way. Or should I post it along with my patch series, so
that I don't have to wait on you before I post mine?
