
[Xen-devel] [PATCH v2 for-4.12] libxl: When restricted, start QEMU paused



libxl issues the QMP command "cont" later during guest creation, i.e.
it expects QEMU not to do any emulation before then. Use the "-S"
command-line option to achieve this.

Unfortunately, when QEMU is started with "-S", it does not write its
readiness into xenstore, so libxl can only learn that QEMU is up via
QMP. Therefore only activate this option when a QMP-based startup
notification is available, i.e. when dm_restrict is activated.

When QMP is available, it is important to rely only on QMP for the
notification: due to a QEMU bug, not waiting for the QMP response may
leave the QMP socket blocked, so that QEMU stops responding to new
connections even if no existing ones are active.

When the QEMU bug happens, the actions taken by both libxl and QEMU
are roughly as follows:
- libxl connects and handshakes with QEMU, then sends the
  command "query-status".
- QEMU prepares and maybe tries to send the response,
  while also writing "running" into xenstore.
- libxl sees via xenstore that QEMU is running and disconnects from the
  QMP socket before receiving the response to the command.
=> The QMP socket (monitor) is thereby blocked and will never reply
  to commands on new connections.

This happens because QEMU responds to only one command at a time and
suspends its monitor (QMP) until that command has been processed and
its response sent. Disconnecting from the socket doesn't unsuspend the
monitor. The race described here is very likely to happen with QEMU
3.1.50 (during 3.2 development), but can also be reproduced with QEMU
3.1.
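
The sequence above can be modelled with a toy state machine. This is
only an illustrative sketch, not QEMU code: struct toy_monitor and the
toy_* functions are hypothetical names, and the single "suspended" flag
is an assumed simplification of the monitor behaviour described here.

```c
#include <stdbool.h>

/* Toy model (not QEMU code): a monitor that serves one command at a time. */
struct toy_monitor {
    bool suspended;   /* set while a command's response is still outstanding */
};

/* A client sends a command; returns false if the monitor ignores it. */
static bool toy_send_command(struct toy_monitor *mon)
{
    if (mon->suspended)
        return false;          /* still busy with the previous command */
    mon->suspended = true;     /* suspended until the response is sent */
    return true;
}

/* The response was fully delivered to the client: the monitor resumes. */
static void toy_response_sent(struct toy_monitor *mon)
{
    mon->suspended = false;
}

/* The client disconnects. Modelled bug: "suspended" is left untouched. */
static void toy_disconnect(struct toy_monitor *mon)
{
    (void)mon;
}

/*
 * Replays the sequence from the commit message and returns true when a
 * command sent on a *new* connection afterwards would never be answered.
 */
static bool toy_race_blocks_monitor(bool wait_for_response)
{
    struct toy_monitor mon = { .suspended = false };

    toy_send_command(&mon);        /* libxl sends "query-status" */
    if (wait_for_response)
        toy_response_sent(&mon);   /* libxl reads the reply first */
    toy_disconnect(&mon);          /* libxl closes the QMP socket */

    return !toy_send_command(&mon);   /* next connection tries a command */
}
```

Under this model, toy_race_blocks_monitor(false) reports the monitor as
blocked while toy_race_blocks_monitor(true) does not, which is why libxl
has to wait for the QMP response instead of reacting to xenstore.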

Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
Release-acked-by: Juergen Gross <jgross@xxxxxxxx>

---
v2:
    commit message reworked.
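
For review purposes, the status check that this patch installs in
device_model_qmp_cb() can be read as the standalone sketch below. The
helper names expected_qemu_status() and qemu_status_ok() are
hypothetical (the patch keeps the logic inline), and the boolean
argument stands in for "dmss->build_state->saved_state is set"; only
the two status strings are taken from the diff.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: the status libxl expects from "query-status"
 * once QEMU has been started with "-S".
 */
static const char *expected_qemu_status(bool has_saved_state)
{
    /* Fresh guest: "prelaunch". Guest restored from saved state: "paused". */
    return has_saved_state ? "paused" : "prelaunch";
}

/* Hypothetical helper: true when the reported status is the expected one. */
static bool qemu_status_ok(const char *status, bool has_saved_state)
{
    return status != NULL &&
           strcmp(status, expected_qemu_status(has_saved_state)) == 0;
}
```

With this reading, a QEMU that reports "running" at this point is
treated as an error in both cases, since "-S" should have kept it from
starting emulation.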
---
 tools/libxl/libxl_dm.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index b245956b77..2f19786bdd 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1183,6 +1183,14 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
         flexarray_append(dm_args,
             GCSPRINTF("socket,id=libxl-cmd,fd=%d,server,nowait",
                       state->dm_monitor_fd));
+
+        /*
+         * Start QEMU with its "CPU" paused; it will not start any emulation
+         * until the QMP command "cont" is used. This also prevents QEMU from
+         * writing "running" to the "state" xenstore node, so we only use this
+         * flag when we have the QMP-based startup notification.
+         */
+        flexarray_append(dm_args, "-S");
     } else {
         flexarray_append(dm_args,
                          GCSPRINTF("socket,id=libxl-cmd,"
@@ -2702,6 +2710,7 @@ static void device_model_qmp_cb(libxl__egc *egc, libxl__ev_qmp *ev,
     libxl__dm_spawn_state *dmss = CONTAINER_OF(ev, *dmss, qmp);
     const libxl__json_object *o;
     const char *status;
+    const char *expected_state;
 
     libxl__ev_qmp_dispose(gc, ev);
 
@@ -2717,7 +2726,11 @@ static void device_model_qmp_cb(libxl__egc *egc, libxl__ev_qmp *ev,
         goto failed;
     }
     status = libxl__json_object_get_string(o);
-    if (strcmp(status, "running")) {
+    if (!dmss->build_state->saved_state)
+        expected_state = "prelaunch";
+    else
+        expected_state = "paused";
+    if (strcmp(status, expected_state)) {
         LOGD(ERROR, ev->domid, "Unexpected QEMU status: %s", status);
         rc = ERROR_NOT_READY;
         goto failed;
-- 
Anthony PERARD


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

