[Xen-devel] [PATCH for-4.12] libxl: When restricted, start QEMU paused
Since libxl runs the "cont" command later during guest creation, it
more or less expects that QEMU has not started any emulation before
then. Use the "-S" command line option to make this effective.

Unfortunately, when QEMU is started with "-S", it does not write its
readiness into xenstore. So only activate this option when a QEMU
startup notification via QMP is available, which is when dm_restrict is
activated.

This has the side effect of making the startup notification via
xenstore ineffective; libxl will only get the notification via QMP.
It has become important to rely only on QMP for notification when we
have it, as cutting that path short may leave the QMP socket blocked,
with QEMU no longer responding to new connections even when none are
active.

The QEMU bug that this patch works around is:
- libxl connects and handshakes with QEMU, then sends the command
  "query-status".
- QEMU prepares and maybe tries to send the response, while also
  writing "running" into xenstore.
- libxl sees via xenstore that QEMU is running and disconnects from
  the QMP socket before receiving the response to the command.
=> The QMP socket (monitor) is sometimes blocked and will never reply
   to commands on new connections.

This is due to QEMU only responding to one command at a time, and
suspending its monitor (QMP) until the command has been processed and
the response sent. Disconnecting from the socket doesn't unsuspend the
monitor.

The race described here is very likely to happen with QEMU 3.1.50
(during 3.2 development), but can also be reproduced with QEMU 3.1.

Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
---

Here is an example of what's happening in osstest with qemu mainline:
http://logs.test-lab.xenproject.org/osstest/logs/132403/test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict/info.html
http://logs.test-lab.xenproject.org/osstest/logs/132403/test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict/10.ts-debian-hvm-install.log
(there are a few "qmp_next: Domain 1:timeout" messages, besides what
looks like Debian failing to finish installing and reboot)
---
 tools/libxl/libxl_dm.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index b245956b77..2f19786bdd 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1183,6 +1183,14 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             flexarray_append(dm_args,
                 GCSPRINTF("socket,id=libxl-cmd,fd=%d,server,nowait",
                           state->dm_monitor_fd));
+
+            /*
+             * Start QEMU with its "CPU" paused, it will not start any emulation
+             * until the QMP command "cont" is used. This also prevent QEMU from
+             * writing "running" to the "state" xenstore node so we only use this
+             * flag when we have the QMP based startup notification.
+             * */
+            flexarray_append(dm_args, "-S");
         } else {
             flexarray_append(dm_args,
                              GCSPRINTF("socket,id=libxl-cmd,"
@@ -2702,6 +2710,7 @@ static void device_model_qmp_cb(libxl__egc *egc, libxl__ev_qmp *ev,
     libxl__dm_spawn_state *dmss = CONTAINER_OF(ev, *dmss, qmp);
     const libxl__json_object *o;
     const char *status;
+    const char *expected_state;
 
     libxl__ev_qmp_dispose(gc, ev);
 
@@ -2717,7 +2726,11 @@ static void device_model_qmp_cb(libxl__egc *egc, libxl__ev_qmp *ev,
         goto failed;
     }
     status = libxl__json_object_get_string(o);
-    if (strcmp(status, "running")) {
+    if (!dmss->build_state->saved_state)
+        expected_state = "prelaunch";
+    else
+        expected_state = "paused";
+    if (strcmp(status, expected_state)) {
         LOGD(ERROR, ev->domid, "Unexpected QEMU status: %s", status);
         rc = ERROR_NOT_READY;
         goto failed;
-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
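
For readers who want to see the status check from the last hunk in isolation, here is a minimal standalone sketch. It is not libxl code: the helper names expected_qemu_status() and qemu_status_ok() are hypothetical, and the boolean has_saved_state stands in for the dmss->build_state->saved_state test in the patch. It only mirrors the rule that a QEMU started with "-S" should report "prelaunch" for a fresh guest and "paused" for a restored one, never "running".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxl): the status that the
 * "query-status" reply is expected to carry when QEMU was started
 * with "-S". Mirrors the patch: "paused" when restoring from a saved
 * state, "prelaunch" otherwise.
 */
static const char *expected_qemu_status(bool has_saved_state)
{
    return has_saved_state ? "paused" : "prelaunch";
}

/*
 * Hypothetical helper (not part of libxl): true when the reported
 * status matches the expected one. A NULL status is treated as a
 * mismatch; that guard is an assumption of this sketch.
 */
static bool qemu_status_ok(const char *status, bool has_saved_state)
{
    if (status == NULL)
        return false;
    return strcmp(status, expected_qemu_status(has_saved_state)) == 0;
}
```

With this shape, a caller would treat a false return the same way the patch treats a strcmp() mismatch, i.e. log the unexpected status and fail with a not-ready error.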