[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] libxl: fix migration of PV and PVH domUs with and without qemu
On Fri, May 03, 2019 at 11:42:51AM +0200, Olaf Hering wrote: > If a domU has a qemu-xen instance attached, it is required to call qemus > "xen-save-devices-state" method. Without it, the receiving side of a PV or > PVH migration may be unable to lock the image: > > xen be: qdisk-51712: xen be: qdisk-51712: error: Failed to get "write" lock > error: Failed to get "write" lock > xen be: qdisk-51712: xen be: qdisk-51712: initialise() failed > initialise() failed > > To fix this bug, libxl__domain_suspend_device_model() and > libxl__domain_resume_device_model() have to be called not only for HVM, > but also if the active device_model is QEMU_XEN. > > Unfortunately, libxl__domain_build_info_setdefault() hardcodes > b_info->device_model_version to QEMU_XEN if it does not know it any > better. This breaks domUs without a device_model. libxl__qmp_stop() would > wait 10 seconds in qmp_open() for a qemu that will never appear. During > this long timeframe the domU remains in state paused on the sending side. > As a result network connections may be dropped. Once this bug is fixed as > well, by just removing that assumption, there is no code to actually > initialise b_info->device_model_version. > > There is a helper function libxl__need_xenpv_qemu(), which is used in > various places to decide if any device_model has to be spawned. This > function can not be used as is, just to fill b_info->device_model_version, > because store_libxl_entry() was already called earlier. Update this > function to receive a domid to work with, instead of reading xenstore. I have to admit I'm not that familiar with the migration code, and it's fairly complex, so take this with a pinch of salt. Wouldn't it be easier to leave libxl__need_xenpv_qemu alone and just use the contents of the migration stream to decide whether to launch a QEMU for the PV backends or not? ie: just parsing the domain config on the migration stream should be enough for the destination side to decide whether a QEMU is needed in order to handle the PV backends? > Rearrange the code and initialize b_info->device_model_version in > libxl__domain_build_info_setdefault() per DOMAIN_TYPE. > > Update initiate_domain_create() to set b_info->device_model_version if it > was not set earlier, using the updated libxl__need_xenpv_qemu(). > > Introduce LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED for PV and PVH that > have no need for a device_model. > > Update existing users of libxl__need_xenpv_qemu() to use > b_info->device_model_version for their check if a device_model is needed. > > v02: > - update wording in a comment > - remove stale goto in domcreate_launch_dm > - initialize ret in libxl__need_xenpv_qemu > > Signed-off-by: Olaf Hering <olaf@xxxxxxxxx> > Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx> > Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx> > --- > tools/libxl/libxl_create.c | 39 +++++++++++++++++++++++++++++++-------- > tools/libxl/libxl_dm.c | 40 +++++++++++++++++++++++----------------- > tools/libxl/libxl_dom_suspend.c | 8 ++++++-- > tools/libxl/libxl_internal.h | 3 ++- > tools/libxl/libxl_types.idl | 1 + > 5 files changed, 63 insertions(+), 28 deletions(-) > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 89fe80fc9c..150ab02354 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -87,16 +87,20 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > b_info->device_model_ssidref = SECINITSID_DOMDM; > > if (!b_info->device_model_version) { > - if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) { > + switch (b_info->type) { > + case LIBXL_DOMAIN_TYPE_HVM: > if (libxl_defbool_val(b_info->device_model_stubdomain)) { > b_info->device_model_version = > LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL; > } else { > b_info->device_model_version = > libxl__default_device_model(gc); > } > - } else { > - b_info->device_model_version = > - LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN; > + break; > + case LIBXL_DOMAIN_TYPE_PV: > + case LIBXL_DOMAIN_TYPE_PVH: > + default: > + /* may be set later */ > + break; Is it really worth it to change the if into a switch? AFAICT it would be simpler to just remove the else branch from the existing if. > } > if (b_info->device_model_version > == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { > @@ -978,6 +982,17 @@ static void initiate_domain_create(libxl__egc *egc, > goto error_out; > } > > + if (d_config->b_info.device_model_version > + == LIBXL_DEVICE_MODEL_VERSION_UNKNOWN) { > + ret = libxl__need_xenpv_qemu(gc, d_config, domid); I think the above call is wrong, libxl__need_xenpv_qemu expects to get the domid of the toolstack domain (ie: the domain running this code), not the domain being created. > + if (ret) > + d_config->b_info.device_model_version = > + LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN; > + else > + d_config->b_info.device_model_version = > + LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED; > + } > + > dcs->guest_domid = domid; > dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */ > > @@ -1312,6 +1327,7 @@ static void domcreate_launch_dm(libxl__egc *egc, > libxl__multidev *multidev, > libxl__domain_create_state *dcs = CONTAINER_OF(multidev, *dcs, multidev); > STATE_AO_GC(dcs->ao); > int i; > + bool need_qemu; > > /* convenience aliases */ > const uint32_t domid = dcs->guest_domid; > @@ -1464,10 +1480,17 @@ static void domcreate_launch_dm(libxl__egc *egc, > libxl__multidev *multidev, > libxl__device_console_add(gc, domid, &console, state, &device); > libxl__device_console_dispose(&console); > > - ret = libxl__need_xenpv_qemu(gc, d_config); > - if (ret < 0) > - goto error_out; > - if (ret) { > + switch (d_config->b_info.device_model_version) { > + case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: > + case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: > + need_qemu = true; > + break; > + default: > + need_qemu = false; > + break; AFAICT at this point device_model_version will either be set to one of the versions (either trad or upstream) or to none. So you could initialize need_qemu to false and set if to true if device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED. > + } > + > + if (need_qemu) { > dcs->sdss.dm.guest_domid = domid; > libxl__spawn_local_dm(egc, &dcs->sdss.dm); > return; > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index 2f19786bdd..bab04ab196 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -2268,7 +2268,7 @@ static void spawn_stub_launch_dm(libxl__egc *egc, > libxl__domain_build_state *const d_state = sdss->dm.build_state; > libxl__domain_build_state *const stubdom_state = &sdss->dm_state; > uint32_t dm_domid = sdss->pvqemu.guest_domid; > - int need_qemu; > + bool need_qemu; > > if (ret) { > LOGD(ERROR, guest_domid, "error connecting disk devices"); > @@ -2337,7 +2337,15 @@ static void spawn_stub_launch_dm(libxl__egc *egc, > } > } > > - need_qemu = libxl__need_xenpv_qemu(gc, dm_config); > + switch (dm_config->b_info.device_model_version) { > + case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: > + case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: > + need_qemu = true; > + break; > + default: > + need_qemu = false; > + break; > + } > > for (i = 0; i < num_console; i++) { > libxl__device device; > @@ -3175,18 +3183,11 @@ static void kill_device_model_uid_cb(libxl__egc *egc, > } > > /* Return 0 if no dm needed, 1 if needed and <0 if error. */ > -int libxl__need_xenpv_qemu(libxl__gc *gc, libxl_domain_config *d_config) > +int libxl__need_xenpv_qemu(libxl__gc *gc, libxl_domain_config *d_config, > uint32_t domid) > { > - int idx, i, ret, num; > - uint32_t domid; > + int idx, i, ret = 0, num; > const struct libxl_device_type *dt; > > - ret = libxl__get_domid(gc, &domid); > - if (ret) { > - LOG(ERROR, "unable to get domain id"); > - goto out; > - } The above calls gets the domid of the current domain running this code (the toolstack domain), not the domain being created. > - > if (d_config->num_vfbs > 0 || d_config->num_p9s > 0) { > ret = 1; > goto out; > @@ -3238,21 +3239,26 @@ int libxl__dm_check_start(libxl__gc *gc, > libxl_domain_config *d_config, > uint32_t domid) > { > int rc; > + bool need_qemu; > > if (libxl__dm_active(gc, domid)) > return 0; > > - rc = libxl__need_xenpv_qemu(gc, d_config); > - if (rc < 0) > - goto out; > - > - if (!rc) > + switch (d_config->b_info.device_model_version) { > + case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: > + case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: > + need_qemu = true; > + break; > + default: > + need_qemu = false; > + break; > + } > + if (need_qemu == false) > return 0; > > LOGD(ERROR, domid, "device model required but not running"); > rc = ERROR_FAIL; > > -out: > return rc; > } > > diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c > index d1af3a6573..c492fe5dd1 100644 > --- a/tools/libxl/libxl_dom_suspend.c > +++ b/tools/libxl/libxl_dom_suspend.c > @@ -379,7 +379,9 @@ static void > domain_suspend_common_guest_suspended(libxl__egc *egc, > libxl__ev_xswatch_deregister(gc, &dsps->guest_watch); > libxl__ev_time_deregister(gc, &dsps->guest_timeout); > > - if (dsps->type == LIBXL_DOMAIN_TYPE_HVM) { > + if (dsps->type == LIBXL_DOMAIN_TYPE_HVM || > + libxl__device_model_version_running(gc, dsps->domid) == > + LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { > dsps->callback_device_model_done = domain_suspend_common_done; > libxl__domain_suspend_device_model(egc, dsps); /* must be last */ > return; > @@ -459,7 +461,9 @@ int libxl__domain_resume(libxl__gc *gc, uint32_t domid, > int suspend_cancel) > goto out; > } > > - if (type == LIBXL_DOMAIN_TYPE_HVM) { > + if (type == LIBXL_DOMAIN_TYPE_HVM || > + libxl__device_model_version_running(gc, domid) == > + LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { > rc = libxl__domain_resume_device_model(gc, domid); > if (rc) { > LOGD(ERROR, domid, "failed to resume device model:%d", rc); > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index 44e0221284..9eb4211d85 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -1817,7 +1817,8 @@ _hidden int libxl__domain_build(libxl__gc *gc, > _hidden const char *libxl__domain_device_model(libxl__gc *gc, > const libxl_domain_build_info *info); > _hidden int libxl__need_xenpv_qemu(libxl__gc *gc, > - libxl_domain_config *d_config); > + libxl_domain_config *d_config, > + uint32_t domid); > _hidden bool libxl__query_qemu_backend(libxl__gc *gc, > uint32_t domid, > uint32_t backend_id, > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index cb4702fd7a..7d75bd3850 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -106,6 +106,7 @@ libxl_device_model_version = > Enumeration("device_model_version", [ > (0, "UNKNOWN"), > (1, "QEMU_XEN_TRADITIONAL"), # Historical qemu-xen device model (qemu-dm) > (2, "QEMU_XEN"), # Upstream based qemu-xen device model > + (3, "NONE_REQUIRED"), > ]) > > libxl_console_type = Enumeration("console_type", [ _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |