[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 03/10] libxl: convert libxl_domain_destroy to an async op



Roger Pau Monne writes ("[PATCH v4 03/10] libxl: convert libxl_domain_destroy 
to an async op"):
> This change introduces some new structures, and breaks the mutual
> dependency that libxl_domain_destroy and libxl__destroy_device_model
> had. This is done by checking if the domid passed to
> libxl_domain_destroy has a stubdom, and then having the bulk of the
> destroy machinery in a separate function (libxl__destroy_domid) that
> doesn't check for stubdom presence, since we check for it in the upper
> level function. The reason behind this change is the need to use
> structures for ao operations, and it was impossible to have two
> different self-referencing structs.

Thanks.  I have a series of largely cosmetic comments which it would
be nice if you felt like addressing, but:

Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>


> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index bd71844..9003583 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
...
> +/* Check if there are devices that have pending events in the array
> + * pointed to by the "list" parameter. Return values can be:
> + * < 0: All done, but error(s) found.
> + * 0: All done
> + * > 0: Not all done
> + * The passed libxl__ao_device struct in "device" is marked as finished.
> + */

Good.

> +/*
> + * Entry point for domain destruction
> + * This function checks for stubdom presence and then calls
> + * libxl__destroy_domid on the passed domain and it's stubdom if found.

"its" not "it's"

> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 7fdecf1..0b3eb48 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c

> +void libxl__domain_destroy(libxl__egc *egc, libxl__domain_destroy_state *dds)
> +{
> +    STATE_AO_GC(dds->ao);
> +    uint32_t stubdomid = libxl_get_stubdom_id(CTX, dds->domid);
> +
> +    if (stubdomid) {
> +        dds->stubdom.ao = ao;
> +        dds->stubdom.domid = stubdomid;
> +        dds->stubdom.callback = stubdom_callback;
> +        dds->stubdom.force = 1;

This variable seems never to be set to anything but 1.  We aren't
going to have libxl_domain_remove or indeed libxl__remove_domid, are
we ?

Perhaps the domain destroy functions should lose the force flag and
instead simply always pass 1 into the device remove functions.

> +static void stubdom_callback(libxl__egc *egc, libxl__destroy_domid_state 
> *dis,
> +                             int rc)
> +{

Can this function please have `destroy' in its name somewhere ?
libxl.c contains code for doing other things than destruction.  The
same goes for `domain_callback'.

> +    STATE_AO_GC(dis->ao);
> +    libxl__domain_destroy_state *dds = CONTAINER_OF(dis, *dds, stubdom);
> +    const char *savefile;
> +
> +    if (rc) {
> +        LOG(ERROR, "unable to destroy stubdom with domid %u", dis->domid);
> +        dds->rc = rc;
> +    }
> +
> +    dds->stubdom_finished = 1;
> +    savefile = libxl__device_model_savefile(gc, dis->domid);
> +    rc = unlink(savefile);

Please use `rc' only for libxl error codes.  This is a system call
return value.

> +    /*
> +     * On suspend libxl__domain_save_device_model will have already
> +     * unlinked the save file.
> +     */
> +    if (rc && errno == ENOENT) rc = 0;
> +    if (rc) {
> +        LOGEV(ERROR, errno, "failed to remove device-model savefile %s",
> +                            savefile);
> +    }

This is one possible reason why the savefile might not exist.  To be
honest I'm not really convinced this warrants an explanation.  Domain
destruction inherently wants to be idempotent.

And, why not use libxl__remove_file which already contains the
relevant logic including a log message. ?

> diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> index 48f05f9..413b98f 100644
> --- a/tools/libxl/libxl_device.c
> +++ b/tools/libxl/libxl_device.c
> @@ -58,6 +58,48 @@ int libxl__parse_backend_path(libxl__gc *gc,
...
> +    unsigned int num_kinds, num_devs;
> +    char **kinds = NULL, **devs = NULL;
> +    int i, j, rc = 0;
> +    libxl__device dev;
> +    libxl__device_kind kind;
> +    int numdevs = 0;

Having two variables `num_devs' and `numdevs' seems unnecessarily
opaque :-).  Perhaps the one from libxl__xs_directory should be called
`num_dev_xsentries' or something.

> @@ -367,6 +409,23 @@ void libxl__prepare_ao_device(libxl__ao_device *aodev, 
> libxl__ao *ao,
...
> +int libxl__ao_device_check_last(libxl__gc *gc, libxl__ao_device *device,
> +                                libxl__ao_device *list, int num)
> +{
> +    int i, pending = 0, error = 0;
> +
> +    device->active = 0;
> +    for (i = 0; i < num; i++) {
> +        if (list[i].active)
> +            pending++;

Why not `return +1' ?  That would do away with the variable `pending'.

> +
> +        if (list[i].rc)
> +            error = list[i].rc;
> +    }
> +
> +    return pending == 0 ? error : 1;

And it would simplify this to `return error;'.

> -int libxl__devices_destroy(libxl__gc *gc, uint32_t domid)
> +/* Callback for device destruction */
> +
> +static void device_remove_callback(libxl__egc *egc, libxl__ao_device *aodev);
> +
> +void libxl__devices_destroy(libxl__egc *egc, libxl__devices_remove_state 
> *drs)
...
> +    if (drs->num_devices < 0) {
> +        LOG(ERROR, "unable to get number of devices for domain %u", 
> drs->domid);
> +        rc = ERROR_FAIL;
> +        goto out;

A minor point but this should be `rc = drs->num_devices' surely ?  A
minor point since it can currently only fail with ERROR_FAIL but at
some point we might want to invent more error codes and then they
should propagate nicely.

> @@ -426,17 +510,19 @@ int libxl__devices_destroy(libxl__gc *gc, uint32_t 
> domid)
>      /* console 0 frontend directory is not under 
> /local/domain/<domid>/device */
>      path = libxl__sprintf(gc, "/local/domain/%d/console/backend", domid);
>      path = libxl__xs_read(gc, XBT_NULL, path);
> +    GCNEW(dev);
>      if (path && strcmp(path, "") &&
> -        libxl__parse_backend_path(gc, path, &dev) == 0) {
> -        dev.domid = domid;
> -        dev.kind = LIBXL__DEVICE_KIND_CONSOLE;
> -        dev.devid = 0;
> +        libxl__parse_backend_path(gc, path, dev) == 0) {
> +        dev->domid = domid;
> +        dev->kind = LIBXL__DEVICE_KIND_CONSOLE;
> +        dev->devid = 0;
> 
> -        libxl__device_destroy(gc, &dev);
> +        libxl__device_destroy(gc, dev);

This is quite confusing, because libxl__device_destroy sounds like it
should be a synchronous version of libxl__initiate_device_destroy or
to put it another way an internal version of something called
libxl_device_destroy (which doesn't exist); of course such a function
couldn't exist either.

Perhaps libxl__device_destroy should be renamed somehow (perhaps in
another pre-patch).

Or perhaps there should be a comment to note that (currently) consoles
can be simply synchronously destroyed in xenstore.

> @@ -549,6 +635,39 @@ static void device_backend_cleanup(libxl__gc *gc, 
> libxl__ao_device *aodev)
...
> +        do {
> +            t = xs_transaction_start(CTX->xsh);
> +            libxl__xs_path_cleanup(gc, t, fe_path);
> +            libxl__xs_path_cleanup(gc, t, be_path);
> +            rc = !xs_transaction_end(CTX->xsh, t, 0);
> +        } while (rc && errno == EAGAIN);

Again, can we reserve `rc' for libxl error codes ?

> @@ -1090,48 +1114,25 @@ int libxl__destroy_device_model(libxl__gc *gc, 
> uint32_t domid)
...
> +    if (!pid || !atoi(pid)) {
> +        LOG(ERROR, "could not find device-model's pid for dom %u", domid);
> +        ret = ERROR_FAIL;
> +        goto out;
> +    }
> +    ret = kill(atoi(pid), SIGHUP);
> +    if (ret < 0 && errno == ESRCH) {
> +        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Device Model already exited");
> +        ret = 0;

Of course if this happens we have risked sending SIGHUP to an innocent
process.  It would be nice to arrange for that not to happen but I
don't think we currently have a way to prevent it.

> +        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "failed to kill Device Model 
> [%d]",

Would you care to use LOG(ERROR, ...) ?  You might want to use LOG for
the other cases too.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.