[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 08/10] libxl: Kill QEMU by uid when possible



George Dunlap writes ("[PATCH v2 08/10] libxl: Kill QEMU by uid when possible"):
> The privcmd fd that a dm_restrict'ed QEMU has gives it permission to
> one specific domain ID.  This domain ID will probably eventually be
> used again.  It is therefore necessary to make absolutely sure that a
> rogue QEMU process cannot hang around after its domain has exited.

Thanks.  Detailed comments mostly on error handling follow...

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f9e0bf6578..cd3208f4b8 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1135,7 +1135,7 @@ typedef struct {
>      const char *shim_cmdline;
>      const char *pv_cmdline;
>  
> -    char *dm_runas;
> +    char *dm_runas, *dm_uid;

I think `dm_uid' is misnamed.  It is only set if the dm has a
*dedicated* uid.  If the dm is run as uid 0 or as a shared qemu uid,
it is set to NULL.

> @@ -3706,6 +3706,8 @@ struct libxl__destroy_devicemodel_state {
>      uint32_t domid;
>      libxl__devicemodel_destroy_cb *callback; /* May be called re-entrantly */
>      /* private to implementation */
> +    libxl__ev_child destroyer;
> +    int rc; /* Accumulated return value for the destroy operation */

Excellent.

> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 7f9c6a62fe..53fdf8daf7 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -129,6 +129,8 @@ static int libxl__domain_get_device_model_uid(libxl__gc 
> *gc,
>      int rc;
>      char *user;
>      uid_t intended_uid;
> +    bool kill_by_uid;
> +

I guess this new blank line is deliberate.  I think it is probably a
good idea.

> @@ -148,8 +150,10 @@ static int libxl__domain_get_device_model_uid(libxl__gc 
> *gc,
>              LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s",
>                   user);
>              rc = ERROR_INVAL;
> -        } else
> +        } else {
>              intended_uid = user_base->pw_uid;
> +            kill_by_uid = true;
> +        }
>  
>          goto out;
>      }

I think your changes to the out block newly imply that all `goto out'
with `rc=0' must also set kill_by_uid.

I have not (in this revision) applied this patch and searched for that
because I think there will have to be a respin to consistently apply
the rules I previously identified.

(I mention this here partly so that when I review v3 I know that my
former self wanted to double check that.)

> @@ -226,6 +234,8 @@ out:
>          }
>  
>          state->dm_runas = user;
> +        if (kill_by_uid)
> +            state->dm_uid = GCSPRINTF("%ld", (long)intended_uid);
>      }

More stuff is accumulating in this post-0-check success path.
Perhaps this means my suggestion of
   if (!rc) { ... }  if (!rc) { ... }
will be most convenient.

> +    /*
> +     * If we're starting the dm with a non-root UID, save the UID so
> +     * that we can reliably kill it and any subprocesses
> +     */
> +    if (state->dm_uid)
> +        libxl__xs_printf(gc, XBT_NULL,
> +                         GCSPRINTF("%s/image/device-model-uid", dom_path),

My comment about the misnamed libxl variable applies to
device-model-uid too I think.

> +#define PROPAGATE_RC if(!ddms->rc) ddms->rc = rc
                          ^

I like this macro.  But (i) it needs proper macro hygiene - either a
do{ }while(0) block, or refactoring into an expression (and then
putting in parens).  And (ii) you missed out a space.

(The references to ddms and rc are anaphoric, not macro parameters, so
do not need parens.)

> +    path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, "");
> +    rc = libxl__xs_rm_checked(gc, XBT_NULL, path);
> +    if (rc) {
> +        PROPAGATE_RC;
>          LOGD(ERROR, domid, "xs_rm failed for %s", path);
> +    }

Your new macro makes this a nicely transparent pasttern.

> +    /*
> +     * See if we should try to kill by uid
> +     */
> +    path = GCSPRINTF("/local/domain/%d/image/device-model-uid", domid);
> +    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &dm_uid_str);
> +
> +    /*
> +     * If there was an error here, accumulate the error and fall back
> +     * to killing by pid.
> +     */
> +    if (rc) {
> +        PROPAGATE_RC;
> +        LOGD(ERROR, domid, "Reading dm UID path failed for %s", path);
> +    }

From the comment for libxl__xs_read_checked:
 | * On error, *result_out is undefined.
Arguably this is a bear trap.  Maybe you would like to fix it there
rather than by setting dm_uid_str to 0 here.

> +        reaper_pid = libxl__ev_child_fork(gc, &ddms->destroyer,
> +                                          kill_device_model_uid_cb);
> +        if (reaper_pid < 0) {
> +            rc = ERROR_FAIL;
> +            PROPAGATE_RC;
> +            /*
> +             * Note that if this fails, we still don't kill by pid, to
> +             * make sure that an untrusted DM has not "maliciously"
> +             * exited (potentially causing us to kill an unrelated
> +             * process which happened to get the same pid).
> +             */
> +            goto out;
> +        }
> +
> +        if (!reaper_pid) {  /* child */
> +            rc = kill_device_model_uid_child(ddms, dm_uid_str);
> +            _exit(rc);
> +        }

You cannot _exit(rc).  See my comments below...

Personally I like to put the child process block right after the fork,
especially when (like here) it is very short because the meat has been
lifted elsewhere.  Just a suggestion; you can leave it here if you
prefer.

> -    /* We should try to destroy the device model anyway. */
> -    rc = kill_device_model(gc,
> -              GCSPRINTF("/local/domain/%d/image/device-model-pid", domid));
> +    /*
> +     * No uid to kill; attept to kill by pid.
> +     */
> +    LOGD(DEBUG, domid, "Didn't find dm UID; destroying by pid");
> +
> +    path = GCSPRINTF("/local/domain/%d/image/device-model-pid", domid);
> +    rc = kill_device_model(gc, path);
> +
> +    if (rc) {
> +        PROPAGATE_RC;
> +        LOGD(ERROR, domid, "Killing device model pid from path %s", path);
> +    }
> +
> +out:

I would prefer the apparently-redundant:

  +    rc = 0;
> +out:

This avoids introducing a bug if, just before the success exit path,
new code gets added which doesn't happen to leave rc==0.

> +    /*
> +     * NB that we always return '0' here for the "status of exited
> +     * process"; since there is no process, it always "succeeds".

I had to read this three times to figure out what this meant.  Perhaps
I'm being dense but would you mind writing

  +     * NB that we always pass '0' here for the "status of exited
                            ^^^^

> +/*
> + * Destroy all processes of the given uid by setresuid to the
> + * specified uid and kill(-1).  NB this MUST BE CALLED FROM A SEPARATE
> + * PROCESS from the normal libxl process.  Returns a libxl-style error
> + * code.
> + */
> +static int kill_device_model_uid_child(libxl__destroy_devicemodel_state 
> *ddms,
> +                                       const char *dm_uid_str) {
> +    STATE_AO_GC(ddms->ao);
> +    int domid = ddms->domid;
> +    int r, rc = 0;
> +    uid_t dm_uid = atoi(dm_uid_str);
> +
> +    /*
> +     * FIXME: the second uid needs to be distinct to avoid being
> +     * killed by a potential rogue process
> +     */
> +
> +    /*
> +     * Should never happen; but if it does, better to have the
> +     * toolstack crash with an error than nuking dom0.
> +      */
> +    assert(dm_uid);
> +
> +    LOGD(DEBUG, domid, "DM reaper: calling setresuid(%d, %d, 0)",
> +         dm_uid, dm_uid);
> +    r = setresuid(dm_uid, dm_uid, 0);
> +    if (r) {
> +        LOGED(ERROR, domid, "setresuid to (%d, %d, 0)", dm_uid, dm_uid);
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    /*
> +     * And kill everyone but me.
> +     *
> +     * NB that it's not clear from either POSIX or the Linux man page
> +     * that ESRCH would be returned with a pid value of -1, but it
> +     * doesn't hurt to check.
> +     */
> +    r = kill(-1, 9);
> +    if (r && errno != ESRCH) {
> +        LOGED(ERROR, domid, "kill(-1,9)");
> +        rc = ERROR_FAIL;
> +    }
> +
> +out:
> +    return rc;
> +}
> +
> +static void kill_device_model_uid_cb(libxl__egc *egc,
> +                                    libxl__ev_child *destroyer,
> +                                    pid_t pid, int status)
> +{
> +    libxl__destroy_devicemodel_state *ddms = CONTAINER_OF(destroyer, *ddms, 
> destroyer);
> +    STATE_AO_GC(ddms->ao);
> +
> +    if (status) {
> +        int rc = ERROR_FAIL;;
> +
> +        if (WIFEXITED(status))
> +            rc = WEXITSTATUS(status) - 128;

This can't be right.  Where does this 128 come from ?

I don't see a comment anywhere about your encoding of the libxl error
value in the exit status.

A libxl error code does not necessarily fit in an exit status since an
exit status is just one byte.  Your protocol reserves the exit status
0 for success.  The C implementation typically reserves -1 (255) and
sometimes also 127.  By convention the exit status is normally
regarded as positive and libxl error codes are negative.

I suggest one of the following strategies:

  - Give up on the idea of distinguishing these error codes at all and
    simply _exit(!!rc).  (After all the real error is logged and the
    function only ever returns FAIL.)

  - Say that the child function may only return one of a limited
    subset of libxl error codes (since only FAIL is currently needed),
    and assert that -125 <= rc <= -1, and _exit(-rc).  Then the exit
    status can be recovered with -WEXITSTATUS(status).

  - Do something more subtle involving turning out-of-range
    rc values into -126 or some such.  This seems like overkill.

> -    ddms->callback(egc, ddms, rc);
> +    ddms->callback(egc, ddms, ddms->rc);
>  }
> +#undef PROPAGATE_RC

I am tempted to suggest replacing each call
  PROPAGATE_RC;
with
  ACCUMULATE_RC(ddms);
and put the definition in libxl_internal.h for use elsewhere.

I think we would probably already have some other call sites and if we
go and fix the destroy stuff probably we will want a lot more of it.

Up to you.  Like this is certainly fine for now.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.