[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 08/10] libxl: Kill QEMU by uid when possible
George Dunlap writes ("Re: [PATCH v2 08/10] libxl: Kill QEMU by uid when possible"): > On 12/12/18 4:17 PM, Ian Jackson wrote: > > I am tempted to suggest replacing each call > > PROPAGATE_RC; > > with > > ACCUMULATE_RC(ddms); > > and put the definition in libxl_internal.h for use elsewhere. > > I like that better. What about `ACCUMULATE_RC(ddms->rc)` instead? Then > the same macro could be used for a local variable. Yes, SGTM. > >> + /* > >> + * If there was an error here, accumulate the error and fall back > >> + * to killing by pid. > >> + */ > >> + if (rc) { > >> + PROPAGATE_RC; > >> + LOGD(ERROR, domid, "Reading dm UID path failed for %s", path); > >> + } > > > > From the comment for libxl__xs_read_checked: > > | * On error, *result_out is undefined. > > Arguably this is a bear trap. Maybe you would like to fix it there > > rather than by setting dm_uid_str to 0 here. > > Saying it's "undefined" is probably a bear trap. But you don't like the > way it's actually written -- i.e., that the pointer is only modified if > the value is successfully read? I think that's a weirdly complicated interface with strange implications for normal callers. I dislike threading the logic through like this. By "fix it there" I meant change libxl__xs_read_checked to always set it to 0 on error, and to promise to do that. Sorry not to be explicit. IOW I don't really think libxl__xs_read_checked ought to expect to be used in an accumulate-y way, especially since it only leaves you with the previous value on errors you probably weren't expecting, and not on ENOENT. > >> + if (!reaper_pid) { /* child */ > >> + rc = kill_device_model_uid_child(ddms, dm_uid_str); > >> + _exit(rc); > >> + } > > > > You cannot _exit(rc). See my comments below... > > Is the 'not' attached to "rc" (i.e., the value must be positive), or > '_exit()' (i.e., you must call exit() rather than _exit())? To rc. > >> +static void kill_device_model_uid_cb(libxl__egc *egc, > >> + libxl__ev_child *destroyer, > >> + pid_t pid, int status) > >> +{ > >> + libxl__destroy_devicemodel_state *ddms = CONTAINER_OF(destroyer, > >> *ddms, destroyer); > >> + STATE_AO_GC(ddms->ao); > >> + > >> + if (status) { > >> + int rc = ERROR_FAIL;; > >> + > >> + if (WIFEXITED(status)) > >> + rc = WEXITSTATUS(status) - 128; > > > > This can't be right. Where does this 128 come from ? > > Looks like I was trying to figure out how WEXITSTATUS worked from > looking at the child of devices_destroy_cb() put in _exit() and how > domain_destroy_domid_cb() interpreted it: Namely, I inferred that > _exit(-1) would get you WEXITSTATUS(status) == 127, and extrapolated > that -2 would get you 126, and so on. But _exit(-1) won't get you WEXITSTATUS(status) == 127. Also, there is a manual. But it will be easier if I state what happens: * Only the bottom byte (2's complement) fo the argument to exit() or _exit() is used. This value is called the `exit status'. It is a common idiom to write exit(255) as exit(-1). * The resulting wait status (what you get from waitpid) is exit_status << 8. * If your died due to an uncaught signal, the wait status is the signal number. Plus 128 if the program dumped core. * HOWEVER a parent may experience other wait statuses: * The dynamic linker or other parts of the runtime may sometimes cause your program to exit nonconsensually, perhaps before it even ran any of your code. By convention it will (if it can) print a message to stderr, and use the exit status 255 or 127. * If a shell is involved, the rules are weirdly broken. When a process run by a shell terminates, the shell squashes either the top or bottom byte of the wait status into $?. That is, $? is now either the exit status, or the termination signal, or the termination signal plus 128. Typically if a shell script exits due to failure of a program it ran, it will call exit on $?. So signal numbers can end up showing up as exit statuses. * Kernels sometimes send processes SIGKILL due to OOM. And of course there are a lot of other signals that might arise for various reasons only very loosely within the program's control. * I am shy of exit status 126 due to what I think is a well-founded superstition that someone may steal it for something. * This stuff with the wait status is the same everywhere that's actually a dialect of Unix but macros have been provided WIFEXITED if true, you may use WEXITSTATUS WIFSIGNALED if true, you may use WTERMSIG WCOREDUMP <- missing on some ancient proprietary unices and as I am a careful programmer I usually handle the remaining case (by printing something like `unexpected wait status %x') too. (There are other macros WIFSTOPPED etc. which I am not discussing here, which arise if you ask wait* to tell you when your child stops not just when it ends.) > > I suggest one of the following strategies: > > > > - Give up on the idea of distinguishing these error codes at all and > > simply _exit(!!rc). (After all the real error is logged and the > > function only ever returns FAIL.) > > > > - Say that the child function may only return one of a limited > > subset of libxl error codes (since only FAIL is currently needed), > > and assert that -125 <= rc <= -1, and _exit(-rc). Then the exit > > status can be recovered with -WEXITSTATUS(status). > > For some reason, I really don't want to drop the exit code. If you'd > rather I do #1 I will, but given the choice I'd go with #2. OK. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |