|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid
George Dunlap writes ("Re: [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid"):
> On Nov 28, 2018, at 5:02 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
> > Also I think you need to handle errors properly ? Ie set and check
> > errno.
>
> Don’t I want to pass up the errno values set by the getpwnam functions?
By `set' I meant zero beforehand. See the manpage for getpw*.
> Although I suppose I should also make sure that the uid I return is not
> zero...
Yes.
> > I'm also not convinced that it is sensible to handle the case where we
> > have multiple per-domain uids but no reaper uid. This turns host
> > configuration errors into systems that are less secure than they
> > should be in a really obscure way.
>
> At this point, we have a target_uid but no way of getting a lock for
> reaper_uid. We have three options:
>
> 1. Don’t kill(-1) at all.
> 2. Try to kill(-1) with setresuid(target_uid, target_uid, 0)
> 3. kill(-1) with setresuid(reaper_uid, target_uid, 0) without holding the
> lock.
>
> #1 means that a rogue qemu will not be destroyed.
>
> #2 means that there’s a race, whereby sometimes the rogue qemu is destroyed,
> and sometimes the reaper process is destroyed by the rogue qemu first.
>
> #3 means there’s a race, whereby sometimes everything works fine, sometimes
> both the rogue qemu and *the reaper process from another domain* is
> destroyed, and sometimes this reaper process is killed by the reaper process
> from another domain (leaving the rogue qemu alive).
>
> I think [#2] is obviously the best option.
Yes, but it still needs to count as an error.
> > CODING_STYLE, no call inside the condition please.
>
> I c&p’d this from libxl_internal.c:libxl__lock_domain_userdata().
> I take it you’re referring to the “ERROR HANDLING” section of CODING_STYLE?
> It wasn’t obvious to me that refers to while() loops.
> I take it you’d prefer "while(true) { rc = flock(); if(!rc) break; …}”
> instead?
Except that the return value from flock() is `r' according to
CODING_STYLE. (And I'm not sure `true' is in scope and anyway
personally I prefer `for (;;)' but that's a matter of taste
so whatever.)
> And if I may make a minor suggestion: This is the second time in this series
> you’ve said “don’t do X” for fairly common code idioms without giving me
> guidance as to what you’d like to see instead.
I'm sorry that `no call inside the condition please' was not clear
enough. I mean that the flock call should be outside any condition,
because it is a call that might fail, and that consequently the loop
termination will have to be done with an early loop exit using `break'
rather than a condition. (Since saving the return value so that it
could be used in a while() would be perverse.)
> > * But crucially in such situations (i) overall destroy ao should
> > return a failure error code (ii) the domain itself should not be
> > destroyed in Xen. This means that `xl destroy' fails, and can be
> > repeated after the problem is corrected.
>
> This means that in such a situation, we might successfully kill the
> devicemodel, but leave a zombie domain lying around.
Yes, precisely.
> I suppose that might be the least-bad option, as 1) would be more
> likely to alert the administrator to fix the configuration error,
> and 2) the domain would hold the domid, such that any unkilled qemu
> processes wouldn’t be able to cause mischief on other domains.
Precisely.
In more formal terms, they prevent the system getting into a
completely uncontrolled and forbidden state, namely a state where
there is possibly a qemu and maybe other stuff hanging about, which is
not visible in xl and may cause future mischief.
Since we cannot destroy everything at once, our system model must
include a `half destroyed' state. In that state the domain must still
show up in `xl'. We have chosen to `hang everything off' the Xen list
of active domains, which means that we mustn't remove a domain from
Xen until we have *successfully* destroyed all its other stuff.
> Probably some of these principles should be in a comment somewhere.
Yes. I think they apply to all of domain destruction in libxl.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |