[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid

George Dunlap writes ("Re: [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid"):
> On Nov 28, 2018, at 5:02 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
> > Also I think you need to handle errors properly ?  Ie set and check
> > errno.
> Don’t I want to pass up the errno values set by the getpwnam functions?

By `set' I meant zero beforehand.  See the manpage for getpw*.

> Although I suppose I should also make sure that the uid I return is not 
> zero...


> > I'm also not convinced that it is sensible to handle the case where we
> > have multiple per-domain uids but no reaper uid.  This turns host
> > configuration errors into systems that are less secure than they
> > should be in a really obscure way.
> At this point, we have a target_uid but no way of getting a lock for 
> reaper_uid.  We have three options:
> 1. Don’t kill(-1) at all.
> 2. Try to  kill(-1) with setresuid(target_uid, target_uid, 0)
> 3. kill(-1) with setresuid(reaper_uid, target_uid, 0) without holding the 
> lock.
> #1 means that a rogue qemu will not be destroyed.
> #2 means that there’s a race, whereby sometimes the rogue qemu is destroyed, 
> and sometimes the reaper process is destroyed by the rogue qemu first.
> #3 means there’s a race, whereby sometimes everything works fine, sometimes 
> both the rogue qemu and *the reaper process from another domain* is 
> destroyed, and sometimes this reaper process is killed by the reaper process 
> from another domain (leaving the rogue qemu alive).
> I think [#2] is obviously the best option.

Yes, but it still needs to count as an error.

> > CODING_STYLE, no call inside the condition please.
> I c&p’d this from libxl_internal.c:libxl__lock_domain_userdata().
> I take it you’re referring to the “ERROR HANDLING” section of CODING_STYLE?  
> It wasn’t obvious to me that refers to while() loops.
> I take it you’d prefer "while(true) { rc = flock(); if(!rc) break; …}” 
> instead?

Except that the return value from flock() is `r' according to
CODING_STYLE.  (And I'm not sure `true' is in scope and anyway
personally I prefer `for (;;)' but that's a matter of taste
so whatever.)

> And if I may make a minor suggestion: This is the second time in this series 
> you’ve said “don’t do X” for fairly common code idioms without giving me 
> guidance as to what you’d like to see instead. 

I'm sorry that `no call inside the condition please' was not clear
enough.  I mean that the flock call should be outside any condition,
because it is a call that might fail, and that consequently the loop
termination will have to be done with an early loop exit using `break'
rather than a condition.  (Since saving the return value so that it
could be used in a while() would be perverse.)

> > * But crucially in such situations (i) overall destroy ao should
> >   return a failure error code (ii) the domain itself should not be
> >   destroyed in Xen.  This means that `xl destroy' fails, and can be
> >   repeated after the problem is corrected.
> This means that in such a situation, we might successfully kill the
> devicemodel, but leave a zombie domain lying around.

Yes, precisely.

> I suppose that might be the least-bad option, as 1) would be more
> likely to alert the administrator to fix the configuration error,
> and 2) the domain would hold the domid, such that any unkilled qemu
> processes wouldn’t be able to cause mischief on other domains.


In more formal terms, they prevent the system getting into a
completely uncontrolled and forbidden state, namely a state where
there is possibly a qemu and maybe other stuff hanging about, which is
not visible in xl and may cause future mischief.

Since we cannot destroy everything at once, our system model must
include a `half destroyed' state.  In that state the domain must still
show up in `xl'.  We have chosen to `hang everything off' the Xen list
of active domains, which means that we mustn't remove a domain from
Xen until we have *successfully* destroyed all its other stuff.

> Probably some of these principles should be in a comment somewhere.

Yes.  I think they apply to all of domain destruction in libxl.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.