| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid
 George Dunlap writes ("Re: [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid"):
> On Nov 28, 2018, at 5:02 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
> > Also I think you need to handle errors properly ?  Ie set and check
> > errno.
> 
> Don’t I want to pass up the errno values set by the getpwnam functions?
By `set' I meant zero beforehand.  See the manpage for getpw*.
> Although I suppose I should also make sure that the uid I return is not 
> zero...
Yes.
> > I'm also not convinced that it is sensible to handle the case where we
> > have multiple per-domain uids but no reaper uid.  This turns host
> > configuration errors into systems that are less secure than they
> > should be in a really obscure way.
> 
> At this point, we have a target_uid but no way of getting a lock for 
> reaper_uid.  We have three options:
> 
> 1. Don’t kill(-1) at all.
> 2. Try to  kill(-1) with setresuid(target_uid, target_uid, 0)
> 3. kill(-1) with setresuid(reaper_uid, target_uid, 0) without holding the 
> lock.
> 
> #1 means that a rogue qemu will not be destroyed.
> 
> #2 means that there’s a race, whereby sometimes the rogue qemu is destroyed, 
> and sometimes the reaper process is destroyed by the rogue qemu first.
> 
> #3 means there’s a race, whereby sometimes everything works fine, sometimes 
> both the rogue qemu and *the reaper process from another domain* is 
> destroyed, and sometimes this reaper process is killed by the reaper process 
> from another domain (leaving the rogue qemu alive).
> 
> I think [#2] is obviously the best option.
Yes, but it still needs to count as an error.
> > CODING_STYLE, no call inside the condition please.
> 
> I c&p’d this from libxl_internal.c:libxl__lock_domain_userdata().
> I take it you’re referring to the “ERROR HANDLING” section of CODING_STYLE?  
> It wasn’t obvious to me that refers to while() loops.
> I take it you’d prefer "while(true) { rc = flock(); if(!rc) break; …}” 
> instead?
Except that the return value from flock() is `r' according to
CODING_STYLE.  (And I'm not sure `true' is in scope and anyway
personally I prefer `for (;;)' but that's a matter of taste
so whatever.)
> And if I may make a minor suggestion: This is the second time in this series 
> you’ve said “don’t do X” for fairly common code idioms without giving me 
> guidance as to what you’d like to see instead. 
I'm sorry that `no call inside the condition please' was not clear
enough.  I mean that the flock call should be outside any condition,
because it is a call that might fail, and that consequently the loop
termination will have to be done with an early loop exit using `break'
rather than a condition.  (Since saving the return value so that it
could be used in a while() would be perverse.)
> > * But crucially in such situations (i) overall destroy ao should
> >   return a failure error code (ii) the domain itself should not be
> >   destroyed in Xen.  This means that `xl destroy' fails, and can be
> >   repeated after the problem is corrected.
> 
> This means that in such a situation, we might successfully kill the
> devicemodel, but leave a zombie domain lying around.
Yes, precisely.
> I suppose that might be the least-bad option, as 1) would be more
> likely to alert the administrator to fix the configuration error,
> and 2) the domain would hold the domid, such that any unkilled qemu
> processes wouldn’t be able to cause mischief on other domains.
Precisely.
In more formal terms, they prevent the system getting into a
completely uncontrolled and forbidden state, namely a state where
there is possibly a qemu and maybe other stuff hanging about, which is
not visible in xl and may cause future mischief.
Since we cannot destroy everything at once, our system model must
include a `half destroyed' state.  In that state the domain must still
show up in `xl'.  We have chosen to `hang everything off' the Xen list
of active domains, which means that we mustn't remove a domain from
Xen until we have *successfully* destroyed all its other stuff.
> Probably some of these principles should be in a comment somewhere.
Yes.  I think they apply to all of domain destruction in libxl.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |