[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 09/10] libxl: Kill QEMU with "reaper" ruid



George Dunlap writes ("[PATCH v2 09/10] libxl: Kill QEMU with "reaper" ruid"):
> Using kill(-1) to killing an untrusted dm process with the real uid
> equal to the dm_uid isn't guaranteed to succeed: the process in
> question may be able to kill the reaper process after the setresuid()
> and before the kill().
> 
> Instead, set the real uid to the QEMU user for domain 0
> (QEMU_USER_RANGE_BASE + 0).  The reaper process will still be able to
> kill the dm process, but not vice versa.
> 
> This, in turn, requires locking to make sure that only one reaper
> process is using that uid at a time; otherwise one reaper process may
> kill the other reaper process.
> 
> Create a lockfile in RUNDIR/dm-reaper-lock, and grab the lock before
> executing kill.
> 
> In the event that we can't get the lock for some reason, go ahead with
> the kill using dm_uid for both real and effective UIDs.  This isn't
> guaranteed to work, but it's no worse than not trying to kill the
> process at all.

Thanks.  Only minor comments here.

> +/*
> + * Look up "reaper UID".  If present and non-root, returns 0 and sets
> + * reaper_uid.  If not present, returns 0 and leaves reaper_uid unset;
> + * otherwise returns libxl-style error.
> + */

`leaves reaper_uid unset' is ambiguous.  It might mean `leaves it
unchanged' or `leaves it set to a sentinel value'.  The implementation
seems to be the former, which means that all callers must set it to a
sentinel value.

I think it would be better if it explicitly set it to (uid_t)-1
in that case.

...

I looked further and saw that actually you meant `leaves it unchanged'
and the caller is expected to set it to the value that it will want to
use if there is no dedicated reaper uid.  This is rather too subtle
for me.

Can you please put the logic that falls back to
reaper_uid=dm_dedicated_uid closer to the point of use ?

> +        if(user_base->pw_uid == 0) {
             ^
missing space.

.oO{ I wonder how that checkpatch project is coming on... }

> +static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms,
> +                                   uid_t *reaper_uid)
> +{
...
> +    fd = open(lockfile, O_RDWR|O_CREAT, 0666);

I think this mode is wrong.  If the process is running with a g+w
umask the file is g+w which is not desirable.  I suggest 0644.

> +    if (fd < 0) {
> +        /* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */
> +        LOGED(ERROR, domid,
> +              "unexpected error while trying to open lockfile %s, errno=%d",
> +              lockfile, errno);
> +        return ERROR_FAIL;
;4~
`All other errno' - other to what ?  I think this comment has become
detached from its code.

> +    /* Try to lock the file, retrying on EINTR */
> +    for (;;) {
> +        r = flock(fd, LOCK_EX);
> +        if (!r)
> +            break;
> +        if (errno != EINTR) {
> +            /* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */
> +            LOGED(ERROR, domid,
> +                  "unexpected error while trying to lock %s, fd=%d, 
> errno=%d",
> +                  lockfile, fd, errno);
> +            return ERROR_FAIL;
> +        }
> +    }

LGTM.

> +    /*
> +     * Get reaper_uid.  If we can't find such a uid, return an error.
> +     *
> +     * FIXME: This means that domain destruction will fail if
> +     * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist.
> +     */
> +    return libxl__get_reaper_uid(gc, reaper_uid);
> +}

Did you mean to introduce this new FIXME ?  I think device_model_user
was there before your series (even though it probably won't work very
well...)  Forgive me if I am missing something.

If you did mean to, then you need to explain in the commit message why
it is OK.

>  /*
>   * Destroy all processes of the given uid by setresuid to the
>   * specified uid and kill(-1).  NB this MUST BE CALLED FROM A SEPARATE
> - * PROCESS from the normal libxl process.  Returns a libxl-style error
> - * code.
> + * PROCESS from the normal libxl process, and should exit immediately
> + * after return.  Returns a libxl-style error code.

I think this rule that you must _exit immediately after return is not
new so can you move that comment into the patch that introduces the
setresuid ?  Also it would be marginally better to explicitly say
_exit rather than exit.

>      /*
...
> +     * NB: Even if we don't have a separate reaper_uid, the parent can
> +     * know whether we won the race by looking at the status variable;
> +     * so we don't strictly need to return failure in this case.  But
> +     * if there's a misconfiguration, it's better to alert the
> +     * administator sooner rather than later; so if we fail to get a
> +     * reaper uid, report an error even if the kill succeeds.

I approve of this reasoning and also of it being explained in a
comment.  Thanks.

> @@ -2853,7 +2952,7 @@ static int 
> kill_device_model_uid_child(libxl__destroy_devicemodel_state *ddms,
>      r = kill(-1, 9);
>      if (r && errno != ESRCH) {
>          LOGED(ERROR, domid, "kill(-1,9)");
> -        rc = ERROR_FAIL;
> +        rc = rc ? rc : ERROR_FAIL;
>      }

This is a bit open-coded but I guess this is pretty bespoke set of
code.  So, fine.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.