
Re: [Xen-devel] [PATCH] docs/qemu-deprivilege: Revise and update with status and future plans



On 03/26/2018 05:43 PM, Ian Jackson wrote:
> Thanks for this update!
> 
> George Dunlap writes ("[PATCH] docs/qemu-deprivilege: Revise and update with 
> status and future plans"):
> ...
>> +# Technical details
>> +
>> +## Restrictions done
> 
> This makes this doc into a mixture of a design doc and a user doc, I
> think.
> 
> It might be worth stating the design intent, which I think is this:
> 
>  * Even if there is a bug (for example in qemu) which permits a domain
>    to compromise the device model, the compromised device model
>    process is prevented from violating the system's overall security
>    properties.  Ie, a guest cannot "escape" from the virtualisation by
>    using a qemu bug.
> 
> This design intent is not yet achieved.  Right now an attacker is
> impeded and their attack is complicated; in some circumstances they
> will be limited to denial of service.
> 
> I'm not sure the individual restrictions need to be in a user-facing
> doc.
> 
> Maybe the user-facing wording from your patch should be moved to
> xl.cfg.doc.5 ?

Actually I think most of the user-facing stuff already in xl.cfg is
inappropriate for that man page.  It might make sense to have a separate
man page for it.

>> +'''Description''': Close and restrict Xen-related file descriptors.
>> +Specifically, make sure that only one `privcmd` instance is open, and
>> +that the IOCTL_EVTCHN_RESTRICT_DOMID ioctl has been called.
>> +
>> +XXX Also, make sure that only one `xenstore` fd remains open, and that
>> +it's restricted.
> 
> No.  Firstly, in each case, all relevant descriptors are restricted.
> This is the purpose of the xentoolcore__restrict_* stuff.  Secondly,
> xenstore *is* covered - but the xs fd is squashed so as to be totally
> unusable: xs.c uses xentoolcore__restrict_by_dup2_null.

Ross already gave me some corrections on this; here is what I have:

8<---
'''Description''': Close and restrict Xen-related file descriptors.
Specifically:
 * Close all xenstore-related file descriptors
 * Make sure that extraneous `privcmd` and `evtchn` instances are
   closed
 * Make sure that all open instances of `privcmd` and `evtchn` file
   descriptors have had IOCTL_PRIVCMD_RESTRICT and
   IOCTL_EVTCHN_RESTRICT_DOMID ioctls called on them, respectively.
--->8

It sounds like the last may be inaccurate for libxl?
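
For concreteness, on Linux the restriction step amounts to something
like the sketch below.  This is illustrative only -- not the actual
xentoolcore code -- and the header paths and types are from memory:

    /* Sketch only: restrict already-open privcmd and evtchn fds to a
     * single domain.  domid_t and the ioctl names come from the Xen
     * kernel headers; exact include paths may differ. */
    #include <sys/ioctl.h>
    #include <xen/privcmd.h>    /* IOCTL_PRIVCMD_RESTRICT */
    #include <xen/evtchn.h>     /* IOCTL_EVTCHN_RESTRICT_DOMID */

    static int restrict_xen_fds(int privcmd_fd, int evtchn_fd,
                                domid_t domid)
    {
        struct ioctl_evtchn_restrict_domid erd = { .domid = domid };

        if (ioctl(privcmd_fd, IOCTL_PRIVCMD_RESTRICT, &domid) < 0)
            return -1;
        if (ioctl(evtchn_fd, IOCTL_EVTCHN_RESTRICT_DOMID, &erd) < 0)
            return -1;
        return 0;
    }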

>> +### Namespaces for unused functionality
>> +
>> +'''Description''': Enter QEMU into its own mount & IPC namespaces.
>> +This means that even if other restrictions fail, the process won't
>> +even be able to name system mount points or existing non-file-based
>> +IPC descriptors to attempt to attack them.
>> +
>> +'''Implementation''':
>> +
>> +In theory this could be done in QEMU (similar to -sandbox, -runas,
>> +-chroot, and so on), but a patch doing this in QEMU was NAKed
>> +upstream. They preferred that this was done as a setup step by
>> +whatever executes QEMU; i.e., have the process which exec's QEMU first
>> +call:
>> +
>> +    unshare(CLONE_NEWNS | CLONE_NEWIPC)
> 
> This would mean we would have to pass qemu fds for both the network
> tap devices and any vnc consoles.  That makes life considerably more
> complicated.  I think we should perhaps revisit this upstream.

You haven't read this very carefully (nor do you seem to have read the
discussion with Ross before responding).  I split the "Namespaces"
restriction Ross suggested internally to us into two parts:
 - Namespace restrictions for unused functionality
 - Network namespacing

This section covers the first one, which will have no functional impact
because the features being restricted are not used.  It can be
implemented immediately, with no architectural change, to give
additional security.
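
For concreteness, the setup step could be as small as this sketch
(made-up helper name, not actual libxl code):

    /* Hypothetical helper, not actual libxl code: detach from the
     * host's mount and IPC namespaces, then exec qemu.  Linux-only
     * (unshare(2)); requires the appropriate privilege. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>
    #include <err.h>

    static void exec_qemu(char **qemu_argv)
    {
        if (unshare(CLONE_NEWNS | CLONE_NEWIPC) < 0)
            err(1, "unshare");
        execv(qemu_argv[0], qemu_argv);
        err(1, "execv %s", qemu_argv[0]);
    }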

The second one does mean passing fds for network items.  Doing so in
general seems cleaner architecturally.

In fact, I was wondering if it might make sense to do *all* the
deprivileging in a separate process before starting qemu.  That would
make it simple, for example, to write a test program which would try to
break out of the "jail" we'd put it in.

>> +### Further RLIMITs
>> +
>> +RLIMIT_AS limits the total amount of memory; but this includes the
>> +virtual memory which QEMU uses as a mapcache.  xen-mapcache.c already
>> +fiddles with this; it would be straightforward to make it *set* the
>> +rlimit to what it thinks a sensible limit is.
>> +
>> +Other things that would take some cleverness / changes to QEMU to
>> +utilize due to ordering constraints:
>> + - RLIMIT_NPROC (after uid changes to a unique uid)
>> + - RLIMIT_NOFILE (after all necessary files are opened)
> 
> I think there is little difficulty with RLIMIT_NPROC since our qemu
> does not fork.  I think we can set it to a value which is currently
> violated for the current uid ?

Well AFAICT classic POSIX allows you to set rlimits on yourself, but not
on another process.  Since RLIMIT_NPROC is accounted against the *user
id* rather than the *process*, I didn't think "setrlimit [as root] ->
exec -> setuid" would work correctly; I assumed you'd have to have "exec
-> setuid -> setrlimit", which would require further changes to QEMU.

I now realize that it might be that the limit will follow the current
uid of the process, in which case "setrlimit -> setuid" might have the
expected behavior.  But a quick Google search shows that the interaction
of RLIMIT_NPROC and setuid() is tricky[1][2], and may vary from
implementation to implementation; relying on the interaction to be
correct (and stay correct) seems somewhat risky (unless POSIX has
explicitly documented what should happen in that case, which again a
quick Google search hasn't turned up).

Linux does seem to have a "set rlimit on another process" system call
(prlimit).  But that would still require at least a little bit of care,
since we'd need to set the limit after the setuid but before the guest
starts running.  And in any case, a quick search didn't turn up any
equivalent system call on FreeBSD (and working correctly on FreeBSD
seems to be a design goal).
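
On Linux the parent-side version would look something like this
(sketch; prlimit(2) is Linux-only, which is the FreeBSD problem just
mentioned):

    /* Linux-only sketch: clamp the process count of an already-running
     * (and already setuid'd) qemu from the outside. */
    #define _GNU_SOURCE
    #include <sys/resource.h>
    #include <sys/types.h>

    static int clamp_qemu_nproc(pid_t qemu_pid)
    {
        struct rlimit rl = { .rlim_cur = 1, .rlim_max = 1 };

        /* Final NULL means we don't care about the old limit. */
        return prlimit(qemu_pid, RLIMIT_NPROC, &rl, NULL);
    }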

[1] https://lkml.org/lkml/2003/7/13/226
[2] https://lwn.net/Articles/451985/

>> +### libxl UID cleanup
> ...
>> +kill(-1,sig) sends a signal to "every process to which the calling
>> +process has permission to send a signal".  So in theory:
>> +  setuid(X)
>> +  kill(-1,SIGKILL)
>> +should do the trick.
> 
> We need to check whether a malicious qemu process could kill this
> one.

Hmm, in theory it probably could.  If we do it twice (once at domain
destruction and again at domain creation) it would need to win at least
two races.  I didn't find a more secure way of doing this: Linux at
least doesn't seem to have a "kill all processes with userid X" system
call; the only way I found to target all processes with a given userid
is kill(-1), which operates on your own userid (and of course leaves
you open to being killed in return).  There may be a way to protect
your own process from being killed by playing with EUID / RUID and so
on.
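
Concretely, each pass would be something like this (sketch; the forked
helper runs as uid X for a moment, which is exactly the window a
malicious process could exploit):

    /* Sketch: kill everything running as the per-domain uid X.  The
     * child takes on uid X so that kill(-1) targets that uid; the
     * child itself then becomes a valid target for a malicious
     * survivor, hence the race discussed above. */
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void kill_uid(uid_t x)
    {
        pid_t pid = fork();

        if (pid == 0) {
            if (setuid(x) == 0)
                kill(-1, SIGKILL);
            _exit(0);
        }
        if (pid > 0)
            waitpid(pid, NULL, 0);
    }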

The only way to change your pid is to fork() or clone(), right?  If we
get RLIMIT_NPROC=1 working correctly, then this should be a "just in
case" backstop, as QEMU shouldn't (in theory) be able to change its pid.

>> +### Disks
>> +
>> +The chroot (and seccomp?) happens late enough such that QEMU can
>> +initialize itself and open its disks. If you want to add a disk at run
>> +time via or insert a CD, you can't pass a path because QEMU is
>> +chrooted. Instead use the add-fd QMP command and use
>> +/dev/fdset/<fdset-id> as the path.
> 
> I don't think we (Xen) really support hotplug of emulated disks right
> now.  So it's just cd insert that's a problem.

I might edit it, but as written it's strictly correct: as it happens, we
don't hot-plug emulated devices; but if you wanted to, you'd need to
pass a file descriptor.

But there is a dangling phrase here ("...add a disk at run time via [?]
or insert a CD...").
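
For reference, the fd gets to QEMU as SCM_RIGHTS ancillary data on the
QMP socket, roughly like this (sketch only; assumes qmp_fd is a
connected monitor socket that has completed the capabilities handshake,
and the fdset-id of 1 is made up):

    /* Sketch: send an open disk/CD fd to QEMU along with the add-fd
     * QMP command.  After this, "/dev/fdset/1" can be used as a path
     * in a subsequent medium-change command. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int qmp_add_fd(int qmp_fd, int disk_fd)
    {
        const char *cmd =
            "{ \"execute\": \"add-fd\","
            "  \"arguments\": { \"fdset-id\": 1 } }";
        struct iovec iov = { (void *)cmd, strlen(cmd) };
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &disk_fd, sizeof(int));

        return sendmsg(qmp_fd, &msg, 0) < 0 ? -1 : 0;
    }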

 -George
