
Re: [Xen-devel] [RFC 0/4] TEE mediator framework + OP-TEE mediator



Hi,

On 23/10/17 21:11, Volodymyr Babchuk wrote:
On Mon, Oct 23, 2017 at 05:59:44PM +0100, Julien Grall wrote:

Hi Volodymyr,
Hi Julien,

Let me begin the e-mail by saying I am not totally averse to putting the TEE
mediator in Xen. At the moment, I am trying to understand the whole picture.
Thanks for the clarification. This is really reassuring :)
For my part, I'm not totally against TEE mediators in stubdoms. I'm only
concerned about the required effort.

On 20/10/17 18:37, Volodymyr Babchuk wrote:
On Fri, Oct 20, 2017 at 02:11:14PM +0100, Julien Grall wrote:
On 17/10/17 16:59, Volodymyr Babchuk wrote:
On Mon, Oct 16, 2017 at 01:00:21PM +0100, Julien Grall wrote:
On 11/10/17 20:01, Volodymyr Babchuk wrote:
I want to present the TEE mediator that was discussed earlier ([1]).

I selected the design with built-in mediators. This is the easiest way:
it removes many questions, and it is easy to implement and maintain
(at least I hope so).

Well, it may close the technical questions but still leave the security
impact unanswered. I would have appreciated a summary of each approach
explaining the pros/cons.
This is also the most secure way, at least in terms of trust between guests
and Xen. I have worked mostly with the OP-TEE guys, so when I hear about
"security", my first thoughts are "Can the TEE OS trust XEN as a
mediator? Can a TEE client trust XEN as a mediator?". And with the
current approach the answer is "yes, they can, especially if XEN is a part
of a chain of trust".

But you probably wanted to ask "Can a guest compromise the whole system by
using the TEE mediator or the TEE OS?". This is an interesting question.
First let's discuss the requirements for a TEE mediator. The mediator
should be able to:

  * Receive a request to handle a trapped SMC. This request should include
    the user registers + some information about the guest (at least the
    domain id).
  * Pin/unpin domain memory pages.
  * Map domain memory pages into its own address space with RW access.
  * Issue a real SMC to the TEE.
  * Receive information about guest creation and destruction.
  * (Probably) inject IRQs into a domain (not necessarily the requesting
    domain; it can be another domain that has also called the TEE).

This is a minimal list of requirements. I think this should be enough to
implement a mediator for OP-TEE. But I can't say for sure for other TEEs.
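
As an illustration only, the requirement list above can be written down as an
abstract interface. This is a minimal sketch in Python, used purely as a
modelling language; every class and method name here is hypothetical and none
of them is an existing Xen or OP-TEE API.

```python
from abc import ABC, abstractmethod


class TeeMediator(ABC):
    """Hypothetical interface mirroring the requirement list; not a Xen API."""

    @abstractmethod
    def handle_smc(self, domid: int, regs: list) -> list:
        """Handle a trapped SMC: guest registers plus the caller's domain id."""

    @abstractmethod
    def pin_pages(self, domid: int, gfns: list) -> None:
        """Pin guest pages so they stay put while the TEE uses them."""

    @abstractmethod
    def unpin_pages(self, domid: int, gfns: list) -> None:
        """Release pages previously pinned for this domain."""

    @abstractmethod
    def map_pages_rw(self, domid: int, gfns: list) -> bytearray:
        """Map guest pages into the mediator's address space, read/write."""

    @abstractmethod
    def issue_smc(self, regs: list) -> list:
        """Issue the real SMC to the TEE and return the result registers."""

    @abstractmethod
    def domain_created(self, domid: int) -> None:
        """Notification that a guest was created."""

    @abstractmethod
    def domain_destroyed(self, domid: int) -> None:
        """Notification that a guest was destroyed."""

    @abstractmethod
    def inject_irq(self, domid: int, irq: int) -> None:
        """Inject an IRQ into a domain (not necessarily the requester)."""
```

Whether the mediator lives in XEN or in a stubdomain, something has to provide
these eight operations; the two approaches differ in who implements them and
at which exception level.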

Let's consider possible approaches:

1. Mediator right in XEN, works at EL2.
    Pros:
     * Mediator can use all XEN APIs
     * As the mediator resides in XEN, it can be validated together with
       XEN (trusted boot).
     * Mediator is initialized before Dom0. Dom0 can work with a TEE.
     * No extra context switches, no special ABI between XEN and mediator.

    Cons:
     * Because it lives in EL2, it can compromise the whole hypervisor
       if there is a security bug in the mediator code.
     * No support for closed source TEEs.

Another con is that you assume the TEE API is fully stable and will not change.
Imagine a new function is added, or a vendor decides to enhance it with a new
set of APIs. How will you know it is safe for Xen to use it?
With whitelisting, as you correctly suggested below. XEN will process
only known requests. Anything that looks unfamiliar should be rejected.

Let's imagine the guest is running on a platform with a newer version of
TEE. This guest will probe the version of OP-TEE and know the new function
is present.
This request will be handled by the mediator. At the moment, the OP-TEE
client does not use versions. Instead it uses capability flags. So, the
mediator should filter out all unknown caps. This will force the guest to use
only the supported subset of features.
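
To make the capability filtering concrete, here is a minimal sketch (Python
used only as a model). The flag names and bit positions are hypothetical and
are not taken from the OP-TEE headers; the only point is that the mediator
masks the capability word reported by the TEE with the set of bits it knows.

```python
# Hypothetical capability bits; the real names and values live in OP-TEE.
CAP_RESERVED_SHM = 1 << 0
CAP_DYNAMIC_SHM = 1 << 1
CAP_FUTURE_FEATURE = 1 << 7  # something this mediator was never taught about

# Capabilities this (hypothetical) mediator knows how to handle.
MEDIATOR_KNOWN_CAPS = CAP_RESERVED_SHM | CAP_DYNAMIC_SHM


def filter_caps(tee_caps: int) -> int:
    """Return only the capability bits the mediator understands."""
    return tee_caps & MEDIATOR_KNOWN_CAPS


# A newer TEE reports a capability the mediator does not know.
reported = CAP_RESERVED_SHM | CAP_FUTURE_FEATURE
guest_view = filter_caps(reported)  # the unknown bit is hidden from the guest
```

The guest only ever sees `guest_view`, so it cannot negotiate a feature the
mediator is unable to check.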

One more question. Does it mean new functions will never be added to the current capabilities?

If, in the future, the client comes to rely on versions (e.g. due to a dramatic
protocol change), the mediator can either downgrade the version or refuse to
work at all.

Makes sense.


If as you said Xen is using a whitelist, this means the hypervisor will
return unimplemented.
How do you expect the guest to behave in that case?
As I said above, the guest should downgrade to the supported subset of features.
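
The whitelist behaviour discussed here can be sketched as follows (Python as a
model only). The function IDs and `forward_to_tee` are hypothetical. The value
returned for rejected calls is modelled on the "unknown function identifier"
result (-1) of the SMC Calling Convention; treat that as an assumption of the
sketch rather than a statement about what the RFC does.

```python
SMC_UNKNOWN = 0xFFFFFFFF  # -1 as a 32-bit value; assumed "unknown function" result

# Hypothetical function IDs, not taken from any real TEE ABI.
FN_GET_CAPS = 0x01
FN_CALL_WITH_ARG = 0x02
FN_NEW_VENDOR_CALL = 0x99  # added by a newer TEE, unknown to the mediator

# Only calls the mediator understands are ever forwarded.
ALLOWED = {FN_GET_CAPS, FN_CALL_WITH_ARG}


def forward_to_tee(func_id, args):
    """Stand-in for issuing the real SMC; here it just echoes the ID."""
    return func_id


def handle_trapped_smc(func_id, args=()):
    """Forward whitelisted calls, reject everything else."""
    if func_id not in ALLOWED:
        return SMC_UNKNOWN
    return forward_to_tee(func_id, args)
```

A guest that receives `SMC_UNKNOWN` for a call is expected to fall back to the
subset of features it can still use.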

Note that I think a whitelist is a good idea, but I think we need to think a
bit more about the implications.
At least for now, OP-TEE is designed in such a way that it is compatible in
both directions. I'm sure that future OP-TEE development will be done with
virtualization support in mind, so it will not break existing setups.

It would be good to have the two communities talking together, so we can make sure the virtualization support is not going in the wrong direction.

Similarly, it would be nice if someone from the OP-TEE maintainers gave feedback on the approach suggested in Xen.



If it is not safe, this means you have a whitelist solution and therefore
tie Xen to a specific OP-TEE version. So if you need to use a new function
you would need to upgrade Xen, making the cost of using a new version
potentially high.
Yes, any ABI change between OP-TEE and its clients will require a mediator
upgrade. Luckily, OP-TEE keeps its ABI backward-compatible, so if you
install an old XEN and a new OP-TEE, OP-TEE will use only the subset of the
ABI that is known to XEN.

Also, correct me if I am wrong, OP-TEE is BSD 2-Clause licensed. This means
anyone wanting to modify OP-TEE for their own purposes can make a
closed version of the TEE. But if you need to introspect/whitelist calls, you
force the vendor to expose their API.
Basically yes. Is this bad? The OP-TEE driver in Linux is licensed under GPL v2.
If a vendor modifies the interface between OP-TEE and Linux, they are obliged
to expose the API anyway.

Pardon me for potentially stupid questions, my knowledge of OP-TEE is limited.

My understanding is that OP-TEE will provide a generic way to access
different Trusted Applications. While the OP-TEE API may be generic, the TA API
is custom. AFAICT the latter is not part of the Linux driver.
Yes, you are perfectly right there.

So here are my questions:
        1) Are you planning to allow all the guests to access every Trusted
Application?
This is a good question. There are two types of TAs supported in
OP-TEE: real TAs (as they are described in the GlobalPlatform specs) and
PseudoTAs.  The latter are statically linked right into the OP-TEE
kernel and execute at the S-EL1 level.
Real TAs are provided by the client. That means that the NW userspace
supplicant loads the TA into OP-TEE. OP-TEE checks the signature of the TA
and then runs it in S-EL0.
So, I'm planning to allow a client to work with any real TA. I can't see
a real problem there.

Are the real TAs going to be shared between guests? Or will each guest have its own?

Will you allow every guest to load real TAs?

PseudoTAs can be used to access some platform-specific features, and thus
it can be quite dangerous to allow anyone to call them.
But generic OP-TEE includes only test and benchmark PseudoTAs, which
should be disabled in production builds. So, I don't see why a generic
mediator should distinguish them. I think XSM can be employed later
to control which guest can access which PseudoTA. But this is not
a target for the first version.

I guess the first version will forbid access to PseudoTAs from all the guests but Dom0?


        2) Will you ever need to introspect those messages?
No, I don't.

I guess that's because all the TAs should follow the specified message protocol?



2. Mediator in a stubdomain. Works at EL1.
    Pros:
     * Mediator is isolated from the hypervisor (but it can still do potentially
       dangerous things like mapping domain memory or pinning pages).
     * One can legally create and use mediator for a closed-source TEE.

        * Easier to upgrade to a new version of OP-TEE.
Yes, this is true. But what about the interface between XEN and the mediator?
This is a new entity that should be maintained. Will I be able to use
a new XEN with an old mediator? Or a new mediator with an old XEN?

Why would you need a specific interface for the mediator? (see more below)
At least the following features are missing from the XEN control (I hope this
is the right term) API right now:
  - domain creation/destruction hooks
  - ability to intercept only certain SMCs
  - way to inject IRQs to other guests

Also, see more below

     Cons:
     * Overhead in XEN<->Mediator communication.
     * XEN needs to be modified to boot mediator domain before Dom0.

Is it really a con? In the past, we had discussions about allowing Xen to
create multiple domains, avoiding the overhead of Dom0. This could also be a
benefit here.
As I understand, this is a significant change in XEN. What are the chances
that the community will accept this change? As far as I can see, the only
immediate benefit of this is TEE mediator support. It looks like no one except
us is interested in this topic.

The GSOC project was not added because of the TEE mediator. We had companies
showing interest in starting multiple domains at the same time. This would
significantly shrink the boot time of the whole platform.
Yes. Actually, we are also interested in a faster boot. But my point was
that what we need for the mediator is not the same as what is described in
the GSOC project. The functionality described on the GSOC page has multiple
uses. But for the mediator we need something more intricate: as I said below,
the ability to delay the boot of hwdom (and other domains).

Not really, the domain could block when issuing an SMC until the mediator is up and running.



BTW, I checked "Xen on ARM: create multiple guests from device
tree" at [1]. This is close to what we need, but not exactly. You see, the
TEE mediator should be created *before* Dom0. So the TEE mediator will
actually receive domid 0. I suspect that this change alone will break
many things.

Can you please give an example?
I'm sure that I have seen checks for domid == 0 before, but now I can't find any.
Probably those were closed-source backends. So, sorry for the false accusation :)

Technically none of the hypervisor, Linux and the toolstack should rely on
dom0 being domid 0.

AFAIK, the hypervisor and Linux are free of them. There might be a few
hardcoded in the toolstack, but they should really disappear.
Totally agree there.

However, I can't see why you require the mediator to use domid 0. You could
for example keep the hardware domain paused until the mediator has started.
So this will be like: construct dom0, construct and run the mediator domain,
run dom0 on a signal from DomMediator? Probably this will work.
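
The ordering proposed above can be modelled in a few lines (Python as a model
only). The `Domain` class and `boot` function are hypothetical and stand in for
whatever Xen would actually do; the sketch only shows that dom0 keeps domid 0
because it is constructed first, yet does not run until the mediator is ready.

```python
class Domain:
    """Toy domain: constructed paused, like a freshly built guest."""

    def __init__(self, domid, name):
        self.domid = domid
        self.name = name
        self.paused = True


def boot(log):
    """Boot sequence: dom0 is built first but started last."""
    dom0 = Domain(0, "dom0")
    log.append("construct dom0")

    mediator = Domain(1, "DomMediator")
    log.append("construct DomMediator")

    mediator.paused = False
    log.append("run DomMediator")

    # DomMediator signals that it is ready; only then is dom0 unpaused.
    log.append("DomMediator ready")
    dom0.paused = False
    log.append("run dom0")
    return dom0, mediator


events = []
dom0, mediator = boot(events)
```

With this ordering the mediator does not need domid 0, which avoids the
concern about code that assumes dom0 is domid 0.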



And yes, it seems obvious, but I want to say this explicitly: the generic
TEE mediator framework should and will use XSM to control which domains
can work with the TEE. So, if you don't trust your guest - don't let it
call the TEE at all.

Correct me if I am wrong. The TEE could be used by an Android guest, which is
likely to run user apps... right? So are you saying you fully trust that guest
and, obviously, the user installing a rogue app?
I don't think that an app downloaded from the Play Market can access OP-TEE directly.
OP-TEE can be used by Android itself as key storage or to access an SE,
for example. But a third-party app that issues TEE calls... I don't think so.

You didn't get my point here. That rogue app may be able to break into the
kernel via an exploit or have enough privilege to break the guest. Who knows
what it will be able to do after...
Only what the hypervisor and the TEE will allow it to do. Look, OP-TEE was not
designed to rule the machine. There is ARM TF for that :) OP-TEE's task is to
provide a safer environment for sensitive data and code. This environment has
well-defined interfaces and is designed to be as safe as possible.

If a rogue app breaks into the kernel, then it can issue any SMC it wants.
But OP-TEE does not trust the NW. The hypervisor does not trust guests.
The mediator should be written in the same way.

So, what can a rogue kernel do? As far as I know, it can cause a DoS in OP-TEE.
This is a known issue. If there is a security bug in OP-TEE, it can probably
take over the whole system. But this is true for any system running OP-TEE.

I agree that if you take over OP-TEE, you will take over any system. This is not specific to a hypervisor.

A bare-metal OS taking down the platform will only harm itself. A guest OS could harm the whole platform.

What I am not sure about yet, maybe because of my lack of knowledge around OP-TEE, is who is going to prevent a TA from accessing all the NS memory?


If there is a security flaw in the mediator, it can compromise either the
hypervisor, or DomMediator and all TEE-capable guests. Yes, this is a risk.

The whole point of using a hypervisor is to isolate guests from each other.
So what is the isolation model with OP-TEE and the mediator?
OP-TEE is written to isolate TAs, resources and clients from each other.
Currently there are no plans for interaction between TAs from different VMs,
no resource sharing, nothing like this.
What do you mean by "isolation model"? Can you give an example?

By that I meant, who is going to prevent guest A from accessing guest B's data. I think you partly answered my question with "OP-TEE is written to isolate TAs". The access to NS memory question above will fill in the rest, I think.



This feature is not implemented in this RFC only because
currently only Dom0 calls are supported.

This would help to understand that maybe it is an easy way but also still
secure...
In the previous discussion we considered only two variants: in XEN or outside
XEN. The stubdomain approach looks more secure, but I'm not sure that it is.
Such a stubdomain will need access to all guests' memory. If you manage to
gain control of the mediator stubdomain, you can do anything you want with all
guests.

That's slightly untrue. The stubdomain will only be able to mess with
domains using TEE.
Yes, this is more precise. Then either you do not allow your privileged
domain to use the TEE, or your system may be compromised anyway.

Can you give an example of what a privileged domain is for you? Do you consider
Android a privileged domain?
In this case I used the term "privileged domain" in the XEN sense: is_privileged == 1.
Android is not a privileged domain, by any means.
I wanted to say that if you allow Dom0 to access the TEE, then a hacked
DomMediator can compromise Dom0 and the hypervisor.

And I never disagreed with that. This is the non-controversial part :).



To be clear, this series doesn't look controversial, at least for OP-TEE. What
I am more concerned about is DomU support.
Your concern is that a rogue DomU can compromise the whole system, right?

Yes. You seem to assume that a DomU using TEE will always be trusted. I think
this is the wrong approach if the user is able to interact directly with
those guests. See above.
No, I am not assuming that a DomU that calls the TEE should be trusted. Why do
you think so? It should be able to use TEE services, but this does not mean
that XEN should trust it.

In a previous answer you said: "So, if you don't trust your guest - don't
let it". For me, this clearly means you consider that DomUs using the TEE are
trusted.

So can you clarify what you mean by trust then?
Well... In the real world "trust" isn't a binary option. You don't want to
allow all domains to access the TEE. A breached TEE user domain doesn't
automatically mean that your whole system is compromised. But it
certainly increases the attack surface. So it is safer to give TEE access
only to those domains which really require it. You can call them
slightly more trusted than others.

Do you have an example of a guest you would trust slightly more?


Even now, XEN processes requests from DomUs without
trusting them. Why do you think that TEE mediator usage will differ?

I guess you are comparing with vGIC and PL011? IMHO, the main difference is
that Xen alone takes care of the isolation between guests. Here in the TEE
case, you rely on a combination of both TEE and Xen to do the isolation.
Yes. This will be less secure than a TEE-only or hypervisor-only system.

Can you expand here?



Look, I am generally not against the idea of a TEE mediator in stubdoms. But
this approach requires many changes in existing XEN code:

1. Load domains before Dom0.

2. Add a special API for the mediator. Or alter existing ones. You can't use
    existing APIs as is, because you need to enforce stricter XSM rules
    on them.

Mind giving more explanation....? Xen has a default policy for XSM and
it indeed may not fit your use case. But you can write your own policy and load
it.
Yes. You need the policy "allow this stubdom to map memory only from TEE-enabled
guests". AFAIK, this is not possible right now. But I may be wrong, I'm
not very familiar with XSM.

I believe XSM could do that. IIRC, you can "label" your domain and use that to say "stubdom is allowed to access the memory of domains with the given label".
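
The label idea can be modelled as a lookup on (source label, target label)
pairs. This is only a sketch in Python: the label names are hypothetical, and
it is not XSM/FLASK policy syntax, just the decision such a policy would
encode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    label: str  # security label assigned when the domain is created


# Hypothetical rule set: (source label, target label) pairs allowed to map memory.
ALLOW_MAP = {("tee_mediator_t", "tee_guest_t")}


def may_map_memory(source: Domain, target: Domain) -> bool:
    """True if the policy lets `source` map memory belonging to `target`."""
    return (source.label, target.label) in ALLOW_MAP


stubdom = Domain("DomMediator", "tee_mediator_t")
android = Domain("android", "tee_guest_t")   # TEE-enabled guest
other = Domain("webserver", "plain_guest_t")  # guest without TEE access
```

Here `may_map_memory(stubdom, android)` is allowed while
`may_map_memory(stubdom, other)` is not, which is the "only TEE-enabled
guests" restriction asked for above.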



3. Changes in scheduling to allow the TEE mediator to use the credits/slices
    of the calling guest.

4. Support boilerplate code in the stubdom. You know, you can't simply
    write a mediator in a stubdom. You need a kernel. You need to
    maintain it.

Well, one way or another someone will have to maintain the mediator... The
kernel does not need to be specific to TEE, it could be a unikernel.
Right. But to me XEN looks like a better maintained "kernel" :)
IMHO, XEN is mature, and has fewer bugs (especially security ones)
than any other kernel.

And before you say again that no-one in the community seems to be interested, I
should remind you that Arm is working on it (see development update).
Are you talking about that "unicore" project by the NEC guys? Sorry, I
can't find the development update you mention. It looks like search on markmail
is down (or I'm doing something terribly wrong).

Sorry, I meant Mini-OS. I don't know of any work on "unicore" for Arm64 for now.



This is a lot of work. It requires changes in generic parts of XEN.
I fear it will be very hard to upstream such changes, because no one
sees an immediate value in them. What do you think my chances are
of upstreaming this?

It is fairly annoying to see you justifying most of this thread with
"no one sees an immediate value in them".

I am not the only maintainer in Xen, so effectively I can't promise whether
it is going to be upstreamed. But I believe the community has been very
supportive so far, a lot of discussions happened (see [2]) because of the
OP-TEE support. So what more do you expect from us?
I'm sorry, I didn't mean to offend you or anyone else. You guys can
be harsh sometimes, but I really appreciate the help provided by the
community. And I am certainly not asking you for any guarantees or
anything of that sort.

I'm just bothered by the amount of required work and by the upstreaming
process. But this is not a strong argument against mediators in
stubdoms, I think :)

Currently I'm developing virtualization support in OP-TEE, so in the
meantime we'll have plenty of time to discuss mediators and the stubdomain
approach (if you have time). To test this feature in OP-TEE I'm
extending this RFC, making optee.c look like a full-scale mediator.
I need to do this anyway, to test OP-TEE. When I finish, I can
show you what the mediator can look like. Maybe this will persuade you towards
one approach or the other.

I think this would be useful. Can you also keep both Stefano (I assume he wants that too) and me in the loop for the OP-TEE virtualization side?

Cheers,

--
Julien Grall
