[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 4/6] xen/arm: optee: handle shared buffer translation error





On 24/09/2019 14:30, Volodymyr Babchuk wrote:

Julien Grall writes:

On 24/09/2019 12:37, Volodymyr Babchuk wrote:

Hi Julien,

Julien Grall writes:

Hi,

On 18/09/2019 19:51, Volodymyr Babchuk wrote:
+/* Handles return from Xen-issued RPC */
+static void handle_xen_rpc_return(struct optee_domain *ctx,
+                                  struct cpu_user_regs *regs,
+                                  struct optee_std_call *call,
+                                  struct shm_rpc *shm_rpc)
+{
+    call->state = OPTEE_CALL_NORMAL;
+
+    /*
+     * Right now we have only one reason to be there - we asked guest
+     * to free shared buffer and it did it. Now we can tell OP-TEE
+     * that buffer allocation failed. We are not storing exact command
+     * type, only type of RPC return. So, this is the only check we
+     * can perform there.
+     */
+    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);

As I pointed out in v1, ASSERT() is probably the less desirable
solution here as this is an error path.
Looks like I misunderstood you :(

Can you explain why you chose that over the 3 solutions I suggested?
I think I need some clarification there. ASSERT() is adopted way to tell
about invariant. Clearly, this is programming error if ASSERT() fails.

That's correct.


But I understand, that ASSERT() is available only in debug build. So, in
release it will never fire. As this is very unlikely error path, bug
there can live forever. Okay, so ASSERT() is the least desirable way.

This is my concern, ASSERT() are fine in path that can be exercised
quite well. By experience, error path as not so well tested, so any
verbosity in non-debug build is always helpful.
Yep, I see.


Warning is not enough, as we are already in incorrect state.
Incorrect state for who? The guest?
Yes, for the current call of the current guest. State of other calls and
other guests should not be affected. But it is possible that our view of
OP-TEE state for that guest is not correct also.


So, best way is to crash guest, it this correct? I'll do this in the
next version then.

A rule of thumb is
   - BUG_ON can be replaced by domain_crash() as this is a fatal error
you can't recover (the scope depends on the function call).
This seems correct.


   - ASSERT() can be replaced by WARN_ON() as the former will be a NOP
in non-debug build. In both case, the error is not fatal and continue
will not result So it means the error is not fatal.
I can't agree with this in general. But maybe this makes sense for Xen.
As I said, generally ASSERT() is used to, well, assert that condition is
true for any correct state of a program. So it cant' be replaced with
WARN_ON(). If we detected invalid state we should either try to correct
it (if know how) or to immediately stop the program.

I agree that assert are here to catch that any condition is true. However, as I pointed out, ASSERTs are turned to NOP in production build. So if there were a problem in that path, you would have happily continued with the error.

In other words, ASSERT() is only here to help you catch any mistake during development by crashing the hypervisor so you get information on what happen. But your code should be able to cope of any mistake in production build (or you know this can't happen thanks to code coverage).

This is very different from domain_crash() where you know that your code is not able to cope with it and you don't want the guest to continue. Note that domain_crash() is asynchronous, so you will still execute some part of the hypervisor (until leave_hypervisor_head() is called).


But I can see, why this behavior is not desired for
hypervisor. Especially in production use.

You used ASSERT() in your code, so it feels to me that WARN_ON() would
be a suitable replacement.
Well, honestly I believe that it is better to crash guest to prevent
any additional harm. Look, we already detected that something is wrong
with mediator state for certain guest. We can pretend that all is fine
and try to continue the call. But who knows which consequences it will have?

However, if the state inconsistency is for Xen, then domain_crash() is
the best. If this is for the guest, then this is not really our
business and it may be best to let him continue as it could provide
more debug (domain_crash() will only dump the stack and registers).
I'm looking at this from different point: we promised to provide some
service for a guest and screwed up. It is not guest's fault. Now we know
that we can't provide reliable service for a guest anymore. From
safety point of view we should shut down the service. (But this is job
for another patch) For now, we at least should crash the
guest. This is the safest way. What do you think?

I am happy with domain_crash().

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.