[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen: arm: Don't use stop_cpu() in halt_this_cpu()



Hi Stefano,

On 24/06/2022 22:31, Stefano Stabellini wrote:
On Fri, 24 Jun 2022, Julien Grall wrote:
On 23/06/2022 23:07, Stefano Stabellini wrote:
On Thu, 23 Jun 2022, dmitry.semenets@xxxxxxxxx wrote:
From: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
So wouldn't it be better to remove the panic from the implementation of
call_psci_cpu_off?

I have asked to keep the panic() in call_psci_cpu_off(). If you remove the
panic() then we will hide the fact that the CPU was not properly turned off
and will consume more energy than expected.

The WFI loop is fine when shutting down or rebooting because we know this will
only happen for a short period of time.

Yeah, I don't think we should hide that CPU_OFF failed. I think we
should print a warning. However, given that we know CPU_OFF can
reasonably fail in normal conditions returning DENIED when a Trusted OS
is present, then I think we should not panic.

We know how to detect this condition (see section 5.9 in DEN0022D.b). So I would argue we should fix it properly rather than removing the panic().


If there was a way to distinguish a failure because a Trusted OS is
present (the "normal" failure) from other failures, I would suggest to:
- print a warning if failed due to a Trusted OS being present
- panic in other cases

Unfortunately it looks like in all cases the return code is DENIED :-(
I am confused. Per the spec, the only reason CPU_OFF can return DENIED is because the Trusted OS is resident. So what other cases are you talking about?



Given that, I would not panic and only print a warning in all cases. Or
we could ASSERT which at least goes away in !DEBUG builds.

ASSERT() is definitely not way to deal with external input. I could possibly accept a WARN(), but see above.

The reason I am saying this is that stop_cpu is called in a number of
places beyond halt_this_cpu and as far as I can tell any of them could
trigger the panic. (I admit they are unlikely places but still.)

This is one of the example where the CPU will not be stopped for a short
period of time. We should deal with them differently (i.e. migrating the
trusted OS or else) so we give all the chance for the CPU to be fully powered.

IMHO, this is a different issue and hence why I didn't ask Dmitry to solve it.

I see your point now. I was seeing the two things as one.

I think it is true that the WFI loop is likely to work. Also it is true
that from a power perspective it makes no different on power down or
reboot.  From that point of view this patch is OK.

But even on shut-down/reboot, why not do that as a fallback in case
CPU_OFF didn't work? It is going to work most of the times anyway, why
change the default for the few cases that doesn't work?

Because we would not be consistent how we would turn off a CPU on a system supporting PSCI. I would prefer to use the same method everywhere so it is easier to reason about.

I am also not sure how you define "most of the time". Yes it is possible that the boards we aware of will not have this issue, but how about the one we don't know?


Given that this patch would work, I don't want to insist on this and let
you decide.


But even if we don't want to remove the panic as part of this patch, I
think we should remove the panic in a separate patch anyway, at least
until someone investigates and thinks of a strategy how to migrate the
TrustedOS as you suggested.
If we accept this patch, then we remove the immediate pain. The other uses of stop_cpu() are in: 1) idle_loop(), this is reachable when turning off a CPU after boot (not supported on Arm) 2) start_secondary(), this is only used during boot (CPU hot-unplug is not supported)

Even if it would be possible to trigger the panic() in 2), I am not aware of an immediate issue there. So I think it would be the wrong approach to remove the panic() first and then investigate.

The advantage of the panic() is it will remind us that some needs to be fixed. With a warning (or WARN()) people will tend to ignore it.

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.