
Re: [PATCH] xen: arm: Don't use stop_cpu() in halt_this_cpu()



On Fri, 24 Jun 2022, Julien Grall wrote:
> On 23/06/2022 23:07, Stefano Stabellini wrote:
> > On Thu, 23 Jun 2022, dmitry.semenets@xxxxxxxxx wrote:
> > > From: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
> > So wouldn't it be better to remove the panic from the implementation of
> > call_psci_cpu_off?
> 
> I have asked to keep the panic() in call_psci_cpu_off(). If you remove the
> panic() then we will hide the fact that the CPU was not properly turned off
> and will consume more energy than expected.
> 
> The WFI loop is fine when shutting down or rebooting because we know this will
> only happen for a short period of time.

Yeah, I don't think we should hide that CPU_OFF failed. I think we
should print a warning. However, given that we know CPU_OFF can
reasonably fail under normal conditions, returning DENIED when a
Trusted OS is present, I think we should not panic.

If there was a way to distinguish a failure because a Trusted OS is
present (the "normal" failure) from other failures, I would suggest to:
- print a warning if failed due to a Trusted OS being present
- panic in other cases

Unfortunately it looks like in all cases the return code is DENIED :-(


Given that, I would not panic and would only print a warning in all
cases. Or we could use an ASSERT, which at least goes away in !DEBUG
builds.


> > The reason I am saying this is that stop_cpu is called in a number of
> > places beyond halt_this_cpu and as far as I can tell any of them could
> > trigger the panic. (I admit they are unlikely places but still.)
> 
> This is one of the examples where the CPU will not be stopped for just a
> short period of time. We should deal with them differently (i.e. migrating
> the Trusted OS or similar) so as to give the CPU every chance to be fully
> powered off.
> 
> IMHO, this is a different issue and hence why I didn't ask Dmitry to solve it.

I see your point now. I was seeing the two things as one.

I think it is true that the WFI loop is likely to work. It is also true
that from a power perspective it makes no difference on power-down or
reboot. From that point of view this patch is OK.

But even on shutdown/reboot, why not use the WFI loop as a fallback in
case CPU_OFF didn't work? CPU_OFF is going to work most of the time
anyway, so why change the default for the few cases where it doesn't?

Given that this patch would work, I don't want to insist on this; I'll
let you decide.


But even if we don't want to remove the panic as part of this patch, I
think we should remove the panic in a separate patch anyway, at least
until someone investigates and comes up with a strategy for migrating
the Trusted OS as you suggested.


> > Also the PSCI spec explicitly mentions CPU_OFF as a way to place CPUs in
> > a "known state" and doesn't offer any other examples. So although what
> > you wrote in the commit message is correct, using CPU_OFF seems to be
> > the least error-prone way (in the sense of triggering firmware errors) to
> > place CPUs in a known state.
> 
> The section you are referring to starts with "One way". This implies there
> are other methods.
> 
> FWIW, the spin loop above seems to be how Linux is dealing with the
> shutdown/reboot.
> 
> > 
> > So I would simply remove the panic from call_psci_cpu_off, so that we
> > try CPU_OFF first, and if it doesn't work, we use the WFI loop as
> > backup. Also we get to fix all the other callers of stop_cpu this way.
> This reads strangely. In the previous paragraph you suggested that CPU_OFF
> is the least error-prone way to place CPUs in a known state. But here you
> are softening the stance and suggesting falling back to the WFI loop.
> 
> So to me this indicates that the WFI loop is fine. Otherwise, wouldn't the
> user risk seeing firmware errors (which, BTW, I don't understand what sort
> of firmware errors you are worried about)? If yes, why would that be
> acceptable?
> 
> For instance, in Dmitry's situation, CPU0 would always end up in the WFI
> loop...
