[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen: arm: Don't use stop_cpu() in halt_this_cpu()



Hi Stefano and Julien,
What is the conclusion about this patch?

сб, 25 июн. 2022 г. в 17:45, Julien Grall <julien@xxxxxxx>:
>
> Hi Stefano,
>
> On 24/06/2022 22:31, Stefano Stabellini wrote:
> > On Fri, 24 Jun 2022, Julien Grall wrote:
> >> On 23/06/2022 23:07, Stefano Stabellini wrote:
> >>> On Thu, 23 Jun 2022, dmitry.semenets@xxxxxxxxx wrote:
> >>>> From: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
> >>> So wouldn't it be better to remove the panic from the implementation of
> >>> call_psci_cpu_off?
> >>
> >> I have asked to keep the panic() in call_psci_cpu_off(). If you remove the
> >> panic() then we will hide the fact that the CPU was not properly turned off
> >> and will consume more energy than expected.
> >>
> >> The WFI loop is fine when shutting down or rebooting because we know this 
> >> will
> >> only happen for a short period of time.
> >
> > Yeah, I don't think we should hide that CPU_OFF failed. I think we
> > should print a warning. However, given that we know CPU_OFF can
> > reasonably fail in normal conditions returning DENIED when a Trusted OS
> > is present, then I think we should not panic.
>
> We know how to detect this condition (see section 5.9 in DEN0022D.b). So
> I would argue we should fix it properly rather than removing the panic().
>
> >
> > If there was a way to distinguish a failure because a Trusted OS is
> > present (the "normal" failure) from other failures, I would suggest to:
> > - print a warning if failed due to a Trusted OS being present
> > - panic in other cases
> >
> > Unfortunately it looks like in all cases the return code is DENIED :-(
> I am confused. Per the spec, the only reason CPU_OFF can return DENIED
> is because the Trusted OS is resident. So what other cases are you
> talking about?
>
> >
> >
> > Given that, I would not panic and only print a warning in all cases. Or
> > we could ASSERT which at least goes away in !DEBUG builds.
>
> ASSERT() is definitely not way to deal with external input. I could
> possibly accept a WARN(), but see above.
>
> >>> The reason I am saying this is that stop_cpu is called in a number of
> >>> places beyond halt_this_cpu and as far as I can tell any of them could
> >>> trigger the panic. (I admit they are unlikely places but still.)
> >>
> >> This is one of the example where the CPU will not be stopped for a short
> >> period of time. We should deal with them differently (i.e. migrating the
> >> trusted OS or else) so we give all the chance for the CPU to be fully 
> >> powered.
> >>
> >> IMHO, this is a different issue and hence why I didn't ask Dmitry to solve 
> >> it.
> >
> > I see your point now. I was seeing the two things as one.
> >
> > I think it is true that the WFI loop is likely to work. Also it is true
> > that from a power perspective it makes no different on power down or
> > reboot.  From that point of view this patch is OK.
> >
> > But even on shut-down/reboot, why not do that as a fallback in case
> > CPU_OFF didn't work? It is going to work most of the times anyway, why
> > change the default for the few cases that doesn't work?
>
> Because we would not be consistent how we would turn off a CPU on a
> system supporting PSCI. I would prefer to use the same method everywhere
> so it is easier to reason about.
>
> I am also not sure how you define "most of the time". Yes it is possible
> that the boards we aware of will not have this issue, but how about the
> one we don't know?
>
> >
> > Given that this patch would work, I don't want to insist on this and let
> > you decide.
> >
> >
> > But even if we don't want to remove the panic as part of this patch, I
> > think we should remove the panic in a separate patch anyway, at least
> > until someone investigates and thinks of a strategy how to migrate the
> > TrustedOS as you suggested.
> If we accept this patch, then we remove the immediate pain. The other
> uses of stop_cpu() are in:
>         1) idle_loop(), this is reachable when turning off a CPU after boot
> (not supported on Arm)
>          2) start_secondary(), this is only used during boot (CPU
> hot-unplug is not supported)
>
> Even if it would be possible to trigger the panic() in 2), I am not
> aware of an immediate issue there. So I think it would be the wrong
> approach to remove the panic() first and then investigate.
>
> The advantage of the panic() is it will remind us that some needs to be
> fixed. With a warning (or WARN()) people will tend to ignore it.
>
> Cheers,
>
> --
> Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.