[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen: Don't call panic if ARM TF cpu off returns DENIED



Hi Julien,
> >>
> >> Hi,
> >>
> >> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
> >>> Julien Grall <julien@xxxxxxx> writes:
> >>>
> >>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> >>>>> Hi Julien,
> >>>>
> >>>> Hi Volodymyr,
> >>>>
> >>>>> Julien Grall <julien@xxxxxxx> writes:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 16/06/2022 14:55, dmitry.semenets@xxxxxxxxx wrote:
> >>>>>>> From: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
> >>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
> >>>>>>
> >>>>>> I am confused. The spec is talking about Trusted OS and not
> >>>>>> firmware. The docummentation is also not specific to ARM Trusted
> >>>>>> Firmware. So did you mean "Trusted OS"?
> >>>>> It should be "firmware", I believe.
> >>>>
> >>>> Hmmm... I couldn't find a reference in the spec suggesting that
> >>>> CPU_OFF could return DENIED because of the firmware. Do you have a
> >>>> pointer to the spec?
> >>>
> >>> Ah, looks like we are talking about different things. Indeed, CPU_OFF
> >>> can return DENIED only because of Trusted OS. But entity that *returns*
> >>> the error to a caller is a firmware.
> >>
> >> Right, the interesting part is *why* DENIED is returned not *who*
> >> returns it.
> > ARM TF returns DENIED *only* for the platform I have.
> > We have a dissonance between spec and xen implementation because
> > DENIED returned by
> > ARM TF or Thrusted OS or whatever is not a reason for panic.
>
> I agree that's not a reason for panic. However, knowing the reason does
> help to figure out the correct approach.
>
> For instance, one could have suggest to migrate the trusted OS to
> another pCPU. But this would not have worked for you because the DENIED
> is not about that.
>
> > And we
> > have issues with this.
> > If machine_restart() behaviour is more or less correct  (sometimes
> > reports about panic but restarts the machine)
>
> Right...
>
> > but machine_halt() doesn't work at al
> ... this should also be the case here because machine_halt() could also
> be called from cpu0. So I am a bit confused why you are saying it never
> works.
If machine_halt() called on a CPU other than CPU0 it caused panic and reboot.
If it called on a CPU0 it also caused panic but after system power off
and reboot
is not issued. In this state you can still call the xen console. But
you can't reboot the system.
>
> > Transit execution to CPU0 for my understanding is a workaround and
> > this approach will fix
> > machine_restart() function but will not fix machine_halt().
>
> I would say it is a more specific case of what the spec suggests (see
> below). But it should fix both machine_restart() and machine_halt()
> because the last CPU running will be CPU0. So Xen would call SYSTEM_*
> rather than CPU_OF. So I don't understand why you think it will fix one
> but not the other.
Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop
the whole system.
>
> In fact, the idea to always run the request from a given CPU is quite
> similar to what the specification suggests (5.10.3 DEN0022D.b):
>
> "
> One way in which cores can be placed into a known state is to use calls
> to CPU_OFF on all online cores
> except for the last one, which instead uses SYSTEM_OFF. If a UP Trusted
> OS is present, this method
> only works if the core that calls SYSTEM_OFF is the one where the
> Trusted OS is resident, as calls to
> CPU_OFF on this core return a DENIED error. Any core can call SYSTEM_OFF.
> "
>
> For Xen, we would need to detect if the trusted OS is UP and where it is
> running. Then we could always restart/halt from that CPU or CPU0.
>
> > Approach
> > you suggested (spinning all cpus) will work but
> > will save less energy.
>
> I am not sure to understand what's the concern about the energy here.
>  From my understanding of the specification, SYSTEM_OFF will take care
> of switching off the power for all the cores. So at worse, the CPUs will
> spin for a few ms. This would like be more efficient than a call to PSCI
> CPU off.
>
> This is different compare just turning off one CPU (i.e. CPU hot-unplug)
> because the CPU will end up to spin for a very long time. And this is
> why I wasn't OK with conditionally avoiding the panic.
>
> Cheers,
>
> --
> Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.