[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen: Don't call panic if ARM TF cpu off returns DENIED



Hi,

On 18/06/2022 18:43, Dmytro Semenets wrote:
пт, 17 июн. 2022 г. в 14:12, Julien Grall <julien@xxxxxxx>:
Hi Julien,

Hi,

On 17/06/2022 10:10, Volodymyr Babchuk wrote:
Julien Grall <julien@xxxxxxx> writes:

On 16/06/2022 19:40, Volodymyr Babchuk wrote:
Hi Julien,

Hi Volodymyr,

Julien Grall <julien@xxxxxxx> writes:

Hi,

On 16/06/2022 14:55, dmitry.semenets@xxxxxxxxx wrote:
From: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
According to PSCI specification ARM TF can return DENIED on CPU OFF.

I am confused. The spec is talking about Trusted OS and not
firmware. The docummentation is also not specific to ARM Trusted
Firmware. So did you mean "Trusted OS"?
It should be "firmware", I believe.

Hmmm... I couldn't find a reference in the spec suggesting that
CPU_OFF could return DENIED because of the firmware. Do you have a
pointer to the spec?

Ah, looks like we are talking about different things. Indeed, CPU_OFF
can return DENIED only because of Trusted OS. But entity that *returns*
the error to a caller is a firmware.

Right, the interesting part is *why* DENIED is returned not *who*
returns it.
ARM TF returns DENIED *only* for the platform I have.
We have a dissonance between spec and xen implementation because
DENIED returned by
ARM TF or Thrusted OS or whatever is not a reason for panic.

I agree that's not a reason for panic. However, knowing the reason does help to figure out the correct approach.

For instance, one could have suggest to migrate the trusted OS to another pCPU. But this would not have worked for you because the DENIED is not about that.

And we
have issues with this.
If machine_restart() behaviour is more or less correct  (sometimes
reports about panic but restarts the machine)

Right...

but machine_halt() doesn't work at al
... this should also be the case here because machine_halt() could also be called from cpu0. So I am a bit confused why you are saying it never works.

Transit execution to CPU0 for my understanding is a workaround and
this approach will fix
machine_restart() function but will not fix machine_halt().

I would say it is a more specific case of what the spec suggests (see below). But it should fix both machine_restart() and machine_halt() because the last CPU running will be CPU0. So Xen would call SYSTEM_* rather than CPU_OF. So I don't understand why you think it will fix one but not the other.

In fact, the idea to always run the request from a given CPU is quite similar to what the specification suggests (5.10.3 DEN0022D.b):

"
One way in which cores can be placed into a known state is to use calls to CPU_OFF on all online cores except for the last one, which instead uses SYSTEM_OFF. If a UP Trusted OS is present, this method only works if the core that calls SYSTEM_OFF is the one where the Trusted OS is resident, as calls to
CPU_OFF on this core return a DENIED error. Any core can call SYSTEM_OFF.
"

For Xen, we would need to detect if the trusted OS is UP and where it is running. Then we could always restart/halt from that CPU or CPU0.

Approach
you suggested (spinning all cpus) will work but
will save less energy.

I am not sure to understand what's the concern about the energy here. From my understanding of the specification, SYSTEM_OFF will take care of switching off the power for all the cores. So at worse, the CPUs will spin for a few ms. This would like be more efficient than a call to PSCI CPU off.

This is different compare just turning off one CPU (i.e. CPU hot-unplug) because the CPU will end up to spin for a very long time. And this is why I wasn't OK with conditionally avoiding the panic.

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.