
Re: [PATCH] xen: Don't call panic if ARM TF cpu off returns DENIED



Fri, 17 Jun 2022 at 14:12, Julien Grall <julien@xxxxxxx>:
Hi Julien,
>
> Hi,
>
> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
> > Julien Grall <julien@xxxxxxx> writes:
> >
> >> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> >>> Hi Julien,
> >>
> >> Hi Volodymyr,
> >>
> >>> Julien Grall <julien@xxxxxxx> writes:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 16/06/2022 14:55, dmitry.semenets@xxxxxxxxx wrote:
> >>>>> From: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
> >>>>> According to the PSCI specification, ARM TF can return DENIED on CPU_OFF.
> >>>>
> >>>> I am confused. The spec is talking about the Trusted OS and not the
> >>>> firmware. The documentation is also not specific to ARM Trusted
> >>>> Firmware. So did you mean "Trusted OS"?
> >>> It should be "firmware", I believe.
> >>
> >> Hmmm... I couldn't find a reference in the spec suggesting that
> >> CPU_OFF could return DENIED because of the firmware. Do you have a
> >> pointer to the spec?
> >
> > Ah, looks like we are talking about different things. Indeed, CPU_OFF
> > can return DENIED only because of the Trusted OS. But the entity that
> > *returns* the error to the caller is the firmware.
>
> Right, the interesting part is *why* DENIED is returned not *who*
> returns it.
ARM TF returns DENIED *only* for the platform I have.
We have a dissonance between the spec and the Xen implementation: DENIED
returned by ARM TF, the Trusted OS, or whatever else is not a reason to
panic. And we are hitting issues because of this.
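
For reference, the intent of the original patch in call_psci_cpu_off() would
look roughly like the sketch below. This is only illustrative: it follows my
reading of the current psci.c, and the exact identifiers (PSCI_RET(),
PSCI_DENIED, arm_smccc_smc()) should be double-checked against the tree.

void call_psci_cpu_off(void)
{
    if ( psci_ver > PSCI_VERSION(0, 1) )
    {
        struct arm_smccc_res res;

        /* If successful, the PSCI cpu_off call doesn't return. */
        arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res);

        /*
         * A Trusted OS resident on this core may refuse CPU_OFF with
         * DENIED (DEN0022D.b, section 5.5.2). The caller parks the CPU
         * in wfi() anyway, so don't treat this as fatal.
         */
        if ( PSCI_RET(res) != PSCI_DENIED )
            panic("PSCI cpu off failed for CPU%d err=%d\n",
                  smp_processor_id(), PSCI_RET(res));
    }
}
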
machine_restart() behaves more or less correctly (it sometimes reports a
panic but still restarts the machine), but machine_halt() doesn't work at
all. Moving execution to CPU0 is, to my understanding, a workaround: that
approach would fix machine_restart() but would not fix machine_halt(). The
approach you suggested (spinning all CPUs) will work, but will save less
energy.
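
Just to make sure I understand the proposal, here is a minimal sketch of the
"park the secondary CPUs" variant, based on the current arch/arm/shutdown.c
(treat it as illustrative, not as a tested patch):

static void halt_this_cpu(void *arg)
{
    /*
     * Don't call stop_cpu() (and hence PSCI CPU_OFF) here: a Trusted OS
     * pinned to this core would make CPU_OFF return DENIED. Just park
     * the CPU with interrupts off.
     */
    local_irq_disable();

    while ( 1 )
        wfi();
}

void machine_halt(void)
{
    watchdog_disable();
    console_start_sync();
    local_irq_enable();
    smp_call_function(halt_this_cpu, NULL, 0);
    local_irq_disable();

    /* No need to wait for the other CPUs: they only park in wfi() above. */

    /* This is mainly for PSCI-0.2, which does not return if success. */
    call_psci_system_off();

    while ( 1 )
        wfi();
}

The downside, as said above, is that the parked CPUs keep burning power in
wfi() instead of being powered off.
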
> >>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
> >>>>> section 5.5.2
> >>>>
> >>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
> >>>> the trusted OS can only run on one core.
> >>>>
> >>>> Some of the trusted OSes are migratable. So I think we should first
> >>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent
> >>>> the CPU from going offline.
> >>>>
> >>>> That said, upstream doesn't support cpu offlining (I don't know about
> >>>> your use case). In the case of shutdown, it is not necessary to offline
> >>>> the CPU, so we could avoid calling CPU_OFF on all CPUs but
> >>>> one. Something like:
> >>>>
> >>> This is an even better approach, yes. But you mentioned CPU_OFF. Did you
> >>> mean SYSTEM_RESET?
> >>
> >> By CPU_OFF I was referring to the fact that Xen will issue the call on
> >> all CPUs but one. The remaining CPU will issue the command to
> >> reset/shutdown the system.
> >>
> >
> > I just want to clarify: the change that you suggested removes the call to
> > stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all.
>
> I was describing the existing behavior.
>
> >
> > All CPUs except one will spin in
> >
> >      while ( 1 )
> >          wfi();
> >
> > while last cpu will issue SYSTEM_OFF or SYSTEM_RESET.
> >
> > Is this correct?
>
> Yes.
>
> >
> >>>>    void machine_halt(void)
> >>>> @@ -21,10 +23,6 @@ void machine_halt(void)
> >>>>        smp_call_function(halt_this_cpu, NULL, 0);
> >>>>        local_irq_disable();
> >>>>
> >>>> -    /* Wait at most another 10ms for all other CPUs to go offline. */
> >>>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
> >>>> -        mdelay(1);
> >>>> -
> >>>>        /* This is mainly for PSCI-0.2, which does not return if success. */
> >>>>        call_psci_system_off();
> >>>>
> >>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@xxxxxxxx>
> >>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> >>>>
> >>>> I don't recall seeing a patch on the ML recently for this. So is this an
> >>>> internal review?
> >>> Yeah, sorry about that. Dmytro is a new member of our team and he is not
> >>> yet familiar with the differences between internal reviews and reviews on
> >>> the ML.
> >>
> >> No worries. I usually classify as internal review anything that was done
> >> privately. This looks to be a public review, although not on
> >> xen-devel.
> >>
> >> I understand that some of the patches are still in the PoC stage,
> >> and doing the review on your github is a good idea. But for those that
> >> are meant for upstream (e.g. bug fixes, small patches), I would
> >> suggest doing the review on xen-devel directly.
> >
> > It is not always clear if a patch is eligible for upstream. At first we
> > thought that the problem was platform-specific and we weren't sure that we
> > would find a proper upstreamable fix.
>
> You can guess but not be sure until you send it upstream :). In fact,...
>
> > You probably saw that the PR's name
> > differs quite a bit from the final patch. This is because the initial
> > solution was completely different from the final one.
>
> ... even before looking at your PR, this was the first solution I had in
> mind. I am still pondering whether this could be the best approach,
> because I have the suspicion there might be some platform out there relying
> on receiving the shutdown request on CPU0.
>
> Anyway, this is so far just theoretical; my proposal should solve your
> problem.
>
> On a separate topic, the community is aiming to support a wide range of
> platforms out-of-the-box. I think platform-specific patches are
> acceptable so long as they are self-contained (to some extent, i.e. if you
> ask to support Xen on RPI3 then I would still probably argue against :))
> or have a limited impact on the rest of the users (this is why we have
> alternatives in Xen).
>
> My point here is that your initial solution may have been the preferred
> approach for upstream. So if you involve the community early, you are
> reducing the risk of having to backtrack and/or spend extra time going in
> the wrong direction.
>
> Cheers,
>
> --
> Julien Grall



 

