
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot



Hi Julien,

On Fri, May 11, 2018 at 12:54 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
>
>
> On 11/05/18 11:41, Mirela Simonovic wrote:
>>
>> Hi Dario,
>>
>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@xxxxxxxx>
>> wrote:
>>>
>>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>>>>
>>>> Regardless of the fact that the notifier returns an error or not, I
>>>> believe it would be good and safe to set priority and document that
>>>> priority zero would cause racing issue in the scenario I debugged
>>>> today. I'm pretty sure that this discussion would be forgotten soon
>>>> and it really should be documented in code/comment.
>>>>
>>> I may very well be missing or misunderstanding something, but I
>>> continue to think that the problem here is that CPU_STARTING can't,
>>> right now, fail, while you need it to be able to.
>>>
>>> If that is the case, giving different priorities to the notifier, is
>>> not a solution.
>>>
>>
>> Let me try to clarify. The assumption is that the starting CPU can
>> fail. An additional requirement, set by Julien, is that panic or
>> BUG_ON is not acceptable.
>
>
> Please don't twist my words. I never said it was not acceptable to have
> the BUG_ON in notify_cpu_starting().

I didn't say that either. You referenced notify_cpu_starting() here, not me.
BTW, if you get the impression that I'm twisting words, then we have a
misunderstanding, and your approach is not the best way toward
clarifying it. Please keep that in mind next time, because I do not
have such an intent and never will.

I referred to panic/BUG_ON in this scenario regardless of the place
from which it would be invoked. My understanding from your previous
answers is that you think the system should not panic/BUG_ON in this
scenario, because this is the kind of error the system should be able
to survive.

>
> I am going to repeat the content of my answer to your last e-mail:
>
> I was aware of it from the beginning. The whole point of the
> conversation was that we should avoid taking the decision at the lower
> level and let the upper layer decide what to do.
>
> If the system is failing today then that's fine, and it still fits what I
> said in my first e-mail of that thread. As a reminder:
>
> "We should really avoid to use panic(...) if this is something the system
> can survive. In that specific case, it would only affect the current CPU. So
> it would be better to return an error and let the caller decide what to do."
>
> To summarize:
>         1) Notifiers should only report an error and let the upper caller
> (here notify_cpu_starting()) decide what to do.
>         2) I am OK with the BUG_ON in notify_cpu_starting() for now.

I agree with BUG_ON in notify_cpu_starting().
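For reference, this is roughly the path we are talking about. A minimal
sketch from my reading of notify_cpu_starting() in xen/common/cpu.c, so
the details may differ from the actual tree:

    void notify_cpu_starting(unsigned int cpu)
    {
        void *hcpu = (void *)(long)cpu;

        /* Run every registered CPU_STARTING notifier on the new CPU. */
        int notifier_rc = notifier_call_chain(
            &cpu_chain, CPU_STARTING, hcpu, NULL);

        /* Any failure reported by a notifier brings down the whole host. */
        BUG_ON(notifier_rc != NOTIFY_DONE);
    }

So whatever error a notifier returns ends up in this BUG_ON.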

>

Let me just clarify the consequence of your proposal according to my
understanding. If, instead of stopping the CPU when enabling a
capability fails, the notifier returns an error, the error will
propagate to notify_cpu_starting() and BUG_ON will crash the whole
system. The proposal submitted in this patch (stop_cpu() in
enable_nonboot_cpu_caps() instead of panic) would stop only the
erroneous CPU. The rest of the system would continue to work, and I
thought that was the goal, since we don't want to panic/BUG_ON.
From that perspective, I believe stop_cpu() in
enable_nonboot_cpu_caps() is better than returning an error from the
notifier.
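
To make the comparison concrete, here is a minimal sketch of what I
mean. The iteration over arm_cpu_capabilities mirrors the existing
boot-time enabling code, so take the names as illustrative rather than
as the exact patch:

    /* Enable capabilities/errata on a CPU hotplugged after boot. */
    void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
    {
        for ( ; caps->matches; caps++ )
        {
            if ( !cpus_have_cap(caps->capability) || !caps->enable )
                continue;

            /* Enabling failed: park only this CPU instead of
             * panicking; stop_cpu() does not return. */
            if ( caps->enable((void *)caps) )
                stop_cpu();
        }
    }

With stop_cpu() here, only the CPU that failed to enable a capability
is taken offline, whereas an error returned from the notifier would end
in the BUG_ON shown above and take the whole system down.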

You said above "I would be ok having stop_cpu called here", and I did
exactly that: stop_cpu() in enable_nonboot_cpu_caps() instead of the
panic, as submitted in this patch.

If you believe my understanding is not correct, if I missed something,
or if you have another proposal, please let me know.

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall


 

