[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4/7] xen/arm: When CPU dies, free percpu area immediatelly



Hi Julien, Stefano,

Thanks for the feedback and suggestions.

On Mon, Apr 16, 2018 at 3:14 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi,
>
>
> On 12/04/18 22:31, Stefano Stabellini wrote:
>>
>> On Thu, 12 Apr 2018, Julien Grall wrote:
>>>
>>> On 12/04/18 00:46, Stefano Stabellini wrote:
>>>>
>>>> On Wed, 11 Apr 2018, Julien Grall wrote:
>>>>>
>>>>> On 11/04/18 14:19, Mirela Simonovic wrote:
>>>>>>
>>>>>> Freeing percpu area is done when a non-boot CPU is disabled upon
>>>>>> suspend.
>>>>>> This use to be scheduled for execution after a period of time, what
>>>>>> caused
>>>>>> the following racing issues. If CPU is enabled after it is disabled
>>>>>> and
>>>>>> before the freeing of percpu area is performed, Xen would crash upon
>>>>>> initializing percpu area because per cpu offset is not marked as
>>>>>> INVALID_PERCPU_AREA (this suppose to happen when cpu area is freed).
>>>>>> To resolve the racing issue, free percpu area right away instead
>>>>>> scheduling it for later.
>>>>>
>>>>>
>>>>> The reason of using the RCU is you want to make sure that none of the
>>>>> other
>>>>> CPUs will access that percpu data before freeing it. So I don't think
>>>>> this
>>>>> patch is valid.
>>>>>
>>>>> It looks like to me a rcu barrier is missing after calling cpu_down
>>>>> somewhere
>>>>> in the CPU off path. I am not entirely sure where.
>>>>
>>>>
>>>> We need a rcu_barrier(). Perhaps, it could be added on cpu_on before
>>>> initializing the percpu area?
>>>
>>>
>>> Do you mind giving a bit more details on your thought? cpu_up looks a
>>> strange
>>> place as no one should access the percpu area after the CPU is down. So
>>> it
>>> feels the rcu_barrier should be somewhere before PSCI_cpu_off is called.
>>
>>
>> Yes, it feels strange to do it on cpu_on, it would be more obvious on
>> cpu_off, but we don't actually need to _free_percpu_area on cpu_off,
>> right? We only need to make sure it is done before cpu_percpu_callback
>> is called on cpu_on.
>>
>> My suggestion would be to evaluate if it is possible to introduce the
>> rcu_barrier() on the resume path before cpu_percpu_callback, maybe in
>> start_secondary.
>
>
> Well, cpu_percpu_callback is not called by start_secondary. It is called
> when preparing the CPU from another CPU. So anything in start_secondary will
> not work.
>

I have also confirmed that in start_secondary it doesn't work, it's too late.

>>
>> I was also looking at xen/arch/x86/acpi/power.c:enter_state and noticed
>> that they chose to call rcu_barrier() on enable_cpu before
>> enable_nonboot_cpus().
>

Before the enable_nonboot_cpus() gets invoked seems to be a good place
for rcu_barrier(), as it's done for x86.

>
> I guess the rcu_barrier() in the function handling suspend/resume works. But
> that doesn't cover the hotplug case. Looking at x86, suspend/resume case.
> For the hotplug case, there are an rcu_barrier in cpu_{up,down}_helper but
> they are only present in the case of cpu_{up,down} failed. I am not entirely
> sure how this is handled in x86
>
> Andrew, Jan, do you know when the percpu will be free on hotplug? It is call
> to call_rcu(...) but I am not sure when this is going to be executed.
>

AFAIK disable/enable_nonboot_cpus() is the only way to do the hotplug
and rcu_barrier() is not included in the flow.
I suggest to invoke rcu_barrier() before enable_nonboot_cpus() and
eventually this could be moved into enable_nonboot_cpus() itself.

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.