
Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling

On 28/03/2019 13:37, Juergen Gross wrote:
> On 28/03/2019 14:33, Julien Grall wrote:
>> Hi,
>>
>> On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
>>> Hello Juergen,
>>>
>>> On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:
>>>>
>>>> Especially in the scheduler area (schedule.c, cpupool.c) there is a
>>>> rather complex handling involved when doing suspend and resume.
>>>>
>>>> This can be simplified a lot by not performing a complete cpu down/up
>>>> cycle for the non-boot cpus, but by keeping the purely software-related
>>>> state and freeing it only if a cpu fails to come up again during
>>>> resume.
>>>>
>>>> In summary, not only is the complexity reduced, but the failure
>>>> tolerance is improved as well with this series: with a dedicated hook
>>>> for cpus failing to come up when resuming, it is now possible to
>>>> survive e.g. a cpupool being left without any cpu after resume by
>>>> moving its domains to cpupool0.
>>>>
>>>> Juergen Gross (6):
>>>>     xen/sched: call cpu_disable_scheduler() via cpu notifier
>>>>     xen: add helper for calling notifier_call_chain() to common/cpu.c
>>>>     xen: add new cpu notifier action CPU_RESUME_FAILED
>>>>     xen: don't free percpu areas during suspend
>>>>     xen/cpupool: simplify suspend/resume handling
>>>>     xen/sched: don't disable scheduler on cpus during suspend
>>>>
>>>>    xen/arch/arm/smpboot.c     |   4 -
>>>>    xen/arch/x86/percpu.c      |   3 +-
>>>>    xen/arch/x86/smpboot.c     |   3 -
>>>>    xen/common/cpu.c           |  61 +++++++-------
>>>>    xen/common/cpupool.c       | 131 ++++++++++++-----------------
>>>>    xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
>>>>    xen/include/xen/cpu.h      |  29 ++++---
>>>>    xen/include/xen/sched-if.h |   1 -
>>>>    8 files changed, 190 insertions(+), 245 deletions(-)
>>>>
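
[ A rough, standalone illustration of the approach described in the
  cover letter above -- a sketch of the idea only, not the actual
  patches: per-cpu scheduler state is kept across suspend and freed only
  when a CPU_RESUME_FAILED-style notification reports that a cpu did not
  come back. Apart from the CPU_RESUME_FAILED name taken from the patch
  titles, every identifier below is invented for the example. ]

#include <stdio.h>
#include <stdlib.h>

enum cpu_action {
    CPU_UP_PREPARE,     /* cpu is about to be brought up              */
    CPU_RESUME_FAILED,  /* cpu did not come up again after resume     */
};

struct sched_state {
    int dummy;          /* stands in for the real per-cpu scheduler data */
};

static struct sched_state *sched_state[4];

static void sched_cpu_callback(enum cpu_action action, unsigned int cpu)
{
    switch ( action )
    {
    case CPU_UP_PREPARE:
        /* Reuse state kept across suspend; allocate only on first bring-up. */
        if ( !sched_state[cpu] )
            sched_state[cpu] = calloc(1, sizeof(*sched_state[cpu]));
        break;

    case CPU_RESUME_FAILED:
        /* Only a cpu that failed to resume has its software state freed. */
        free(sched_state[cpu]);
        sched_state[cpu] = NULL;
        printf("cpu%u failed to resume, state freed\n", cpu);
        break;
    }
}

int main(void)
{
    sched_cpu_callback(CPU_UP_PREPARE, 1);    /* boot: allocate state       */
    /* Suspend/resume deliberately leaves the state alone ...               */
    sched_cpu_callback(CPU_RESUME_FAILED, 1); /* ... unless cpu1 stays down */
    return 0;
}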
>>>
>>> I tested your patch series on an ARM64 platform. We hit an issue with
>>> hard affinity - there was an assertion failure in the sched_credit2
>>> code during suspend if one of the vCPUs was pinned to a non-0 pCPU.
>> When you report an error, please make clear what commit you are using
>> and whether you have patches applied on top.
>>
>> In this case, we have no support for suspend/resume on Arm today, so a
>> bug report around suspend/resume is a bit confusing to have. It is also
>> more difficult to help when we don't have the full picture, as the bug
>> may be in your code rather than in upstream Xen.
>>
>> I saw Juergen suggested a fix; please carry it in whatever series you have.
>>
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) PSCI cpu off failed for CPU0 err=-3
>>> (XEN) ****************************************
>>
>> PSCI CPU off failing is never good news. Here, the command has been
>> denied by the PSCI monitor. But... why is CPU off actually called on
>> CPU0? Shouldn't we have turned off the platform instead?
> 
> Could it be that a scheduler lock is no longer reachable as the percpu
> memory of another cpu has been released and allocated again? That would
> be one of the possible results of my series.

The data abort shown before the panic is potentially the percpu issue.
But I don't think it would have the effect of trying to turn off CPU0.
This looks more like an issue in the machine_halt/machine_restart path.
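
[ A tiny standalone illustration of the hazard Juergen describes above;
  the names are invented and do not match Xen's internals. A pointer
  into another cpu's per-cpu area becomes dangling once that area is
  freed on suspend and reallocated on resume, and using it is undefined
  behaviour -- which on Arm can surface as a data abort. ]

#include <stdlib.h>

struct pcpu_sched {
    int lock;                       /* stand-in for a scheduler spinlock */
};

static struct pcpu_sched *pcpu_area[4];   /* per-cpu areas, by cpu number */
static int *cpu0_lock_ptr;                /* cpu0 caching cpu1's lock     */

int main(void)
{
    pcpu_area[1] = calloc(1, sizeof(*pcpu_area[1]));
    cpu0_lock_ptr = &pcpu_area[1]->lock;  /* shared-lock style reference  */

    free(pcpu_area[1]);                   /* suspend frees the area ...   */
    pcpu_area[1] = calloc(1, sizeof(*pcpu_area[1])); /* ... resume reallocates it */

    /* cpu0_lock_ptr may now point into freed memory; any access is UB. */
    (void)cpu0_lock_ptr;
    return 0;
}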

Indeed, CPU off may rightfully return -3 (DENIED) if the Trusted OS
resides on this CPU. We technically should have checked beforehand that
the CPU could be turned off, but it looks like we are missing this code.
I vaguely remember having already pointed out that issue in the past.
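
[ A minimal sketch of the kind of check that seems to be missing,
  assuming the PSCI 0.2 function IDs from the specification; smc_call()
  stands in for Xen's SMCCC conduit and, like the integration point, is
  an assumption rather than the real Xen code. ]

#include <stdint.h>
#include <stdbool.h>

#define PSCI_0_2_FN32_MIGRATE_INFO_TYPE   0x84000006U
/* On arm64 the SMC64 variant (0xC4000007) would be used instead. */
#define PSCI_0_2_FN32_MIGRATE_INFO_UP_CPU 0x84000007U

/* Hypothetical wrapper around the SMC/HVC conduit, returning x0. */
extern uint64_t smc_call(uint32_t fid);

/*
 * CPU_OFF is denied on the cpu a uniprocessor Trusted OS is resident on.
 * MIGRATE_INFO_TYPE returns 0 (UP Trusted OS, migratable), 1 (UP Trusted
 * OS, not migratable), or 2 / NOT_SUPPORTED (nothing to worry about).
 */
static bool psci_cpu_off_denied(uint64_t mpidr)
{
    uint64_t type = smc_call(PSCI_0_2_FN32_MIGRATE_INFO_TYPE);

    if ( type != 0 && type != 1 )
        return false;    /* no UP Trusted OS: CPU_OFF should succeed */

    /* A UP Trusted OS is resident on exactly one cpu: compare MPIDRs. */
    return smc_call(PSCI_0_2_FN32_MIGRATE_INFO_UP_CPU) == mpidr;
}

A check along these lines (or simply never issuing CPU_OFF for the cpu
the Trusted OS is resident on) in the shutdown path would avoid
panicking on the expected DENIED result.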

Cheers,

> 
> 
> Juergen
> 

-- 
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
