
Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling



Hi,

On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
Hello Juergen,

On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:

Especially in the scheduler area (schedule.c, cpupool.c) there is a
rather complex handling involved when doing suspend and resume.

This can be simplified a lot by not performing a complete cpu down and
up cycle for the non-boot cpus, but keeping the pure software related
state and freeing it only in case a cpu didn't come up again during
resume.

In summary not only the complexity can be reduced, but the failure
tolerance will be even better with this series: With a dedicated hook
for failing cpus when resuming it is now possible to survive e.g. a
cpupool being left without any cpu after resume by moving its domains
to cpupool0.

Juergen Gross (6):
   xen/sched: call cpu_disable_scheduler() via cpu notifier
   xen: add helper for calling notifier_call_chain() to common/cpu.c
   xen: add new cpu notifier action CPU_RESUME_FAILED
   xen: don't free percpu areas during suspend
   xen/cpupool: simplify suspend/resume handling
   xen/sched: don't disable scheduler on cpus during suspend

  xen/arch/arm/smpboot.c     |   4 -
  xen/arch/x86/percpu.c      |   3 +-
  xen/arch/x86/smpboot.c     |   3 -
  xen/common/cpu.c           |  61 +++++++-------
  xen/common/cpupool.c       | 131 ++++++++++++-----------------
  xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
  xen/include/xen/cpu.h      |  29 ++++---
  xen/include/xen/sched-if.h |   1 -
  8 files changed, 190 insertions(+), 245 deletions(-)
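
The failure handling described in the cover letter above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the series: the action name CPU_RESUME_FAILED is taken from the patch titles, while `struct toy_state`, `pool_has_cpu()` and `toy_cpu_callback()` are hypothetical names invented for the example. It keeps the per-cpu software state across suspend and releases it only when a cpu is reported as failed, moving the domains of a pool that lost its last cpu to pool 0.

```c
#include <stddef.h>

/* CPU_RESUME_FAILED is named in the series; the numeric values are made up. */
enum cpu_action { CPU_ONLINE, CPU_RESUME_FAILED };

#define NR_CPUS 4
#define NR_DOMS 3

/* Hypothetical, heavily simplified stand-in for scheduler/cpupool state. */
struct toy_state {
    int cpu_pool[NR_CPUS];   /* pool id of each cpu, -1 once the cpu is gone */
    int dom_pool[NR_DOMS];   /* pool id of each domain */
};

/* Return 1 if at least one cpu is still assigned to the given pool. */
static int pool_has_cpu(const struct toy_state *s, int pool)
{
    for (int c = 0; c < NR_CPUS; c++)
        if (s->cpu_pool[c] == pool)
            return 1;
    return 0;
}

/* Toy notifier callback: only a cpu that failed to resume loses its state. */
static void toy_cpu_callback(struct toy_state *s, enum cpu_action action,
                             int cpu)
{
    int pool;

    if (action != CPU_RESUME_FAILED)
        return;

    pool = s->cpu_pool[cpu];
    s->cpu_pool[cpu] = -1;   /* release the software state only now */

    /* A non-default pool left without any cpu hands its domains to pool 0. */
    if (pool > 0 && !pool_has_cpu(s, pool))
        for (int d = 0; d < NR_DOMS; d++)
            if (s->dom_pool[d] == pool)
                s->dom_pool[d] = 0;
}
```

With cpus 2 and 3 in pool 1, a failure of cpu 2 alone leaves the domains where they are; only when cpu 3 fails as well are they moved to pool 0.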


I tested your patch series on an ARM64 platform. We had an issue with hard
affinity: there was an assertion failure in the sched_credit2 code during
suspend if one of the vCPUs was pinned to a non-0 pCPU.
When you report an error, please make clear which commit you are using and whether you have patches applied on top.

In this case, we have no support for suspend/resume on Arm today, so a bug report about suspend/resume is a bit confusing. It is also more difficult to help without the full picture, as a bug may be in your code or in upstream Xen.

I saw Juergen suggested a fix; please carry it in whatever series you have.

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************

A failing PSCI CPU off is never good news. Here, the command has been denied by the PSCI monitor. But... why is CPU off actually called on CPU0? Shouldn't we have turned off the platform instead?
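
For reference, err=-3 in the panic above corresponds to DENIED in the return codes defined by the Arm PSCI specification. Below is a minimal sketch of a decoder for those codes. The numeric values follow the specification; the enumerator names and the helper `psci_strerror()` are illustrative assumptions and not an existing Xen function.

```c
#include <stdio.h>

/* Return codes as defined by the Arm PSCI specification. */
enum psci_ret {
    PSCI_SUCCESS            =  0,
    PSCI_NOT_SUPPORTED      = -1,
    PSCI_INVALID_PARAMETERS = -2,
    PSCI_DENIED             = -3,
    PSCI_ALREADY_ON         = -4,
    PSCI_ON_PENDING         = -5,
    PSCI_INTERNAL_FAILURE   = -6,
    PSCI_NOT_PRESENT        = -7,
    PSCI_DISABLED           = -8,
    PSCI_INVALID_ADDRESS    = -9,
};

/* Hypothetical helper: map a PSCI return value to a readable name. */
static const char *psci_strerror(int err)
{
    switch (err) {
    case PSCI_SUCCESS:            return "SUCCESS";
    case PSCI_NOT_SUPPORTED:      return "NOT_SUPPORTED";
    case PSCI_INVALID_PARAMETERS: return "INVALID_PARAMETERS";
    case PSCI_DENIED:             return "DENIED";
    case PSCI_ALREADY_ON:         return "ALREADY_ON";
    case PSCI_ON_PENDING:         return "ON_PENDING";
    case PSCI_INTERNAL_FAILURE:   return "INTERNAL_FAILURE";
    case PSCI_NOT_PRESENT:        return "NOT_PRESENT";
    case PSCI_DISABLED:           return "DISABLED";
    case PSCI_INVALID_ADDRESS:    return "INVALID_ADDRESS";
    default:                      return "UNKNOWN";
    }
}

/* Print a PSCI failure in a form similar to the panic message above. */
static void report_psci_failure(const char *call, int cpu, int err)
{
    printf("PSCI %s failed for CPU%d err=%d (%s)\n",
           call, cpu, err, psci_strerror(err));
}
```

Calling `report_psci_failure("cpu off", 0, -3)` would tag the message with DENIED, which makes it easier to see at a glance that the firmware refused the request.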

(XEN)
(XEN) Reboot in five seconds...

Are the logs below actually a mistaken paste?

(XEN) CPU2 will call ARM_SMCCC_ARCH_WORKAROUND_1 on exception entry
(XEN) CPU 2 booted.
(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x0 on CPU2 via TTBR 0x00000000781a8000
(XEN) 0TH[0x0] = 0x00000000781b0f7f
(XEN) 1ST[0x0] = 0x00000000781aaf7f
(XEN) 2ND[0x0] = 0x0000000000000000
(XEN) CPU2: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.13-unstable  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000233660 _spin_lock+0x1c/0x88
(XEN) LR:     000000000023365c
(XEN) SP:     000080037ff77d50
(XEN) CPSR:   a00002c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000006  X1: 00000000fffffffe  X2: 0000000000000000
(XEN)      X3: 0000000000000002  X4: 000080037fc42480  X5: 0000000000000000
(XEN)      X6: 0000000000000080  X7: 000080037ffb0000  X8: 00000000002a1000
(XEN)      X9: 000000000000000a X10: 000080037ff77bf8 X11: 0000000000000032
(XEN)     X12: 0000000000000001 X13: 000000000027fff0 X14: 0000000000000020
(XEN)     X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
(XEN)     X18: 0000000000000000 X19: 0000000000000000 X20: 0000000000000000
(XEN)     X21: 000080037ff7e108 X22: 0000000000000002 X23: 000000000033bc88
(XEN)     X24: 0000000000336020 X25: 0000000000000000 X26: 0000000000000002
(XEN)     X27: 0000000000336000 X28: 0000000000000000  FP: 000080037ff77d50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 00000000781a8000
(XEN)
(XEN)    ESR_EL2: 96000006
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000000000
(XEN)
(XEN) Xen stack trace from sp=000080037ff77d50:
(XEN)    000080037ff77d70 00000000002336e8 000080037ff7d000 000000000023e00c
(XEN)    000080037ff77d80 000000000022e90c 000080037ff77e10 0000000000232af8
(XEN)    0000000000000002 00000000002fbb00 ffffffffffffffff 000000000033cf20
(XEN)    00000000002a0680 0000000000000001 0000000000000001 0000000000000001
(XEN)    0000000000000000 000080037ff77e90 000080037ff77e50 00000000ffffffc8
(XEN)    000000000029f008 00000000002ffc41 000080037ff77e90 0000000000263c68
(XEN)    000080037ff77e50 0000000000232b6c 0000000000000002 0000000000000004
(XEN)    0000000000000002 00000000002fbc00 0000000000336448 00000000002fbb00
(XEN)    000080037ff77e60 0000000000257230 000080037ff77e90 0000000000263c6c
(XEN)    0000000000000002 0000000077e80000 0000000000000000 0000000000000001
(XEN)    0000000000000000 0000000000000002 0000000000000001 effffffffffaffff
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000233660>] _spin_lock+0x1c/0x88 (PC)
(XEN)    [<000000000023365c>] _spin_lock+0x18/0x88 (LR)
(XEN)    [<00000000002336e8>] _spin_lock_irq+0x1c/0x24
(XEN)    [<000000000022e90c>] schedule.c#schedule+0xe8/0x74c
(XEN)    [<0000000000232af8>] softirq.c#__do_softirq+0xcc/0xe4
(XEN)    [<0000000000232b6c>] do_softirq+0x14/0x1c
(XEN)    [<0000000000257230>] idle_loop+0x174/0x188
(XEN)    [<0000000000263c6c>] start_secondary+0x1f4/0x200
(XEN)    [<0000000000000002>] 0000000000000002
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) CPU2: Unexpected Trap: Data Abort
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
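
To help read the Data Abort above, here is a minimal sketch that decodes the ESR_EL2 value from the dump. The field layout (EC in bits [31:26], WnR in bit 6, DFSC in bits [5:0]) comes from the Arm architecture; the helper names are hypothetical. For ESR_EL2 = 0x96000006 it gives EC 0x25 (Data Abort taken without a change in exception level) and DFSC 0x06 (translation fault, level 2) on a read, which matches the zero 2ND[0x0] entry in the page-table walk and FAR_EL2 = 0, i.e. a NULL pointer dereference in _spin_lock().

```c
#include <stdint.h>

#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     0x3fu
#define ESR_EC_DABT_CUR 0x25u   /* Data Abort, no change in exception level */
#define ESR_WNR_SHIFT   6       /* 1 = write access, 0 = read access */
#define ESR_DFSC_MASK   0x3fu

/* Exception class, bits [31:26] of the syndrome register. */
static unsigned int esr_ec(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Data fault status code, bits [5:0] (valid for data aborts). */
static unsigned int esr_dfsc(uint32_t esr)
{
    return esr & ESR_DFSC_MASK;
}

/* Return 1 if the faulting access was a write, 0 if it was a read. */
static int esr_is_write(uint32_t esr)
{
    return (esr >> ESR_WNR_SHIFT) & 1u;
}

/*
 * DFSC 0b0001LL encodes a translation fault at level LL.
 * Return that level (0-3), or -1 for any other fault type.
 */
static int dfsc_translation_level(unsigned int dfsc)
{
    if ((dfsc & 0x3cu) == 0x04u)
        return (int)(dfsc & 0x3u);
    return -1;
}
```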

Cheers,

--
Julien Grall


 

