|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v16 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests
On Thu, Mar 19, 2026 at 11:37 AM Orzel, Michal <michal.orzel@xxxxxxx> wrote:
>
>
>
> On 12/12/2025 14:18, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > Add support for the PSCI SYSTEM_SUSPEND function in the vPSCI interface,
> > allowing guests to request suspend via the PSCI v1.0+ SYSTEM_SUSPEND call
> > (both 32-bit and 64-bit variants).
> >
> > Implementation details:
> > - Add SYSTEM_SUSPEND function IDs to PSCI definitions
> > - Trap and handle SYSTEM_SUSPEND in vPSCI
> > - Allow only non-hardware domains to invoke SYSTEM_SUSPEND; return
> > PSCI_NOT_SUPPORTED for the hardware domain to avoid halting the system
> > in hwdom_shutdown() via domain_shutdown
> > - Require all secondary VCPUs of the calling domain to be offline before
> > suspend, as mandated by the PSCI specification
> >
> > The arch_domain_resume() function is an architecture-specific hook that is
> > invoked during domain resume to perform any necessary setup or restoration
> > steps required by the platform. arch_domain_resume() stays int to propagate
> > errno-style detail into common logging; preserving the integer keeps the
> > reason visible and leaves room for future arch-specific failures or richer
> > handling.
> >
> > The new vpsci_vcpu_up_prepare() helper is called on the resume path to set
> > up
> > the vCPU context (such as entry point, some system regs and context ID)
> > before
> > resuming a suspended guest. This keeps ARM/vPSCI-specific logic out of
> > common
> > code and avoids intrusive changes to the generic resume flow.
> >
> > Usage:
> >
> > For Linux-based guests, suspend can be initiated with:
> > echo mem > /sys/power/state
> > or via:
> > systemctl suspend
> >
> > Resuming the guest is performed from control domain using:
> > xl resume <domain>
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > Changes in V16:
> > - Refactor error handling in domain_resume: move logging to generic code,
> > use explicit return code checking.
> > - Make context clearing conditional on success in arch_domain_resume.
> > - The 'int' return type is retained for arch_domain_resume for consistency
> > with other arch hooks and to allow for specific negative error codes.
> > ---
> > xen/arch/arm/domain.c | 39 +++++++++
> > xen/arch/arm/include/asm/domain.h | 2 +
> > xen/arch/arm/include/asm/perfc_defn.h | 1 +
> > xen/arch/arm/include/asm/psci.h | 2 +
> > xen/arch/arm/include/asm/suspend.h | 27 ++++++
> > xen/arch/arm/include/asm/vpsci.h | 5 +-
> > xen/arch/arm/vpsci.c | 116 +++++++++++++++++++++-----
> > xen/common/domain.c | 10 +++
> > xen/include/xen/suspend.h | 25 ++++++
> > 9 files changed, 205 insertions(+), 22 deletions(-)
> > create mode 100644 xen/arch/arm/include/asm/suspend.h
> > create mode 100644 xen/include/xen/suspend.h
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 47973f99d9..f903e7d4f0 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -12,6 +12,8 @@
> > #include <xen/softirq.h>
> > #include <xen/wait.h>
> >
> > +#include <public/sched.h>
> > +
> > #include <asm/arm64/sve.h>
> > #include <asm/cpuerrata.h>
> > #include <asm/cpufeature.h>
> > @@ -24,10 +26,12 @@
> > #include <asm/platform.h>
> > #include <asm/procinfo.h>
> > #include <asm/regs.h>
> > +#include <asm/suspend.h>
> > #include <asm/firmware/sci.h>
> > #include <asm/tee/tee.h>
> > #include <asm/vfp.h>
> > #include <asm/vgic.h>
> > +#include <asm/vpsci.h>
> > #include <asm/vtimer.h>
> >
> > #include "vpci.h"
> > @@ -851,6 +855,41 @@ void arch_domain_creation_finished(struct domain *d)
> > p2m_domain_creation_finished(d);
> > }
> >
> > +int arch_domain_resume(struct domain *d)
> > +{
> > + int rc;
> > + struct resume_info *ctx = &d->arch.resume_ctx;
> > +
> > + if ( !d->is_shutting_down || d->shutdown_code != SHUTDOWN_suspend )
> > + {
> > + dprintk(XENLOG_WARNING,
> > + "%pd: Invalid domain state for resume:
> > is_shutting_down=%u, shutdown_code=%u\n",
> > + d, d->is_shutting_down, d->shutdown_code);
> > + return -EINVAL;
> > + }
> > +
> > + /*
> > + * It is still possible to call domain_shutdown() with a suspend reason
> > + * via some hypercalls, such as SCHEDOP_shutdown or
> > SCHEDOP_remote_shutdown.
> > + * In these cases, the resume context will be empty.
> > + * This is not expected to cause any issues, so we just notify about
> > the
> > + * situation and return without error, allowing the existing logic to
> > + * proceed as expected.
> > + */
> > + if ( !ctx->wake_cpu )
> > + {
> > + dprintk(XENLOG_INFO, "%pd: Wake CPU pointer context was not
> > provided\n",
> > + d);
> > + return 0;
> > + }
> > +
> > + rc = vpsci_vcpu_up_prepare(ctx->wake_cpu , ctx->ep, ctx->cid);
> > + if ( !rc )
> > + memset(ctx, 0, sizeof(*ctx));
> > +
> > + return rc;
> > +}
> > +
> > static int is_guest_pv32_psr(uint32_t psr)
> > {
> > switch (psr & PSR_MODE_MASK)
> > diff --git a/xen/arch/arm/include/asm/domain.h
> > b/xen/arch/arm/include/asm/domain.h
> > index 758ad807e4..66b1246892 100644
> > --- a/xen/arch/arm/include/asm/domain.h
> > +++ b/xen/arch/arm/include/asm/domain.h
> > @@ -5,6 +5,7 @@
> > #include <xen/timer.h>
> > #include <asm/page.h>
> > #include <asm/p2m.h>
> > +#include <asm/suspend.h>
> > #include <asm/vfp.h>
> > #include <asm/mmio.h>
> > #include <asm/gic.h>
> > @@ -126,6 +127,7 @@ struct arch_domain
> > void *sci_data;
> > #endif
> >
> > + struct resume_info resume_ctx;
> > } __cacheline_aligned;
> >
> > struct arch_vcpu
> > diff --git a/xen/arch/arm/include/asm/perfc_defn.h
> > b/xen/arch/arm/include/asm/perfc_defn.h
> > index effd25b69e..8dfcac7e3b 100644
> > --- a/xen/arch/arm/include/asm/perfc_defn.h
> > +++ b/xen/arch/arm/include/asm/perfc_defn.h
> > @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset, "vpsci:
> > system_reset")
> > PERFCOUNTER(vpsci_cpu_suspend, "vpsci: cpu_suspend")
> > PERFCOUNTER(vpsci_cpu_affinity_info, "vpsci: cpu_affinity_info")
> > PERFCOUNTER(vpsci_features, "vpsci: features")
> > +PERFCOUNTER(vpsci_system_suspend, "vpsci: system_suspend")
> >
> > PERFCOUNTER(vcpu_kick, "vcpu: notify other vcpu")
> >
> > diff --git a/xen/arch/arm/include/asm/psci.h
> > b/xen/arch/arm/include/asm/psci.h
> > index 4780972621..48a93e6b79 100644
> > --- a/xen/arch/arm/include/asm/psci.h
> > +++ b/xen/arch/arm/include/asm/psci.h
> > @@ -47,10 +47,12 @@ void call_psci_system_reset(void);
> > #define PSCI_0_2_FN32_SYSTEM_OFF PSCI_0_2_FN32(8)
> > #define PSCI_0_2_FN32_SYSTEM_RESET PSCI_0_2_FN32(9)
> > #define PSCI_1_0_FN32_PSCI_FEATURES PSCI_0_2_FN32(10)
> > +#define PSCI_1_0_FN32_SYSTEM_SUSPEND PSCI_0_2_FN32(14)
> >
> > #define PSCI_0_2_FN64_CPU_SUSPEND PSCI_0_2_FN64(1)
> > #define PSCI_0_2_FN64_CPU_ON PSCI_0_2_FN64(3)
> > #define PSCI_0_2_FN64_AFFINITY_INFO PSCI_0_2_FN64(4)
> > +#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
> >
> > /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
> > #define PSCI_0_2_AFFINITY_LEVEL_ON 0
> > diff --git a/xen/arch/arm/include/asm/suspend.h
> > b/xen/arch/arm/include/asm/suspend.h
> > new file mode 100644
> > index 0000000000..313d03ea59
> > --- /dev/null
> > +++ b/xen/arch/arm/include/asm/suspend.h
> > @@ -0,0 +1,27 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#ifndef ARM_SUSPEND_H
> > +#define ARM_SUSPEND_H
> > +
> > +struct domain;
> > +struct vcpu;
> > +
> > +struct resume_info {
> > + register_t ep;
> > + register_t cid;
> > + struct vcpu *wake_cpu;
> > +};
> > +
> > +int arch_domain_resume(struct domain *d);
> > +
> > +#endif /* ARM_SUSPEND_H */
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/arch/arm/include/asm/vpsci.h
> > b/xen/arch/arm/include/asm/vpsci.h
> > index 0cca5e6830..d790ab3715 100644
> > --- a/xen/arch/arm/include/asm/vpsci.h
> > +++ b/xen/arch/arm/include/asm/vpsci.h
> > @@ -23,12 +23,15 @@
> > #include <asm/psci.h>
> >
> > /* Number of function implemented by virtual PSCI (only 0.2 or later) */
> > -#define VPSCI_NR_FUNCS 12
> > +#define VPSCI_NR_FUNCS 14
> >
> > /* Functions handle PSCI calls from the guests */
> > bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
> > bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid);
> >
> > +int vpsci_vcpu_up_prepare(struct vcpu *v, register_t entry_point,
> > + register_t context_id);
> > +
> > #endif /* __ASM_VPSCI_H__ */
> >
> > /*
> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> > index 7ba9ccd94b..c4d616ec68 100644
> > --- a/xen/arch/arm/vpsci.c
> > +++ b/xen/arch/arm/vpsci.c
> > @@ -10,32 +10,16 @@
> >
> > #include <public/sched.h>
> >
> > -static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > - register_t context_id)
> > +int vpsci_vcpu_up_prepare(struct vcpu *v, register_t entry_point,
> > + register_t context_id)
> > {
> > - struct vcpu *v;
> > - struct domain *d = current->domain;
> > - struct vcpu_guest_context *ctxt;
> > int rc;
> > + struct domain *d = v->domain;
> > bool is_thumb = entry_point & 1;
> > - register_t vcpuid;
> > -
> > - vcpuid = vaffinity_to_vcpuid(target_cpu);
> > -
> > - if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> > - return PSCI_INVALID_PARAMETERS;
> > -
> > - /* THUMB set is not allowed with 64-bit domain */
> > - if ( is_64bit_domain(d) && is_thumb )
> > - return PSCI_INVALID_ADDRESS;
> > -
> > - if ( !test_bit(_VPF_down, &v->pause_flags) )
> > - return PSCI_ALREADY_ON;
> > + struct vcpu_guest_context *ctxt;
> >
> > if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
> > - return PSCI_DENIED;
> > -
> > - vgic_clear_pending_irqs(v);
> > + return -ENOMEM;
> >
> > memset(ctxt, 0, sizeof(*ctxt));
> > ctxt->user_regs.pc64 = (u64) entry_point;
> > @@ -76,8 +60,37 @@ static int do_common_cpu_on(register_t target_cpu,
> > register_t entry_point,
> > free_vcpu_guest_context(ctxt);
> >
> > if ( rc < 0 )
> > + return rc;
> > +
> > + return 0;
> > +}
> > +
> > +static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > + register_t context_id)
> > +{
> > + struct vcpu *v;
> > + struct domain *d = current->domain;
> > + int rc;
> > + bool is_thumb = entry_point & 1;
> > + register_t vcpuid;
> > +
> > + vcpuid = vaffinity_to_vcpuid(target_cpu);
> > +
> > + if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> > + return PSCI_INVALID_PARAMETERS;
> > +
> > + /* THUMB set is not allowed with 64-bit domain */
> > + if ( is_64bit_domain(d) && is_thumb )
> > + return PSCI_INVALID_ADDRESS;
> > +
> > + if ( !test_bit(_VPF_down, &v->pause_flags) )
> > + return PSCI_ALREADY_ON;
> > +
> > + rc = vpsci_vcpu_up_prepare(v, entry_point, context_id);
> > + if ( rc )
> > return PSCI_DENIED;
> >
> > + vgic_clear_pending_irqs(v);
> > vcpu_wake(v);
> >
> > return PSCI_SUCCESS;
> > @@ -197,6 +210,48 @@ static void do_psci_0_2_system_reset(void)
> > domain_shutdown(d,SHUTDOWN_reboot);
> > }
> >
> > +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t
> > cid)
> > +{
> > + int32_t rc;
> > + struct vcpu *v;
> > + struct domain *d = current->domain;
> > + bool is_thumb = epoint & 1;
> > +
> > + /* THUMB set is not allowed with 64-bit domain */
> > + if ( is_64bit_domain(d) && is_thumb )
> > + return PSCI_INVALID_ADDRESS;
> > +
> > + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> > + if ( is_hardware_domain(d) )
> > + return PSCI_NOT_SUPPORTED;
> > +
> > + /* Ensure that all CPUs other than the calling one are offline */
> > + domain_lock(d);
> > + for_each_vcpu ( d, v )
> > + {
> > + if ( v != current && is_vcpu_online(v) )
> > + {
> > + domain_unlock(d);
> > + return PSCI_DENIED;
> > + }
> > + }
> > + domain_unlock(d);
> > +
> > + rc = domain_shutdown(d, SHUTDOWN_suspend);
> There is a theoretical race between this and ...
> > + if ( rc )
> > + return PSCI_DENIED;
> > +
> > + d->arch.resume_ctx.ep = epoint;
> > + d->arch.resume_ctx.cid = cid;
> > + d->arch.resume_ctx.wake_cpu = current;
> setting up resume context. It is practically impossible but still we could
> move
> setting up the context before domain_shutdown and clearing it on failure to be
> race free. The race is about small window between domain_shutdown releasing
> the
> shutdown_lock and other CPU calling domain_resume, taking the lock and running
> arch_domain_resume before resume_ctx has been populated.
I think the current flow is already serialized, so I don't believe
arch_domain_resume() can observe an uninitialized resume_ctx here.
A concurrent domain_resume() does not take shutdown_lock first. It starts
with domain_pause(d), and that path is synchronous: it walks all domain
vCPUs and uses vcpu_sleep_sync().
In the SYSTEM_SUSPEND path, the calling vCPU is still running after
domain_shutdown() returns, and it populates resume_ctx before leaving the
hypercall path. So if another CPU tries to resume the domain in that
window, it should block in domain_pause() until that vCPU stops running.
Only after that can it take shutdown_lock and proceed to
arch_domain_resume().
So, as far as I can see, arch_domain_resume() should not be able to
observe an uninitialized resume_ctx in the current flow.
Also, moving resume_ctx setup before domain_shutdown() would slightly
complicate the failure path, since we would then need to explicitly
clear/invalidate the pre-populated context if domain_shutdown() fails.
So unless we think this race is actually reachable, I would prefer to
keep the current ordering. That said, if you strongly prefer making the
ordering more explicit, I can rework it that way.
Best regards,
Mykola
>
> ~Michal
>
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |