|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v16 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests
On 19/03/2026 11:14, Mykola Kvach wrote:
> On Thu, Mar 19, 2026 at 11:37 AM Orzel, Michal <michal.orzel@xxxxxxx> wrote:
>>
>>
>>
>> On 12/12/2025 14:18, Mykola Kvach wrote:
>>> From: Mykola Kvach <mykola_kvach@xxxxxxxx>
>>>
>>> Add support for the PSCI SYSTEM_SUSPEND function in the vPSCI interface,
>>> allowing guests to request suspend via the PSCI v1.0+ SYSTEM_SUSPEND call
>>> (both 32-bit and 64-bit variants).
>>>
>>> Implementation details:
>>> - Add SYSTEM_SUSPEND function IDs to PSCI definitions
>>> - Trap and handle SYSTEM_SUSPEND in vPSCI
>>> - Allow only non-hardware domains to invoke SYSTEM_SUSPEND; return
>>> PSCI_NOT_SUPPORTED for the hardware domain to avoid halting the system
>>> in hwdom_shutdown() via domain_shutdown
>>> - Require all secondary VCPUs of the calling domain to be offline before
>>> suspend, as mandated by the PSCI specification
>>>
>>> The arch_domain_resume() function is an architecture-specific hook that is
>>> invoked during domain resume to perform any necessary setup or restoration
>>> steps required by the platform. arch_domain_resume() stays int to propagate
>>> errno-style detail into common logging; preserving the integer keeps the
>>> reason visible and leaves room for future arch-specific failures or richer
>>> handling.
>>>
>>> The new vpsci_vcpu_up_prepare() helper is called on the resume path to set
>>> up
>>> the vCPU context (such as entry point, some system regs and context ID)
>>> before
>>> resuming a suspended guest. This keeps ARM/vPSCI-specific logic out of
>>> common
>>> code and avoids intrusive changes to the generic resume flow.
>>>
>>> Usage:
>>>
>>> For Linux-based guests, suspend can be initiated with:
>>> echo mem > /sys/power/state
>>> or via:
>>> systemctl suspend
>>>
>>> Resuming the guest is performed from control domain using:
>>> xl resume <domain>
>>>
>>> Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
>>> ---
>>> Changes in V16:
>>> - Refactor error handling in domain_resume: move logging to generic code,
>>> use explicit return code checking.
>>> - Make context clearing conditional on success in arch_domain_resume.
>>> - The 'int' return type is retained for arch_domain_resume for consistency
>>> with other arch hooks and to allow for specific negative error codes.
>>> ---
>>> xen/arch/arm/domain.c | 39 +++++++++
>>> xen/arch/arm/include/asm/domain.h | 2 +
>>> xen/arch/arm/include/asm/perfc_defn.h | 1 +
>>> xen/arch/arm/include/asm/psci.h | 2 +
>>> xen/arch/arm/include/asm/suspend.h | 27 ++++++
>>> xen/arch/arm/include/asm/vpsci.h | 5 +-
>>> xen/arch/arm/vpsci.c | 116 +++++++++++++++++++++-----
>>> xen/common/domain.c | 10 +++
>>> xen/include/xen/suspend.h | 25 ++++++
>>> 9 files changed, 205 insertions(+), 22 deletions(-)
>>> create mode 100644 xen/arch/arm/include/asm/suspend.h
>>> create mode 100644 xen/include/xen/suspend.h
>>>
>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>> index 47973f99d9..f903e7d4f0 100644
>>> --- a/xen/arch/arm/domain.c
>>> +++ b/xen/arch/arm/domain.c
>>> @@ -12,6 +12,8 @@
>>> #include <xen/softirq.h>
>>> #include <xen/wait.h>
>>>
>>> +#include <public/sched.h>
>>> +
>>> #include <asm/arm64/sve.h>
>>> #include <asm/cpuerrata.h>
>>> #include <asm/cpufeature.h>
>>> @@ -24,10 +26,12 @@
>>> #include <asm/platform.h>
>>> #include <asm/procinfo.h>
>>> #include <asm/regs.h>
>>> +#include <asm/suspend.h>
>>> #include <asm/firmware/sci.h>
>>> #include <asm/tee/tee.h>
>>> #include <asm/vfp.h>
>>> #include <asm/vgic.h>
>>> +#include <asm/vpsci.h>
>>> #include <asm/vtimer.h>
>>>
>>> #include "vpci.h"
>>> @@ -851,6 +855,41 @@ void arch_domain_creation_finished(struct domain *d)
>>> p2m_domain_creation_finished(d);
>>> }
>>>
>>> +int arch_domain_resume(struct domain *d)
>>> +{
>>> + int rc;
>>> + struct resume_info *ctx = &d->arch.resume_ctx;
>>> +
>>> + if ( !d->is_shutting_down || d->shutdown_code != SHUTDOWN_suspend )
>>> + {
>>> + dprintk(XENLOG_WARNING,
>>> + "%pd: Invalid domain state for resume:
>>> is_shutting_down=%u, shutdown_code=%u\n",
>>> + d, d->is_shutting_down, d->shutdown_code);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /*
>>> + * It is still possible to call domain_shutdown() with a suspend reason
>>> + * via some hypercalls, such as SCHEDOP_shutdown or
>>> SCHEDOP_remote_shutdown.
>>> + * In these cases, the resume context will be empty.
>>> + * This is not expected to cause any issues, so we just notify about
>>> the
>>> + * situation and return without error, allowing the existing logic to
>>> + * proceed as expected.
>>> + */
>>> + if ( !ctx->wake_cpu )
>>> + {
>>> + dprintk(XENLOG_INFO, "%pd: Wake CPU pointer context was not
>>> provided\n",
>>> + d);
>>> + return 0;
>>> + }
>>> +
>>> + rc = vpsci_vcpu_up_prepare(ctx->wake_cpu , ctx->ep, ctx->cid);
>>> + if ( !rc )
>>> + memset(ctx, 0, sizeof(*ctx));
>>> +
>>> + return rc;
>>> +}
>>> +
>>> static int is_guest_pv32_psr(uint32_t psr)
>>> {
>>> switch (psr & PSR_MODE_MASK)
>>> diff --git a/xen/arch/arm/include/asm/domain.h
>>> b/xen/arch/arm/include/asm/domain.h
>>> index 758ad807e4..66b1246892 100644
>>> --- a/xen/arch/arm/include/asm/domain.h
>>> +++ b/xen/arch/arm/include/asm/domain.h
>>> @@ -5,6 +5,7 @@
>>> #include <xen/timer.h>
>>> #include <asm/page.h>
>>> #include <asm/p2m.h>
>>> +#include <asm/suspend.h>
>>> #include <asm/vfp.h>
>>> #include <asm/mmio.h>
>>> #include <asm/gic.h>
>>> @@ -126,6 +127,7 @@ struct arch_domain
>>> void *sci_data;
>>> #endif
>>>
>>> + struct resume_info resume_ctx;
>>> } __cacheline_aligned;
>>>
>>> struct arch_vcpu
>>> diff --git a/xen/arch/arm/include/asm/perfc_defn.h
>>> b/xen/arch/arm/include/asm/perfc_defn.h
>>> index effd25b69e..8dfcac7e3b 100644
>>> --- a/xen/arch/arm/include/asm/perfc_defn.h
>>> +++ b/xen/arch/arm/include/asm/perfc_defn.h
>>> @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset, "vpsci:
>>> system_reset")
>>> PERFCOUNTER(vpsci_cpu_suspend, "vpsci: cpu_suspend")
>>> PERFCOUNTER(vpsci_cpu_affinity_info, "vpsci: cpu_affinity_info")
>>> PERFCOUNTER(vpsci_features, "vpsci: features")
>>> +PERFCOUNTER(vpsci_system_suspend, "vpsci: system_suspend")
>>>
>>> PERFCOUNTER(vcpu_kick, "vcpu: notify other vcpu")
>>>
>>> diff --git a/xen/arch/arm/include/asm/psci.h
>>> b/xen/arch/arm/include/asm/psci.h
>>> index 4780972621..48a93e6b79 100644
>>> --- a/xen/arch/arm/include/asm/psci.h
>>> +++ b/xen/arch/arm/include/asm/psci.h
>>> @@ -47,10 +47,12 @@ void call_psci_system_reset(void);
>>> #define PSCI_0_2_FN32_SYSTEM_OFF PSCI_0_2_FN32(8)
>>> #define PSCI_0_2_FN32_SYSTEM_RESET PSCI_0_2_FN32(9)
>>> #define PSCI_1_0_FN32_PSCI_FEATURES PSCI_0_2_FN32(10)
>>> +#define PSCI_1_0_FN32_SYSTEM_SUSPEND PSCI_0_2_FN32(14)
>>>
>>> #define PSCI_0_2_FN64_CPU_SUSPEND PSCI_0_2_FN64(1)
>>> #define PSCI_0_2_FN64_CPU_ON PSCI_0_2_FN64(3)
>>> #define PSCI_0_2_FN64_AFFINITY_INFO PSCI_0_2_FN64(4)
>>> +#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
>>>
>>> /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
>>> #define PSCI_0_2_AFFINITY_LEVEL_ON 0
>>> diff --git a/xen/arch/arm/include/asm/suspend.h
>>> b/xen/arch/arm/include/asm/suspend.h
>>> new file mode 100644
>>> index 0000000000..313d03ea59
>>> --- /dev/null
>>> +++ b/xen/arch/arm/include/asm/suspend.h
>>> @@ -0,0 +1,27 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +
>>> +#ifndef ARM_SUSPEND_H
>>> +#define ARM_SUSPEND_H
>>> +
>>> +struct domain;
>>> +struct vcpu;
>>> +
>>> +struct resume_info {
>>> + register_t ep;
>>> + register_t cid;
>>> + struct vcpu *wake_cpu;
>>> +};
>>> +
>>> +int arch_domain_resume(struct domain *d);
>>> +
>>> +#endif /* ARM_SUSPEND_H */
>>> +
>>> +/*
>>> + * Local variables:
>>> + * mode: C
>>> + * c-file-style: "BSD"
>>> + * c-basic-offset: 4
>>> + * tab-width: 4
>>> + * indent-tabs-mode: nil
>>> + * End:
>>> + */
>>> diff --git a/xen/arch/arm/include/asm/vpsci.h
>>> b/xen/arch/arm/include/asm/vpsci.h
>>> index 0cca5e6830..d790ab3715 100644
>>> --- a/xen/arch/arm/include/asm/vpsci.h
>>> +++ b/xen/arch/arm/include/asm/vpsci.h
>>> @@ -23,12 +23,15 @@
>>> #include <asm/psci.h>
>>>
>>> /* Number of function implemented by virtual PSCI (only 0.2 or later) */
>>> -#define VPSCI_NR_FUNCS 12
>>> +#define VPSCI_NR_FUNCS 14
>>>
>>> /* Functions handle PSCI calls from the guests */
>>> bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
>>> bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid);
>>>
>>> +int vpsci_vcpu_up_prepare(struct vcpu *v, register_t entry_point,
>>> + register_t context_id);
>>> +
>>> #endif /* __ASM_VPSCI_H__ */
>>>
>>> /*
>>> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
>>> index 7ba9ccd94b..c4d616ec68 100644
>>> --- a/xen/arch/arm/vpsci.c
>>> +++ b/xen/arch/arm/vpsci.c
>>> @@ -10,32 +10,16 @@
>>>
>>> #include <public/sched.h>
>>>
>>> -static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
>>> - register_t context_id)
>>> +int vpsci_vcpu_up_prepare(struct vcpu *v, register_t entry_point,
>>> + register_t context_id)
>>> {
>>> - struct vcpu *v;
>>> - struct domain *d = current->domain;
>>> - struct vcpu_guest_context *ctxt;
>>> int rc;
>>> + struct domain *d = v->domain;
>>> bool is_thumb = entry_point & 1;
>>> - register_t vcpuid;
>>> -
>>> - vcpuid = vaffinity_to_vcpuid(target_cpu);
>>> -
>>> - if ( (v = domain_vcpu(d, vcpuid)) == NULL )
>>> - return PSCI_INVALID_PARAMETERS;
>>> -
>>> - /* THUMB set is not allowed with 64-bit domain */
>>> - if ( is_64bit_domain(d) && is_thumb )
>>> - return PSCI_INVALID_ADDRESS;
>>> -
>>> - if ( !test_bit(_VPF_down, &v->pause_flags) )
>>> - return PSCI_ALREADY_ON;
>>> + struct vcpu_guest_context *ctxt;
>>>
>>> if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
>>> - return PSCI_DENIED;
>>> -
>>> - vgic_clear_pending_irqs(v);
>>> + return -ENOMEM;
>>>
>>> memset(ctxt, 0, sizeof(*ctxt));
>>> ctxt->user_regs.pc64 = (u64) entry_point;
>>> @@ -76,8 +60,37 @@ static int do_common_cpu_on(register_t target_cpu,
>>> register_t entry_point,
>>> free_vcpu_guest_context(ctxt);
>>>
>>> if ( rc < 0 )
>>> + return rc;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
>>> + register_t context_id)
>>> +{
>>> + struct vcpu *v;
>>> + struct domain *d = current->domain;
>>> + int rc;
>>> + bool is_thumb = entry_point & 1;
>>> + register_t vcpuid;
>>> +
>>> + vcpuid = vaffinity_to_vcpuid(target_cpu);
>>> +
>>> + if ( (v = domain_vcpu(d, vcpuid)) == NULL )
>>> + return PSCI_INVALID_PARAMETERS;
>>> +
>>> + /* THUMB set is not allowed with 64-bit domain */
>>> + if ( is_64bit_domain(d) && is_thumb )
>>> + return PSCI_INVALID_ADDRESS;
>>> +
>>> + if ( !test_bit(_VPF_down, &v->pause_flags) )
>>> + return PSCI_ALREADY_ON;
>>> +
>>> + rc = vpsci_vcpu_up_prepare(v, entry_point, context_id);
>>> + if ( rc )
>>> return PSCI_DENIED;
>>>
>>> + vgic_clear_pending_irqs(v);
>>> vcpu_wake(v);
>>>
>>> return PSCI_SUCCESS;
>>> @@ -197,6 +210,48 @@ static void do_psci_0_2_system_reset(void)
>>> domain_shutdown(d,SHUTDOWN_reboot);
>>> }
>>>
>>> +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t
>>> cid)
>>> +{
>>> + int32_t rc;
>>> + struct vcpu *v;
>>> + struct domain *d = current->domain;
>>> + bool is_thumb = epoint & 1;
>>> +
>>> + /* THUMB set is not allowed with 64-bit domain */
>>> + if ( is_64bit_domain(d) && is_thumb )
>>> + return PSCI_INVALID_ADDRESS;
>>> +
>>> + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
>>> + if ( is_hardware_domain(d) )
>>> + return PSCI_NOT_SUPPORTED;
>>> +
>>> + /* Ensure that all CPUs other than the calling one are offline */
>>> + domain_lock(d);
>>> + for_each_vcpu ( d, v )
>>> + {
>>> + if ( v != current && is_vcpu_online(v) )
>>> + {
>>> + domain_unlock(d);
>>> + return PSCI_DENIED;
>>> + }
>>> + }
>>> + domain_unlock(d);
>>> +
>>> + rc = domain_shutdown(d, SHUTDOWN_suspend);
>> There is a theoretical race between this and ...
>>> + if ( rc )
>>> + return PSCI_DENIED;
>>> +
>>> + d->arch.resume_ctx.ep = epoint;
>>> + d->arch.resume_ctx.cid = cid;
>>> + d->arch.resume_ctx.wake_cpu = current;
>> setting up resume context. It is practically impossible but still we could
>> move
>> setting up the context before domain_shutdown and clearing it on failure to
>> be
>> race free. The race is about small window between domain_shutdown releasing
>> the
>> shutdown_lock and other CPU calling domain_resume, taking the lock and
>> running
>> arch_domain_resume before resume_ctx has been populated.
>
> I think the current flow is already serialized, so I don't believe
> arch_domain_resume() can observe an uninitialized resume_ctx here.
>
> A concurrent domain_resume() does not take shutdown_lock first. It starts
> with domain_pause(d), and that path is synchronous: it walks all domain
> vCPUs and uses vcpu_sleep_sync().
>
> In the SYSTEM_SUSPEND path, the calling vCPU is still running after
> domain_shutdown() returns, and it populates resume_ctx before leaving the
> hypercall path. So if another CPU tries to resume the domain in that
> window, it should block in domain_pause() until that vCPU stops running.
> Only after that can it take shutdown_lock and proceed to
> arch_domain_resume().
>
> So, as far as I can see, arch_domain_resume() should not be able to
> observe an uninitialized resume_ctx in the current flow.
>
> Also, moving resume_ctx setup before domain_shutdown() would slightly
> complicate the failure path, since we would then need to explicitly
> clear/invalidate the pre-populated context if domain_shutdown() fails.
>
> So unless we think this race is actually reachable, I would prefer to
> keep the current ordering. That said, if you strongly prefer making the
> ordering more explicit, I can rework it that way.
No, you're right. Resuming CPU will block until the suspending vCPU has fully
stopped, so the race is impossible.
~Michal
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |