[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v16 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests


  • To: "Orzel, Michal" <michal.orzel@xxxxxxx>
  • From: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • Date: Thu, 19 Mar 2026 12:14:28 +0200
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=iLudv52DxGrfHYRIi9eHlPNqebBUto5d/XqXeWUEk7I=; fh=XPwOTaEPrdk2pN2RgjfihMkJ4NQu8t1nAUIywJrC5RY=; b=ECulBka5SoO/0VoHBSdX/w5RkizDGhnsWgKUgBe6YPlC+EodqEikGt+yuzd67uW1PB GxvCP2LbZvjs2XBaox4j62PY37eZUsTq9AbfhnyHviJu2cGzD/DJqIWcN8Kcf1brpQdk fLnIdboC7cKik0ltww9ShXnT66Yv8HTvRHo0Hd2ZS8LnGzoP1q0OW9Sv12jYj8eXtWj3 kAJr1BLXejd12OVnwQv1oIdmPU0ZNvnq9htpARczekZ1HzBf11gP8YQ5bYMbGPvaf/+y aSX/J8GHc7JN6jpF9wNwlzBvbscMxtp4RlhI5JD+F888SfxjUDSUjCjDR765Y5XmkS/c 7WvA==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1773915280; cv=none; d=google.com; s=arc-20240605; b=cpNnqsQjpwgNGihWiEv8vGSC1AGo1tDZDsLXHWUo7zVSY70MIuT4K1d4C/FybsGEK4 u0OCAYvIKAMln+riFVy0VdTV1xT53PIaMkebBP0P5xCZSst/FBoEDC/X9DcMJ4OLYMog E4O8cVynnrx/z9QZgOjucdtEwuUYn/35mGZcxabiiFzoHL7+crmVdVILNasClqZbhwip N7jzObf7oSqkca2eSi6EgPxCHTpsQ25Lc1Ymm4LIIJMfztLUJkCOOxxH0c7vlvxKx0Oi emvb9QmbsOklJsjPnTnwo1dV+qJuuRYk4pU6Oh0a6D16vfElWrHXk8FxMZyiG7wHZuWZ sOLQ==
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Thu, 19 Mar 2026 10:14:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Mar 19, 2026 at 11:37 AM Orzel, Michal <michal.orzel@xxxxxxx> wrote:
>
>
>
> On 12/12/2025 14:18, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > Add support for the PSCI SYSTEM_SUSPEND function in the vPSCI interface,
> > allowing guests to request suspend via the PSCI v1.0+ SYSTEM_SUSPEND call
> > (both 32-bit and 64-bit variants).
> >
> > Implementation details:
> > - Add SYSTEM_SUSPEND function IDs to PSCI definitions
> > - Trap and handle SYSTEM_SUSPEND in vPSCI
> > - Allow only non-hardware domains to invoke SYSTEM_SUSPEND; return
> >   PSCI_NOT_SUPPORTED for the hardware domain to avoid halting the system
> >   in hwdom_shutdown() via domain_shutdown
> > - Require all secondary VCPUs of the calling domain to be offline before
> >   suspend, as mandated by the PSCI specification
> >
> > The arch_domain_resume() function is an architecture-specific hook that is
> > invoked during domain resume to perform any necessary setup or restoration
> > steps required by the platform. arch_domain_resume() stays int to propagate
> > errno-style detail into common logging; preserving the integer keeps the
> > reason visible and leaves room for future arch-specific failures or richer
> > handling.
> >
> > The new vpsci_vcpu_up_prepare() helper is called on the resume path to set 
> > up
> > the vCPU context (such as entry point, some system regs and context ID) 
> > before
> > resuming a suspended guest. This keeps ARM/vPSCI-specific logic out of 
> > common
> > code and avoids intrusive changes to the generic resume flow.
> >
> > Usage:
> >
> > For Linux-based guests, suspend can be initiated with:
> >     echo mem > /sys/power/state
> > or via:
> >     systemctl suspend
> >
> > Resuming the guest is performed from control domain using:
> >       xl resume <domain>
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > Changes in V16:
> > - Refactor error handling in domain_resume: move logging to generic code,
> >   use explicit return code checking.
> > - Make context clearing conditional on success in arch_domain_resume.
> > - The 'int' return type is retained for arch_domain_resume for consistency
> >   with other arch hooks and to allow for specific negative error codes.
> > ---
> >  xen/arch/arm/domain.c                 |  39 +++++++++
> >  xen/arch/arm/include/asm/domain.h     |   2 +
> >  xen/arch/arm/include/asm/perfc_defn.h |   1 +
> >  xen/arch/arm/include/asm/psci.h       |   2 +
> >  xen/arch/arm/include/asm/suspend.h    |  27 ++++++
> >  xen/arch/arm/include/asm/vpsci.h      |   5 +-
> >  xen/arch/arm/vpsci.c                  | 116 +++++++++++++++++++++-----
> >  xen/common/domain.c                   |  10 +++
> >  xen/include/xen/suspend.h             |  25 ++++++
> >  9 files changed, 205 insertions(+), 22 deletions(-)
> >  create mode 100644 xen/arch/arm/include/asm/suspend.h
> >  create mode 100644 xen/include/xen/suspend.h
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 47973f99d9..f903e7d4f0 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -12,6 +12,8 @@
> >  #include <xen/softirq.h>
> >  #include <xen/wait.h>
> >
> > +#include <public/sched.h>
> > +
> >  #include <asm/arm64/sve.h>
> >  #include <asm/cpuerrata.h>
> >  #include <asm/cpufeature.h>
> > @@ -24,10 +26,12 @@
> >  #include <asm/platform.h>
> >  #include <asm/procinfo.h>
> >  #include <asm/regs.h>
> > +#include <asm/suspend.h>
> >  #include <asm/firmware/sci.h>
> >  #include <asm/tee/tee.h>
> >  #include <asm/vfp.h>
> >  #include <asm/vgic.h>
> > +#include <asm/vpsci.h>
> >  #include <asm/vtimer.h>
> >
> >  #include "vpci.h"
> > @@ -851,6 +855,41 @@ void arch_domain_creation_finished(struct domain *d)
> >      p2m_domain_creation_finished(d);
> >  }
> >
> > +int arch_domain_resume(struct domain *d)
> > +{
> > +    int rc;
> > +    struct resume_info *ctx = &d->arch.resume_ctx;
> > +
> > +    if ( !d->is_shutting_down || d->shutdown_code != SHUTDOWN_suspend )
> > +    {
> > +        dprintk(XENLOG_WARNING,
> > +                "%pd: Invalid domain state for resume: 
> > is_shutting_down=%u, shutdown_code=%u\n",
> > +                d, d->is_shutting_down, d->shutdown_code);
> > +        return -EINVAL;
> > +    }
> > +
> > +    /*
> > +     * It is still possible to call domain_shutdown() with a suspend reason
> > +     * via some hypercalls, such as SCHEDOP_shutdown or 
> > SCHEDOP_remote_shutdown.
> > +     * In these cases, the resume context will be empty.
> > +     * This is not expected to cause any issues, so we just notify about 
> > the
> > +     * situation and return without error, allowing the existing logic to
> > +     * proceed as expected.
> > +     */
> > +    if ( !ctx->wake_cpu )
> > +    {
> > +        dprintk(XENLOG_INFO, "%pd: Wake CPU pointer context was not 
> > provided\n",
> > +                d);
> > +        return 0;
> > +    }
> > +
> > +    rc = vpsci_vcpu_up_prepare(ctx->wake_cpu , ctx->ep, ctx->cid);
> > +    if ( !rc )
> > +        memset(ctx, 0, sizeof(*ctx));
> > +
> > +    return rc;
> > +}
> > +
> >  static int is_guest_pv32_psr(uint32_t psr)
> >  {
> >      switch (psr & PSR_MODE_MASK)
> > diff --git a/xen/arch/arm/include/asm/domain.h 
> > b/xen/arch/arm/include/asm/domain.h
> > index 758ad807e4..66b1246892 100644
> > --- a/xen/arch/arm/include/asm/domain.h
> > +++ b/xen/arch/arm/include/asm/domain.h
> > @@ -5,6 +5,7 @@
> >  #include <xen/timer.h>
> >  #include <asm/page.h>
> >  #include <asm/p2m.h>
> > +#include <asm/suspend.h>
> >  #include <asm/vfp.h>
> >  #include <asm/mmio.h>
> >  #include <asm/gic.h>
> > @@ -126,6 +127,7 @@ struct arch_domain
> >      void *sci_data;
> >  #endif
> >
> > +    struct resume_info resume_ctx;
> >  }  __cacheline_aligned;
> >
> >  struct arch_vcpu
> > diff --git a/xen/arch/arm/include/asm/perfc_defn.h 
> > b/xen/arch/arm/include/asm/perfc_defn.h
> > index effd25b69e..8dfcac7e3b 100644
> > --- a/xen/arch/arm/include/asm/perfc_defn.h
> > +++ b/xen/arch/arm/include/asm/perfc_defn.h
> > @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset,        "vpsci: 
> > system_reset")
> >  PERFCOUNTER(vpsci_cpu_suspend,         "vpsci: cpu_suspend")
> >  PERFCOUNTER(vpsci_cpu_affinity_info,   "vpsci: cpu_affinity_info")
> >  PERFCOUNTER(vpsci_features,            "vpsci: features")
> > +PERFCOUNTER(vpsci_system_suspend,      "vpsci: system_suspend")
> >
> >  PERFCOUNTER(vcpu_kick,                 "vcpu: notify other vcpu")
> >
> > diff --git a/xen/arch/arm/include/asm/psci.h 
> > b/xen/arch/arm/include/asm/psci.h
> > index 4780972621..48a93e6b79 100644
> > --- a/xen/arch/arm/include/asm/psci.h
> > +++ b/xen/arch/arm/include/asm/psci.h
> > @@ -47,10 +47,12 @@ void call_psci_system_reset(void);
> >  #define PSCI_0_2_FN32_SYSTEM_OFF          PSCI_0_2_FN32(8)
> >  #define PSCI_0_2_FN32_SYSTEM_RESET        PSCI_0_2_FN32(9)
> >  #define PSCI_1_0_FN32_PSCI_FEATURES       PSCI_0_2_FN32(10)
> > +#define PSCI_1_0_FN32_SYSTEM_SUSPEND      PSCI_0_2_FN32(14)
> >
> >  #define PSCI_0_2_FN64_CPU_SUSPEND         PSCI_0_2_FN64(1)
> >  #define PSCI_0_2_FN64_CPU_ON              PSCI_0_2_FN64(3)
> >  #define PSCI_0_2_FN64_AFFINITY_INFO       PSCI_0_2_FN64(4)
> > +#define PSCI_1_0_FN64_SYSTEM_SUSPEND      PSCI_0_2_FN64(14)
> >
> >  /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
> >  #define PSCI_0_2_AFFINITY_LEVEL_ON      0
> > diff --git a/xen/arch/arm/include/asm/suspend.h 
> > b/xen/arch/arm/include/asm/suspend.h
> > new file mode 100644
> > index 0000000000..313d03ea59
> > --- /dev/null
> > +++ b/xen/arch/arm/include/asm/suspend.h
> > @@ -0,0 +1,27 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#ifndef ARM_SUSPEND_H
> > +#define ARM_SUSPEND_H
> > +
> > +struct domain;
> > +struct vcpu;
> > +
> > +struct resume_info {
> > +    register_t ep;
> > +    register_t cid;
> > +    struct vcpu *wake_cpu;
> > +};
> > +
> > +int arch_domain_resume(struct domain *d);
> > +
> > +#endif /* ARM_SUSPEND_H */
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/arch/arm/include/asm/vpsci.h 
> > b/xen/arch/arm/include/asm/vpsci.h
> > index 0cca5e6830..d790ab3715 100644
> > --- a/xen/arch/arm/include/asm/vpsci.h
> > +++ b/xen/arch/arm/include/asm/vpsci.h
> > @@ -23,12 +23,15 @@
> >  #include <asm/psci.h>
> >
> >  /* Number of function implemented by virtual PSCI (only 0.2 or later) */
> > -#define VPSCI_NR_FUNCS  12
> > +#define VPSCI_NR_FUNCS  14
> >
> >  /* Functions handle PSCI calls from the guests */
> >  bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
> >  bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid);
> >
> > +int vpsci_vcpu_up_prepare(struct vcpu *v, register_t entry_point,
> > +                          register_t context_id);
> > +
> >  #endif /* __ASM_VPSCI_H__ */
> >
> >  /*
> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> > index 7ba9ccd94b..c4d616ec68 100644
> > --- a/xen/arch/arm/vpsci.c
> > +++ b/xen/arch/arm/vpsci.c
> > @@ -10,32 +10,16 @@
> >
> >  #include <public/sched.h>
> >
> > -static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > -                            register_t context_id)
> > +int vpsci_vcpu_up_prepare(struct vcpu *v, register_t entry_point,
> > +                          register_t context_id)
> >  {
> > -    struct vcpu *v;
> > -    struct domain *d = current->domain;
> > -    struct vcpu_guest_context *ctxt;
> >      int rc;
> > +    struct domain *d = v->domain;
> >      bool is_thumb = entry_point & 1;
> > -    register_t vcpuid;
> > -
> > -    vcpuid = vaffinity_to_vcpuid(target_cpu);
> > -
> > -    if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> > -        return PSCI_INVALID_PARAMETERS;
> > -
> > -    /* THUMB set is not allowed with 64-bit domain */
> > -    if ( is_64bit_domain(d) && is_thumb )
> > -        return PSCI_INVALID_ADDRESS;
> > -
> > -    if ( !test_bit(_VPF_down, &v->pause_flags) )
> > -        return PSCI_ALREADY_ON;
> > +    struct vcpu_guest_context *ctxt;
> >
> >      if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
> > -        return PSCI_DENIED;
> > -
> > -    vgic_clear_pending_irqs(v);
> > +        return -ENOMEM;
> >
> >      memset(ctxt, 0, sizeof(*ctxt));
> >      ctxt->user_regs.pc64 = (u64) entry_point;
> > @@ -76,8 +60,37 @@ static int do_common_cpu_on(register_t target_cpu, 
> > register_t entry_point,
> >      free_vcpu_guest_context(ctxt);
> >
> >      if ( rc < 0 )
> > +        return rc;
> > +
> > +    return 0;
> > +}
> > +
> > +static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > +                            register_t context_id)
> > +{
> > +    struct vcpu *v;
> > +    struct domain *d = current->domain;
> > +    int rc;
> > +    bool is_thumb = entry_point & 1;
> > +    register_t vcpuid;
> > +
> > +    vcpuid = vaffinity_to_vcpuid(target_cpu);
> > +
> > +    if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> > +        return PSCI_INVALID_PARAMETERS;
> > +
> > +    /* THUMB set is not allowed with 64-bit domain */
> > +    if ( is_64bit_domain(d) && is_thumb )
> > +        return PSCI_INVALID_ADDRESS;
> > +
> > +    if ( !test_bit(_VPF_down, &v->pause_flags) )
> > +        return PSCI_ALREADY_ON;
> > +
> > +    rc = vpsci_vcpu_up_prepare(v, entry_point, context_id);
> > +    if ( rc )
> >          return PSCI_DENIED;
> >
> > +    vgic_clear_pending_irqs(v);
> >      vcpu_wake(v);
> >
> >      return PSCI_SUCCESS;
> > @@ -197,6 +210,48 @@ static void do_psci_0_2_system_reset(void)
> >      domain_shutdown(d,SHUTDOWN_reboot);
> >  }
> >
> > +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t 
> > cid)
> > +{
> > +    int32_t rc;
> > +    struct vcpu *v;
> > +    struct domain *d = current->domain;
> > +    bool is_thumb = epoint & 1;
> > +
> > +    /* THUMB set is not allowed with 64-bit domain */
> > +    if ( is_64bit_domain(d) && is_thumb )
> > +        return PSCI_INVALID_ADDRESS;
> > +
> > +    /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> > +    if ( is_hardware_domain(d) )
> > +        return PSCI_NOT_SUPPORTED;
> > +
> > +    /* Ensure that all CPUs other than the calling one are offline */
> > +    domain_lock(d);
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( v != current && is_vcpu_online(v) )
> > +        {
> > +            domain_unlock(d);
> > +            return PSCI_DENIED;
> > +        }
> > +    }
> > +    domain_unlock(d);
> > +
> > +    rc = domain_shutdown(d, SHUTDOWN_suspend);
> There is a theoretical race between this and ...
> > +    if ( rc )
> > +        return PSCI_DENIED;
> > +
> > +    d->arch.resume_ctx.ep = epoint;
> > +    d->arch.resume_ctx.cid = cid;
> > +    d->arch.resume_ctx.wake_cpu = current;
> setting up resume context. It is practically impossible but still we could 
> move
> setting up the context before domain_shutdown and clearing it on failure to be
> race free. The race is about small window between domain_shutdown releasing 
> the
> shutdown_lock and other CPU calling domain_resume, taking the lock and running
> arch_domain_resume before resume_ctx has been populated.

I think the current flow is already serialized, so I don't believe
arch_domain_resume() can observe an uninitialized resume_ctx here.

A concurrent domain_resume() does not take shutdown_lock first. It starts
with domain_pause(d), and that path is synchronous: it walks all domain
vCPUs and uses vcpu_sleep_sync().

In the SYSTEM_SUSPEND path, the calling vCPU is still running after
domain_shutdown() returns, and it populates resume_ctx before leaving the
hypercall path. So if another CPU tries to resume the domain in that
window, it should block in domain_pause() until that vCPU stops running.
Only after that can it take shutdown_lock and proceed to
arch_domain_resume().

So, as far as I can see, arch_domain_resume() should not be able to
observe an uninitialized resume_ctx in the current flow.

Also, moving resume_ctx setup before domain_shutdown() would slightly
complicate the failure path, since we would then need to explicitly
clear/invalidate the pre-populated context if domain_shutdown() fails.

So unless we think this race is actually reachable, I would prefer to
keep the current ordering. That said, if you strongly prefer making the
ordering more explicit, I can rework it that way.

Best regards,
Mykola

>
> ~Michal
>



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.