[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v7 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests
Hi Jan, On Tue, Jul 29, 2025 at 12:11 PM Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 29.07.2025 10:52, Mykola Kvach wrote: > > --- a/xen/common/domain.c > > +++ b/xen/common/domain.c > > @@ -1334,16 +1334,15 @@ int domain_shutdown(struct domain *d, u8 reason) > > return 0; > > } > > > > -void domain_resume(struct domain *d) > > +#ifdef CONFIG_ARM > > +void domain_resume_nopause(struct domain *d) > > +#else > > +static void domain_resume_nopause(struct domain *d) > > +#endif > > #ifndef CONFIG_ARM > static > #endif > void domain_resume_nopause(struct domain *d) > > to have as little redundancy as possible. Okay, I’ll change it. > > > { > > struct vcpu *v; > > > > - /* > > - * Some code paths assume that shutdown status does not get reset under > > - * their feet (e.g., some assertions make this assumption). > > - */ > > - domain_pause(d); > > - > > + domain_lock(d); > > This addition of locking affects domain_resume() as well. Neither need nor > correctness are discussed in the description. If the locking was really > needed for domain_resume() as well, I suppose adding that would better be > a separate change anyway. Thanks for the review. The locking was added to avoid potential races involving _VPF_suspended and the suspend/resume logic. Consider the case where domain_lock() is not used inside domain_resume(): Domain 1 initiates suspend via PSCI SYSTEM_SUSPEND. At the same time, Domain 0 invokes resume for Domain 1. Domain 0 calls xl resume, which leads to domain_resume(). Domain 1 acquires domain_lock() as part of the suspend path. Then it acquires the shutdown lock in domain_shutdown(). Domain 0 is blocked waiting for the shutdown lock. When Domain 1 releases the shutdown lock, it sets _VPF_suspended and modifies the VCPU context. Then Domain 0 clears _VPF_suspended. At this point, ctxt_switch_from() might be called with _VPF_suspended already cleared, and the VCPU context partially updated. For example, the guest PC is set to the resume entry point, but some registers like TTBR or SCTLR_EL1 are saved from the current hardware context by ctxt_switch_from. However, after reviewing the flow again, I think this kind of race could still happen even with the lock in place. Imagine Domain 1 sets the flag via SYSTEM_SUSPEND, and then Domain 0 clears it by resuming the domain before the first context switch. This could still result in a partially updated context with inconsistent state. So it might be better to update the VCPU context at the point of resume instead of doing it during suspend. I'll look into that further and also check for other possible races if the update is moved. > > The addition of this locking is particularly interesting considering that > ... > > > spin_lock(&d->shutdown_lock); > > ... is what follows right after. > > > --- a/xen/include/xen/sched.h > > +++ b/xen/include/xen/sched.h > > @@ -814,6 +814,9 @@ void domain_destroy(struct domain *d); > > int domain_kill(struct domain *d); > > int domain_shutdown(struct domain *d, u8 reason); > > void domain_resume(struct domain *d); > > +#ifdef CONFIG_ARM > > +void domain_resume_nopause(struct domain *d); > > +#endif > > > > int domain_soft_reset(struct domain *d, bool resuming); > > > > @@ -1010,6 +1013,10 @@ static inline struct domain *next_domain_in_cpupool( > > /* VCPU is parked. */ > > #define _VPF_parked 8 > > #define VPF_parked (1UL<<_VPF_parked) > > +/* VCPU is suspended. */ > > +#define _VPF_suspended 9 > > +#define VPF_suspended (1UL<<_VPF_suspended) > > Irrespective of the style violations in pre-existing code, can you please > not add further violations, by inserting the missing blanks? Okay > > > + > > > > Please also don't introduce double blank lines. I'll remove it. > > Jan Best regards, Mykola
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |