Re: [RFC] RTDS scheduler: potential issues found during safety analysis


  • To: Oleksii Moisieiev <oleksii_moisieiev@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 20 Mar 2026 08:23:43 +0100
  • Cc: dfaggioli@xxxxxxxx, mengxu@xxxxxxxxxxxxx, gwd@xxxxxxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, julien@xxxxxxx, tiche@xxxxxxxxxxxxx, tiche@xxxxxxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Fri, 20 Mar 2026 07:23:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

(dropping Dario's duplicate but wrong email; adding Jürgen - see ./MAINTAINERS)

On 19.03.2026 18:49, Oleksii Moisieiev wrote:
> Hi all,
> We have been analyzing the RTDS
> scheduler code (xen/common/sched/rt.c) and have identified several potential
> issues that we would like to bring to the community's attention. We would
> appreciate your feedback on whether these issues are considered worth
> addressing and, if so, what the preferred approach would be.

The question is a little odd: If you mean to use the scheduler in production,
a goal is going to be to move it out of experimental state. For this, sorting
issues like the ones you enumerate is pretty much a requirement. And yes, the
suggested approaches look plausible to me.

One other aspect: Please don't send HTML mail. I'm leaving everything as reply
context, for you to see how it ends up looking when read as plain text (with
my mail UI at least).

Jan

> Below is a summary of the findings. All references are to the current
> upstream code.
> 1. Inconsistent validation in domain-wide vs per-vCPU parameter update
> ----------------------------------------------------------------------
> In rt_dom_cntl(), the XEN_DOMCTL_SCHEDOP_putinfo path (domain-wide
> parameter update) only validates:
> if ( op->u.rtds.period == 0 || op->u.rtds.budget == 0 )
> In contrast, the XEN_DOMCTL_SCHEDOP_putvcpuinfo path (per-vCPU update)
> enforces stricter checks:
> if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
> budget > period || period < RTDS_MIN_PERIOD )
> This means the domain-wide path accepts configurations where budget
> exceeds period, or where period/budget fall below the 10 us minimum that
> the per-vCPU path enforces. Such parameters can lead to scheduling
> overhead issues (very short periods) or over-allocation (budget > period).
> Suggested fix: apply identical validation constraints on both paths, i.e.
> add the same bounds checks (budget <= period, period >= RTDS_MIN_PERIOD,
> budget >= RTDS_MIN_BUDGET, period <= RTDS_MAX_PERIOD) to the putinfo path.
> Additionally, the putinfo path does not handle the extratime flag at all,
> unlike the putvcpuinfo path.
> 2. Priority level overflow for extratime vCPUs
> ----------------------------------------------
> In burn_budget(), when an extratime vCPU exhausts its budget:
> svc->priority_level++;
> svc->cur_budget = svc->budget;
> The priority_level field is declared as `unsigned` (32-bit) and there is no
> upper bound check before the increment. While rt_update_deadline() resets
> priority_level to 0 at each period rollover, for a long-running extratime
> vCPU that continuously exhausts its budget within a single period, the
> counter could theoretically wrap from UINT_MAX to 0. Since priority_level 0
> represents the highest scheduling priority, a wraparound would cause the
> extratime vCPU to suddenly preempt vCPUs with active real-time reservations.
> While this scenario requires an extreme number of budget exhaustion cycles
> within a single period, it is a concern for long-running embedded or safety
> systems that operate without reboot for extended durations.
> Suggested fix: saturate priority_level at a safe maximum value (e.g.,
> UINT_MAX - 1) instead of allowing unbounded increment.
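[Editorial note: a saturating increment along the suggested lines, as a minimal sketch rather than the actual burn_budget() code.]

```c
#include <assert.h>
#include <limits.h>

/*
 * Saturating bump of priority_level: cap at UINT_MAX - 1 so an extratime
 * vCPU that keeps exhausting its budget within one period can never wrap
 * back to 0, which is the highest scheduling priority.
 */
static unsigned int prio_level_bump(unsigned int level)
{
    return level < UINT_MAX - 1 ? level + 1 : UINT_MAX - 1;
}
```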
> 3. Replenishment timer loss during CPU pool reconfiguration
> -----------------------------------------------------------
> When the last pCPU is removed from an RTDS CPU pool, move_repl_timer()
> kills the replenishment timer via kill_timer(). When a pCPU is later
> re-added, rt_switch_sched() re-initializes the timer object (if status
> is TIMER_STATUS_killed) but does not re-arm it from the existing
> replenishment queue. If the replq already contains pending entries, those
> replenishments will not fire until some other event explicitly calls
> set_timer(), potentially stalling all non-extratime vCPUs.
> We believe this is actually a broader issue that goes beyond the RTDS
> scheduler: the common cpupool infrastructure probably should not allow
> a cpupool that has assigned vCPUs to lose all of its pCPUs. Preventing
> such a state at the cpupool management level would address the root cause
> for all schedulers, not just RTDS.
> Suggested fix (RTDS-specific): when timer ownership is re-established
> in rt_switch_sched(), re-arm the replenishment timer to the earliest
> deadline in the replq if the queue is non-empty.
> Suggested fix (common): the cpupool code should refuse to remove the
> last pCPU from a cpupool that still has domains/vCPUs assigned to it,
> returning an error instead. This would prevent the problematic state
> from arising in the first place.
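[Editorial note: the RTDS-specific fix boils down to picking the earliest pending replenishment time and re-arming the timer with it. A toy model of that selection follows; the plain array of absolute replenishment times is a stand-in for the real replq, which rt.c keeps sorted so the earliest entry is simply the queue head.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the earliest replenishment time in the (modelled) replq, or 0 if
 * the queue is empty. In rt_switch_sched(), a non-zero result would be fed
 * to set_timer() on the re-initialized replenishment timer.
 */
static uint64_t replq_earliest(const uint64_t *replq, size_t n)
{
    uint64_t earliest = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( earliest == 0 || replq[i] < earliest )
            earliest = replq[i];

    return earliest;
}
```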
> 4. Missing scheduling notification on vCPU insertion
> ----------------------------------------------------
> rt_unit_insert() inserts runnable units into the replenishment and run
> queues but does not call runq_tickle(). In contrast, rt_unit_wake() and
> rt_context_saved() both call runq_tickle() after runq_insert(). This
> means a newly inserted vCPU with a higher priority (earlier deadline)
> than currently running vCPUs will not be considered for execution until
> the next natural scheduling event (timer, sleep, budget expiry), which
> can delay scheduling by up to one full period.
> Suggested fix: add a runq_tickle() call after the runq_insert() in
> rt_unit_insert(), following the same pattern used in rt_unit_wake().
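[Editorial note: the effect of the missing runq_tickle() can be modelled by the EDF preemption test it performs: preempt the pCPU running the latest deadline only if the newly inserted unit's deadline is earlier. A simplified sketch; the real runq_tickle() additionally prefers idle pCPUs and respects affinity, which this omits.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the current deadline of the unit running on each pCPU, return the
 * index of the pCPU to tickle for a newly inserted unit, or -1 if the new
 * unit does not beat any running deadline.
 */
static int pick_cpu_to_tickle(const uint64_t *cur_deadline, size_t ncpus,
                              uint64_t new_deadline)
{
    size_t latest = 0;

    for ( size_t i = 1; i < ncpus; i++ )
        if ( cur_deadline[i] > cur_deadline[latest] )
            latest = i;

    return new_deadline < cur_deadline[latest] ? (int)latest : -1;
}
```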
> 5. Stale scheduling flags on vCPU removal during context switch
> ---------------------------------------------------------------
> rt_unit_remove() removes queue membership via q_remove()/replq_remove()
> but does not clear the RTDS_delayed_runq_add or RTDS_scheduled flags.
> If a vCPU is removed while it is being context-switched off a pCPU (i.e.,
> RTDS_scheduled is set and RTDS_delayed_runq_add may be set),
> rt_context_saved() will later clear RTDS_scheduled and, finding
> RTDS_delayed_runq_add set, will re-insert the removed vCPU into the run
> queue via runq_insert() + runq_tickle(). This results in a stale vCPU
> reference on the scheduler's run queue, belonging to a domain that may be
> in the process of destruction or migration.
> Suggested fix: in rt_unit_remove(), explicitly clear RTDS_delayed_runq_add
> and RTDS_scheduled flags after removing queue membership, so that
> rt_context_saved() cannot re-insert a removed vCPU.
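[Editorial note: a sketch of the suggested flag clearing. The bit positions below are chosen arbitrarily for illustration; rt.c derives the real ones from __RTDS_scheduled and __RTDS_delayed_runq_add.]

```c
#include <assert.h>

/* Flag bits modelled after rt.c's unit flags (positions assumed). */
#define RTDS_scheduled        (1u << 1)
#define RTDS_delayed_runq_add (1u << 2)

/*
 * Hardening for rt_unit_remove(): drop both flags after queue removal so
 * a later rt_context_saved() cannot find RTDS_delayed_runq_add set and
 * re-insert the removed unit into the run queue.
 */
static unsigned int clear_sched_flags(unsigned int flags)
{
    return flags & ~(RTDS_scheduled | RTDS_delayed_runq_add);
}
```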
> We would appreciate any feedback on these findings. We are happy to
> prepare patches for any of the issues the community considers worth
> fixing.
> Best regards,
> Oleksii Moisieiev
> 
> 
