[RFC] RTDS scheduler: potential issues found during safety analysis


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Oleksii Moisieiev <oleksii_moisieiev@xxxxxxxx>
  • Date: Thu, 19 Mar 2026 19:49:22 +0200
  • Cc: dfaggioli@xxxxxxxx, mengxu@xxxxxxxxxxxxx, gwd@xxxxxxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, julien@xxxxxxx, jbeulich@xxxxxxxx, tiche@xxxxxxxxxxxxx, tiche@xxxxxxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, dario.faggioli@xxxxxxxxxx, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Delivery-date: Thu, 19 Mar 2026 17:49:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi all,

We have been performing a safety analysis of the RTDS scheduler code
(xen/common/sched/rt.c) and have identified several potential issues that
we would like to bring to the community's attention. We would appreciate
your feedback on whether these issues are considered worth addressing
and, if so, what the preferred approach would be.

Below is a summary of the findings. All references are to the current
upstream code.

1. Inconsistent validation in domain-wide vs per-vCPU parameter update
----------------------------------------------------------------------
In rt_dom_cntl(), the XEN_DOMCTL_SCHEDOP_putinfo path (domain-wide
parameter update) only validates:

    if ( op->u.rtds.period == 0 || op->u.rtds.budget == 0 )

In contrast, the XEN_DOMCTL_SCHEDOP_putvcpuinfo path (per-vCPU update)
enforces stricter checks:

    if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
         budget > period || period < RTDS_MIN_PERIOD )

This means the domain-wide path accepts configurations where budget
exceeds period, or where period/budget fall below the 10 us minimum that
the per-vCPU path enforces. Such parameters can lead to scheduling
overhead issues (very short periods) or over-allocation (budget > period).

Suggested fix: apply identical validation constraints on both paths, i.e.
add the same bounds checks (budget <= period, period >= RTDS_MIN_PERIOD,
budget >= RTDS_MIN_BUDGET, period <= RTDS_MAX_PERIOD) to the putinfo path.
Additionally, the putinfo path does not handle the extratime flag at all,
unlike the putvcpuinfo path.
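As a sketch of the unified check both paths could share (the microsecond
constants below are illustrative stand-ins for the MICROSECS()-based
definitions in rt.c, and rtds_params_ok() is an invented helper name, not
an existing Xen function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bounds in microseconds; rt.c defines the real ones via
 * MICROSECS().  RTDS_MAX_PERIOD here is a placeholder value. */
#define RTDS_MIN_PERIOD  10
#define RTDS_MIN_BUDGET  10
#define RTDS_MAX_PERIOD  10000000

/* Single validity check that putinfo and putvcpuinfo could both use,
 * so the domain-wide path rejects the same configurations as the
 * per-vCPU path (budget > period, sub-minimum period/budget, etc.). */
static bool rtds_params_ok(uint64_t period, uint64_t budget)
{
    return period >= RTDS_MIN_PERIOD && period <= RTDS_MAX_PERIOD &&
           budget >= RTDS_MIN_BUDGET && budget <= period;
}
```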

2. Priority level overflow for extratime vCPUs
----------------------------------------------
In burn_budget(), when an extratime vCPU exhausts its budget:

    svc->priority_level++;
    svc->cur_budget = svc->budget;

The priority_level field is declared as `unsigned` (32-bit) and there is no
upper bound check before the increment. While rt_update_deadline() resets
priority_level to 0 at each period rollover, for a long-running extratime
vCPU that continuously exhausts its budget within a single period, the
counter could theoretically wrap from UINT_MAX to 0. Since priority_level 0
represents the highest scheduling priority, a wraparound would cause the
extratime vCPU to suddenly preempt vCPUs with active real-time reservations.
While this scenario requires an extreme number of budget exhaustion cycles
within a single period, it is a concern for long-running embedded or safety
systems that operate without reboot for extended durations.

Suggested fix: saturate priority_level at a safe maximum value (e.g.,
UINT_MAX - 1) instead of allowing unbounded increment.
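A minimal sketch of the saturating increment, with priority_level modeled
as a plain unsigned and prio_level_bump() an invented helper name:

```c
#include <assert.h>
#include <limits.h>

/* Saturate instead of wrapping: once the level reaches UINT_MAX - 1 it
 * stays there, so an extratime vCPU can never wrap back to level 0 and
 * preempt vCPUs with active real-time reservations. */
static unsigned int prio_level_bump(unsigned int level)
{
    return (level < UINT_MAX - 1) ? level + 1 : UINT_MAX - 1;
}
```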

3. Replenishment timer loss during CPU pool reconfiguration
-----------------------------------------------------------
When the last pCPU is removed from an RTDS CPU pool, move_repl_timer()
kills the replenishment timer via kill_timer(). When a pCPU is later
re-added, rt_switch_sched() re-initializes the timer object (if status
is TIMER_STATUS_killed) but does not re-arm it from the existing
replenishment queue. If the replq already contains pending entries, those
replenishments will not fire until some other event explicitly calls
set_timer(), potentially stalling all non-extratime vCPUs.

We believe this is actually a broader issue that goes beyond the RTDS
scheduler: the common cpupool infrastructure probably should not allow
a cpupool that has assigned vCPUs to lose all of its pCPUs. Preventing
such a state at the cpupool management level would address the root cause
for all schedulers, not just RTDS.

Suggested fix (RTDS-specific): when timer ownership is re-established
in rt_switch_sched(), re-arm the replenishment timer to the earliest
deadline in the replq if the queue is non-empty.

Suggested fix (common): the cpupool code should refuse to remove the
last pCPU from a cpupool that still has domains/vCPUs assigned to it,
returning an error instead. This would prevent the problematic state
from arising in the first place.
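A toy model of the RTDS-specific fix; struct toy_timer and
resume_repl_timer() are invented for illustration, standing in for Xen's
struct timer and a set_timer() call in rt_switch_sched():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for Xen's timer: the real one carries a handler,
 * a CPU binding and a status field as well. */
struct toy_timer {
    bool armed;
    uint64_t expires;
};

/* After the killed replenishment timer has been re-initialised, arm it
 * to the earliest pending replenishment if the replq is non-empty (in
 * rt.c the replq is kept sorted, so its head is the earliest entry).
 * Without this step, pending replenishments sit in the queue with no
 * timer to deliver them. */
static void resume_repl_timer(struct toy_timer *t,
                              bool replq_empty, uint64_t earliest_deadline)
{
    if ( !replq_empty )
    {
        t->expires = earliest_deadline;
        t->armed = true;            /* models set_timer(t, earliest) */
    }
}
```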

4. Missing scheduling notification on vCPU insertion
----------------------------------------------------
rt_unit_insert() inserts runnable units into the replenishment and run
queues but does not call runq_tickle(). In contrast, rt_unit_wake() and
rt_context_saved() both call runq_tickle() after runq_insert(). This
means a newly inserted vCPU with a higher priority (earlier deadline)
than currently running vCPUs will not be considered for execution until
the next natural scheduling event (timer, sleep, budget expiry), which
can delay scheduling by up to one full period.

Suggested fix: add a runq_tickle() call after the runq_insert() in
rt_unit_insert(), following the same pattern used in rt_unit_wake().
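The decision the missing tickle would make can be modeled as follows.
should_tickle() is an invented name; an idle pCPU is represented as an
"infinitely late" deadline, and the hard-affinity handling of the real
runq_tickle() is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IDLE UINT64_MAX   /* model an idle pCPU as "infinitely late" */

/* Under EDF, a newly inserted runnable vCPU should trigger a reschedule
 * if some pCPU is idle, or if its deadline beats the latest-deadline
 * vCPU currently running.  Without a tickle in rt_unit_insert(), this
 * comparison is never made until the next natural scheduling event. */
static bool should_tickle(uint64_t new_deadline,
                          const uint64_t *running_deadlines, size_t ncpus)
{
    uint64_t latest = 0;

    for ( size_t i = 0; i < ncpus; i++ )
        if ( running_deadlines[i] > latest )
            latest = running_deadlines[i];

    return new_deadline < latest;   /* always true when some pCPU is IDLE */
}
```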

5. Stale scheduling flags on vCPU removal during context switch
---------------------------------------------------------------
rt_unit_remove() removes queue membership via q_remove()/replq_remove()
but does not clear the RTDS_delayed_runq_add or RTDS_scheduled flags.
If a vCPU is removed while it is being context-switched off a pCPU (i.e.,
RTDS_scheduled is set and RTDS_delayed_runq_add may be set),
rt_context_saved() will later clear RTDS_scheduled and, finding
RTDS_delayed_runq_add set, will re-insert the removed vCPU into the run
queue via runq_insert() + runq_tickle(). This results in a stale vCPU
reference on the scheduler's run queue, belonging to a domain that may be
in the process of destruction or migration.

Suggested fix: in rt_unit_remove(), explicitly clear RTDS_delayed_runq_add
and RTDS_scheduled flags after removing queue membership, so that
rt_context_saved() cannot re-insert a removed vCPU.
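A toy model of the suggested flag clearing; the bit positions below are
illustrative, not the real __RTDS_* values from rt.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; rt.c derives these from __RTDS_scheduled and
 * __RTDS_delayed_runq_add. */
#define RTDS_scheduled          (1u << 0)
#define RTDS_delayed_runq_add   (1u << 1)

/* What rt_unit_remove() should do after dropping queue membership:
 * clear both flags so no later path can act on stale state. */
static unsigned int unit_remove_flags(unsigned int flags)
{
    return flags & ~(RTDS_scheduled | RTDS_delayed_runq_add);
}

/* Models the rt_context_saved() decision that re-inserts the vCPU into
 * the run queue when RTDS_delayed_runq_add survives removal. */
static bool context_saved_reinserts(unsigned int flags)
{
    return (flags & RTDS_delayed_runq_add) != 0;
}
```

With the flags cleared at removal time, the re-insert condition can no
longer fire for a removed vCPU.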

We would appreciate any feedback on these findings. We are happy to
prepare patches for any of the issues the community considers worth
fixing.
Best regards,
Oleksii Moisieiev