[RFC] RTDS scheduler: potential issues found during safety analysis


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Oleksii Moisieiev <oleksii_moisieiev@xxxxxxxx>
  • Date: Thu, 19 Mar 2026 19:49:22 +0200
  • Cc: dfaggioli@xxxxxxxx, mengxu@xxxxxxxxxxxxx, gwd@xxxxxxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, julien@xxxxxxx, jbeulich@xxxxxxxx, tiche@xxxxxxxxxxxxx, tiche@xxxxxxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, dario.faggioli@xxxxxxxxxx, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Delivery-date: Thu, 19 Mar 2026 17:49:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi all,

We have been performing a safety analysis of the RTDS scheduler code
(xen/common/sched/rt.c) and have identified several potential issues that
we would like to bring to the community's attention. We would appreciate
your feedback on whether these issues are considered worth addressing
and, if so, what the preferred approach would be.

Below is a summary of the findings. All references are to the current
upstream code.

1. Inconsistent validation in domain-wide vs per-vCPU parameter update
----------------------------------------------------------------------
In rt_dom_cntl(), the XEN_DOMCTL_SCHEDOP_putinfo path (domain-wide
parameter update) only validates:

    if ( op->u.rtds.period == 0 || op->u.rtds.budget == 0 )

In contrast, the XEN_DOMCTL_SCHEDOP_putvcpuinfo path (per-vCPU update)
enforces stricter checks:

    if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
         budget > period || period < RTDS_MIN_PERIOD )

This means the domain-wide path accepts configurations where budget
exceeds period, or where period/budget fall below the 10 us minimum that
the per-vCPU path enforces. Such parameters can lead to scheduling
overhead issues (very short periods) or over-allocation (budget > period).

Suggested fix: apply identical validation constraints on both paths, i.e.
add the same bounds checks (budget <= period, period >= RTDS_MIN_PERIOD,
budget >= RTDS_MIN_BUDGET, period <= RTDS_MAX_PERIOD) to the putinfo path.
Additionally, the putinfo path does not handle the extratime flag at all,
unlike the putvcpuinfo path.
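As a sketch of the unified check both paths could share (the microsecond
constants below are illustrative stand-ins for the MICROSECS()-based
definitions in rt.c, and rtds_params_ok() is an invented helper name, not
an existing Xen function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bounds in microseconds; rt.c defines the real ones via
 * MICROSECS().  RTDS_MAX_PERIOD here is a placeholder value. */
#define RTDS_MIN_PERIOD  10
#define RTDS_MIN_BUDGET  10
#define RTDS_MAX_PERIOD  10000000

/* Single validity check that putinfo and putvcpuinfo could both use,
 * so the domain-wide path rejects the same configurations as the
 * per-vCPU path (budget > period, sub-minimum period/budget, etc.). */
static bool rtds_params_ok(uint64_t period, uint64_t budget)
{
    return period >= RTDS_MIN_PERIOD && period <= RTDS_MAX_PERIOD &&
           budget >= RTDS_MIN_BUDGET && budget <= period;
}
```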

2. Priority level overflow for extratime vCPUs
----------------------------------------------
In burn_budget(), when an extratime vCPU exhausts its budget:

    svc->priority_level++;
    svc->cur_budget = svc->budget;

The priority_level field is declared as `unsigned` (32-bit) and there is no
upper bound check before the increment. While rt_update_deadline() resets
priority_level to 0 at each period rollover, for a long-running extratime
vCPU that continuously exhausts its budget within a single period, the
counter could theoretically wrap from UINT_MAX to 0. Since priority_level 0
represents the highest scheduling priority, a wraparound would cause the
extratime vCPU to suddenly preempt vCPUs with active real-time reservations.
While this scenario requires an extreme number of budget exhaustion cycles
within a single period, it is a concern for long-running embedded or safety
systems that operate without reboot for extended durations.

Suggested fix: saturate priority_level at a safe maximum value (e.g.,
UINT_MAX - 1) instead of allowing unbounded increment.
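A minimal sketch of the saturating increment, with priority_level modeled
as a plain unsigned and prio_level_bump() an invented helper name:

```c
#include <assert.h>
#include <limits.h>

/* Saturate instead of wrapping: once the level reaches UINT_MAX - 1 it
 * stays there, so an extratime vCPU can never wrap back to level 0 and
 * preempt vCPUs with active real-time reservations. */
static unsigned int prio_level_bump(unsigned int level)
{
    return (level < UINT_MAX - 1) ? level + 1 : UINT_MAX - 1;
}
```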

3. Replenishment timer loss during CPU pool reconfiguration
-----------------------------------------------------------
When the last pCPU is removed from an RTDS CPU pool, move_repl_timer()
kills the replenishment timer via kill_timer(). When a pCPU is later
re-added, rt_switch_sched() re-initializes the timer object (if status
is TIMER_STATUS_killed) but does not re-arm it from the existing
replenishment queue. If the replq already contains pending entries, those
replenishments will not fire until some other event explicitly calls
set_timer(), potentially stalling all non-extratime vCPUs.

We believe this is actually a broader issue that goes beyond the RTDS
scheduler: the common cpupool infrastructure probably should not allow
a cpupool that has assigned vCPUs to lose all of its pCPUs. Preventing
such a state at the cpupool management level would address the root cause
for all schedulers, not just RTDS.

Suggested fix (RTDS-specific): when timer ownership is re-established
in rt_switch_sched(), re-arm the replenishment timer to the earliest
deadline in the replq if the queue is non-empty.

Suggested fix (common): the cpupool code should refuse to remove the
last pCPU from a cpupool that still has domains/vCPUs assigned to it,
returning an error instead. This would prevent the problematic state
from arising in the first place.
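A toy model of the RTDS-specific fix; struct toy_timer and
resume_repl_timer() are invented for illustration, standing in for Xen's
struct timer and a set_timer() call in rt_switch_sched():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for Xen's timer: the real one carries a handler,
 * a CPU binding and a status field as well. */
struct toy_timer {
    bool armed;
    uint64_t expires;
};

/* After the killed replenishment timer has been re-initialised, arm it
 * to the earliest pending replenishment if the replq is non-empty (in
 * rt.c the replq is kept sorted, so its head is the earliest entry).
 * Without this step, pending replenishments sit in the queue with no
 * timer to deliver them. */
static void resume_repl_timer(struct toy_timer *t,
                              bool replq_empty, uint64_t earliest_deadline)
{
    if ( !replq_empty )
    {
        t->expires = earliest_deadline;
        t->armed = true;            /* models set_timer(t, earliest) */
    }
}
```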

4. Missing scheduling notification on vCPU insertion
----------------------------------------------------
rt_unit_insert() inserts runnable units into the replenishment and run
queues but does not call runq_tickle(). In contrast, rt_unit_wake() and
rt_context_saved() both call runq_tickle() after runq_insert(). This
means a newly inserted vCPU with a higher priority (earlier deadline)
than currently running vCPUs will not be considered for execution until
the next natural scheduling event (timer, sleep, budget expiry), which
can delay scheduling by up to one full period.

Suggested fix: add a runq_tickle() call after the runq_insert() in
rt_unit_insert(), following the same pattern used in rt_unit_wake().
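The decision the missing tickle would make can be modeled as follows.
should_tickle() is an invented name; an idle pCPU is represented as an
"infinitely late" deadline, and the hard-affinity handling of the real
runq_tickle() is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IDLE UINT64_MAX   /* model an idle pCPU as "infinitely late" */

/* Under EDF, a newly inserted runnable vCPU should trigger a reschedule
 * if some pCPU is idle, or if its deadline beats the latest-deadline
 * vCPU currently running.  Without a tickle in rt_unit_insert(), this
 * comparison is never made until the next natural scheduling event. */
static bool should_tickle(uint64_t new_deadline,
                          const uint64_t *running_deadlines, size_t ncpus)
{
    uint64_t latest = 0;

    for ( size_t i = 0; i < ncpus; i++ )
        if ( running_deadlines[i] > latest )
            latest = running_deadlines[i];

    return new_deadline < latest;   /* always true when some pCPU is IDLE */
}
```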

5. Stale scheduling flags on vCPU removal during context switch
---------------------------------------------------------------
rt_unit_remove() removes queue membership via q_remove()/replq_remove()
but does not clear the RTDS_delayed_runq_add or RTDS_scheduled flags.
If a vCPU is removed while it is being context-switched off a pCPU (i.e.,
RTDS_scheduled is set and RTDS_delayed_runq_add may be set),
rt_context_saved() will later clear RTDS_scheduled and, finding
RTDS_delayed_runq_add set, will re-insert the removed vCPU into the run
queue via runq_insert() + runq_tickle(). This results in a stale vCPU
reference on the scheduler's run queue, belonging to a domain that may be
in the process of destruction or migration.

Suggested fix: in rt_unit_remove(), explicitly clear RTDS_delayed_runq_add
and RTDS_scheduled flags after removing queue membership, so that
rt_context_saved() cannot re-insert a removed vCPU.
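A toy model of the suggested flag clearing; the bit positions below are
illustrative, not the real __RTDS_* values from rt.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; rt.c derives these from __RTDS_scheduled and
 * __RTDS_delayed_runq_add. */
#define RTDS_scheduled          (1u << 0)
#define RTDS_delayed_runq_add   (1u << 1)

/* What rt_unit_remove() should do after dropping queue membership:
 * clear both flags so no later path can act on stale state. */
static unsigned int unit_remove_flags(unsigned int flags)
{
    return flags & ~(RTDS_scheduled | RTDS_delayed_runq_add);
}

/* Models the rt_context_saved() decision that re-inserts the vCPU into
 * the run queue when RTDS_delayed_runq_add survives removal. */
static bool context_saved_reinserts(unsigned int flags)
{
    return (flags & RTDS_delayed_runq_add) != 0;
}
```

With the flags cleared at removal time, the re-insert condition can no
longer fire for a removed vCPU.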

We would appreciate any feedback on these findings. We are happy to
prepare patches for any of the issues the community considers worth
fixing.
Best regards,
Oleksii Moisieiev