[RFC] RTDS scheduler: potential issues found during safety analysis
Hi all,
We have been performing a safety analysis of the RTDS scheduler code (xen/common/sched/rt.c) and have identified several potential issues that we would like to bring to the community's attention. We would appreciate your feedback on whether these issues are considered worth addressing and, if so, what the preferred approach would be.
Below is a summary of the findings. All references are to the current upstream code.
1. Inconsistent validation in domain-wide vs per-vCPU parameter update
----------------------------------------------------------------------
In rt_dom_cntl(), the XEN_DOMCTL_SCHEDOP_putinfo path (domain-wide parameter update) only validates:
if ( op->u.rtds.period == 0 || op->u.rtds.budget == 0 )
In contrast, the XEN_DOMCTL_SCHEDOP_putvcpuinfo path (per-vCPU update) enforces stricter checks:
if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET || budget > period || period < RTDS_MIN_PERIOD )
This means the domain-wide path accepts configurations where budget exceeds period, or where period/budget fall below the 10 us minimum that the per-vCPU path enforces. Such parameters can lead to scheduling overhead issues (very short periods) or over-allocation (budget > period).
Suggested fix: apply identical validation constraints on both paths, i.e. add the same bounds checks (budget <= period, period >= RTDS_MIN_PERIOD, budget >= RTDS_MIN_BUDGET, period <= RTDS_MAX_PERIOD) to the putinfo path. Additionally, the putinfo path does not handle the extratime flag at all, unlike the putvcpuinfo path.
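As a sketch of the suggested fix, both paths could share one validation helper. The bound values below are illustrative assumptions in microseconds (the document only establishes the 10 us minimum; Xen defines the actual RTDS_MIN_PERIOD/RTDS_MIN_BUDGET/RTDS_MAX_PERIOD constants internally):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bounds; the real constants live in xen/common/sched/rt.c
 * and are expressed in Xen's internal time units. */
#define RTDS_MIN_PERIOD  10          /* 10 us minimum, per the analysis */
#define RTDS_MIN_BUDGET  10
#define RTDS_MAX_PERIOD  10000000    /* assumed upper bound for the sketch */

/* One validation helper shared by the putinfo and putvcpuinfo paths,
 * so a domain-wide update cannot bypass the per-vCPU checks. */
static bool rtds_params_valid(uint64_t period, uint64_t budget)
{
    return period >= RTDS_MIN_PERIOD &&
           period <= RTDS_MAX_PERIOD &&
           budget >= RTDS_MIN_BUDGET &&
           budget <= period;
}
```

With a shared helper, a future change to the constraints automatically applies to both DOMCTL paths.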
2. Priority level overflow for extratime vCPUs
----------------------------------------------
In burn_budget(), when an extratime vCPU exhausts its budget:
    svc->priority_level++;
    svc->cur_budget = svc->budget;
The priority_level field is declared as `unsigned` (32-bit) and there is no upper bound check before the increment. While rt_update_deadline() resets priority_level to 0 at each period rollover, for a long-running extratime vCPU that continuously exhausts its budget within a single period, the counter could theoretically wrap from UINT_MAX to 0. Since priority_level 0 represents the highest scheduling priority, a wraparound would cause the extratime vCPU to suddenly preempt vCPUs with active real-time reservations.
While this scenario requires an extreme number of budget-exhaustion cycles within a single period, it is a concern for long-running embedded or safety-critical systems that operate without reboot for extended durations.
Suggested fix: saturate priority_level at a safe maximum value (e.g., UINT_MAX - 1) instead of allowing unbounded increment.
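The saturation can be a one-line guard on the increment; a minimal sketch (the helper name is ours, the real field lives in struct rt_unit):

```c
#include <limits.h>

/* Saturating increment for priority_level: once the counter reaches
 * UINT_MAX it stays there instead of wrapping back to 0, which is the
 * highest scheduling priority.  Sketch of the fix proposed above. */
static unsigned int priority_level_bump(unsigned int level)
{
    return level == UINT_MAX ? level : level + 1;
}
```

Saturating at the maximum keeps the extratime vCPU at the lowest priority level indefinitely, which is the semantically correct outcome for a unit that has exhausted its budget that many times.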
3. Replenishment timer loss during CPU pool reconfiguration
-----------------------------------------------------------
When the last pCPU is removed from an RTDS CPU pool, move_repl_timer() kills the replenishment timer via kill_timer(). When a pCPU is later re-added, rt_switch_sched() re-initializes the timer object (if status is TIMER_STATUS_killed) but does not re-arm it from the existing replenishment queue. If the replq already contains pending entries, those replenishments will not fire until some other event explicitly calls set_timer(), potentially stalling all non-extratime vCPUs.
We believe this is actually a broader issue that goes beyond the RTDS scheduler: the common cpupool infrastructure probably should not allow a cpupool that has assigned vCPUs to lose all of its pCPUs. Preventing such a state at the cpupool management level would address the root cause for all schedulers, not just RTDS.
Suggested fix (RTDS-specific): when timer ownership is re-established in rt_switch_sched(), re-arm the replenishment timer to the earliest deadline in the replq if the queue is non-empty.
Suggested fix (common): the cpupool code should refuse to remove the last pCPU from a cpupool that still has domains/vCPUs assigned to it, returning an error instead. This would prevent the problematic state from arising in the first place.
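For the RTDS-specific fix, the re-arm step reduces to reading the head of the deadline-sorted replenishment queue. The following is a minimal model, not Xen's real data structures (the replq in rt.c is a sorted list of struct rt_unit; here a sorted array stands in):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical minimal model of the replenishment queue. */
struct replq_model {
    const uint64_t *times;  /* replenishment times, sorted ascending */
    size_t nr;
};

/* Sketch of the step rt_switch_sched() currently lacks: after
 * re-initializing a killed timer, compute the expiry to arm it at.
 * Returns true and writes the expiry, or false if the queue is empty
 * and the timer should stay idle. */
static bool replq_rearm_expiry(const struct replq_model *q, uint64_t *when)
{
    if (q->nr == 0)
        return false;
    *when = q->times[0]; /* head of a sorted replq = earliest event */
    return true;
}
```

In the real code this value would then be passed to set_timer() on the newly owning pCPU.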
4. Missing scheduling notification on vCPU insertion
----------------------------------------------------
rt_unit_insert() inserts runnable units into the replenishment and run queues but does not call runq_tickle(). In contrast, rt_unit_wake() and rt_context_saved() both call runq_tickle() after runq_insert(). This means a newly inserted vCPU with a higher priority (earlier deadline) than currently running vCPUs will not be considered for execution until the next natural scheduling event (timer, sleep, budget expiry), which can delay scheduling by up to one full period.
Suggested fix: add a runq_tickle() call after the runq_insert() in rt_unit_insert(), following the same pattern used in rt_unit_wake().
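The effect of the fix can be illustrated with a toy model of the insert path (all names and the array-based run queue are assumptions for the sketch, not Xen's real structures):

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RUNQ_MAX 8

/* Toy model of one pCPU's scheduling state. */
struct model_sched {
    uint64_t runq[MODEL_RUNQ_MAX];
    unsigned int nr;
    uint64_t running_deadline; /* deadline of the unit on the pCPU */
    bool tickled;              /* reschedule interrupt requested */
};

/* Sketch of the proposed rt_unit_insert() behavior: after queueing a
 * runnable unit, tickle when its deadline beats the running unit's,
 * mirroring what rt_unit_wake() already does via runq_tickle(). */
static void model_unit_insert(struct model_sched *s, uint64_t deadline)
{
    if (s->nr < MODEL_RUNQ_MAX)
        s->runq[s->nr++] = deadline;
    /* The missing step: consider the new unit for execution now,
     * instead of waiting for the next natural scheduling event. */
    if (deadline < s->running_deadline)
        s->tickled = true;
}
```

Without the tickle, the second assertion below would fail: the earlier-deadline unit would sit on the queue until the next timer or budget event.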
5. Stale scheduling flags on vCPU removal during context switch
---------------------------------------------------------------
rt_unit_remove() removes queue membership via q_remove()/replq_remove() but does not clear the RTDS_delayed_runq_add or RTDS_scheduled flags. If a vCPU is removed while it is being context-switched off a pCPU (i.e., RTDS_scheduled is set and RTDS_delayed_runq_add may be set), rt_context_saved() will later clear RTDS_scheduled and, finding RTDS_delayed_runq_add set, will re-insert the removed vCPU into the run queue via runq_insert() + runq_tickle(). This results in a stale vCPU reference on the scheduler's run queue, belonging to a domain that may be in the process of destruction or migration.
Suggested fix: in rt_unit_remove(), explicitly clear RTDS_delayed_runq_add and RTDS_scheduled flags after removing queue membership, so that rt_context_saved() cannot re-insert a removed vCPU.
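The flag clearing itself is a single mask operation; a sketch with illustrative bit positions (the real __RTDS_* bit definitions in rt.c may differ, and the real code would use clear_bit() on the unit's flags word):

```c
/* Illustrative flag bits modelled on the RTDS_* flags in
 * xen/common/sched/rt.c; the exact bit positions are assumptions. */
#define MODEL_RTDS_scheduled        (1u << 1)
#define MODEL_RTDS_delayed_runq_add (1u << 2)

/* Sketch of the proposed rt_unit_remove() fix: after dropping queue
 * membership, clear both flags so that rt_context_saved() cannot see
 * RTDS_delayed_runq_add and re-insert the already-removed unit. */
static unsigned int model_unit_remove_flags(unsigned int flags)
{
    return flags & ~(MODEL_RTDS_scheduled | MODEL_RTDS_delayed_runq_add);
}
```

Any unrelated flag bits are left untouched by the mask.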
We would appreciate any feedback on these findings. We are happy to prepare patches for any of the issues the community considers worth fixing.
Best regards,
Oleksii Moisieiev