[BUG] Potential Integer Underflow in Time Calibration Logic and Live Snapshot Revert causing DWM crashes in Windows Guests
- To: xen-devel@xxxxxxxxxxxxx
- From: Антон Марков <akmarkov45@xxxxxxxxx>
- Date: Sun, 4 Jan 2026 20:29:21 +0300
- Delivery-date: Mon, 05 Jan 2026 08:07:18 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
Component: Xen Hypervisor (x86 / time.c)
Versions affected: Potentially 4.17-4.21 and unstable (tested on
4.18 with high vCPU density)
Description:
In high-load scenarios (24+ cores, heavy Dom0 load, and frequent
VM pauses via DRAKVUF/VMI), Windows guests experience Desktop
Window Manager (DWM.exe) crashes with error 0x8898009b.
The root cause is an integer underflow in the time scaling
logic, triggered when time calibration occurs simultaneously
with a snapshot reversion or RDTSC(P) instruction emulation.
Technical Analysis:
The get_s_time_fixed function (xen/arch/x86/time.c) accepts
at_tsc as an argument. If at_tsc is less than local_tsc, a
negative delta is produced, which scale_delta handles
incorrectly. (If at_tsc is zero, a race is also possible: after
the ticks are read via rdtsc_ordered, time calibration may run,
leaving local_tsc larger than the tick value just read.)
The result is an extremely large number instead of a backward
offset. This is reliably reproducible in hvm_load_cpu_ctxt
(xen/arch/x86/hvm/hvm.c), because sync_tsc will be less than
local_tsc after time calibration. It can also potentially occur
when RDTSC(P) emulation runs concurrently with
time_calibration_rendezvous_tail (xen/arch/x86/time.c).
Windows DWM, sensitive to QueryPerformanceCounter jumps, fails
catastrophically when it receives an essentially infinite
timestamp delta.
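To illustrate the underflow described above, here is a minimal
standalone model. It is not the Xen source: the struct, the
function names (model_scale_delta, model_elapsed_ns) and the
fixed-point layout are assumptions made for this sketch. It only
shows how an unsigned tick delta behaves when at_tsc is slightly
behind local_tsc.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TSC-to-nanosecond scale (not Xen's struct). */
struct tsc_scale_model {
    int shift;          /* pre-shift applied to the tick delta */
    uint32_t mul_frac;  /* 0.32 fixed point: ns = ticks * mul_frac / 2^32 */
};

/* Scale an unsigned tick delta to nanoseconds using 64-bit arithmetic only. */
uint64_t model_scale_delta(uint64_t delta, const struct tsc_scale_model *s)
{
    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;

    /* Split the delta so that neither partial product exceeds 64 bits. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * s->mul_frac + ((lo * s->mul_frac) >> 32);
}

/* Sign-unaware path: the subtraction wraps when at_tsc < local_tsc. */
uint64_t model_elapsed_ns(uint64_t at_tsc, uint64_t local_tsc,
                          const struct tsc_scale_model *s)
{
    uint64_t delta = at_tsc - local_tsc;  /* wraps modulo 2^64 if negative */
    return model_scale_delta(delta, s);
}
```

With an assumed 2 GHz TSC (shift = 0, mul_frac = 0x80000000, i.e.
0.5 ns per tick), at_tsc = 1000 and local_tsc = 1100 should give
-50 ns, but the model returns 2^63 - 50 ns.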
Steps to Reproduce:
1. Set up a host with a high core count (e.g., 24+ cores).
2. Run a high density of Windows 10 DomUs (20 domains with 4
   vcpus each).
3. Apply heavy load on Dom0 (e.g., DRAKVUF monitoring).
4. Frequently pause/resume or revert snapshots of the DomUs.
5. Observe dwm.exe crashes in the guests with
   MILERR_QPC_TIME_WENT_BACKWARD (0x8898009b).
Currently, the lack of sign-awareness in the delta scaling path
allows a nanosecond-scale race condition to turn into a
multi-millennium time jump.
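One possible shape of a sign-aware scaling path is sketched below.
This is an illustration only, not a proposed patch: the struct
and the names sketch_scale_ticks and sketch_signed_elapsed_ns are
hypothetical, and a real fix would have to fit the callers in
time.c. The idea is to scale the magnitude of the delta and
reapply the sign afterwards.

```c
#include <stdint.h>

/* Hypothetical TSC-to-nanosecond scale used only by this sketch. */
struct ns_scale_sketch {
    int shift;          /* pre-shift applied to the tick delta */
    uint32_t mul_frac;  /* 0.32 fixed point: ns = ticks * mul_frac / 2^32 */
};

/* Scale a non-negative tick count to nanoseconds. */
uint64_t sketch_scale_ticks(uint64_t ticks, const struct ns_scale_sketch *s)
{
    if (s->shift < 0)
        ticks >>= -s->shift;
    else
        ticks <<= s->shift;

    uint64_t hi = ticks >> 32;
    uint64_t lo = ticks & 0xffffffffu;
    return hi * s->mul_frac + ((lo * s->mul_frac) >> 32);
}

/* Sign-aware path: compare first, scale the magnitude, then apply the sign. */
int64_t sketch_signed_elapsed_ns(uint64_t at_tsc, uint64_t local_tsc,
                                 const struct ns_scale_sketch *s)
{
    if (at_tsc >= local_tsc)
        return (int64_t)sketch_scale_ticks(at_tsc - local_tsc, s);

    /* at_tsc is behind local_tsc: report a small backward offset. */
    return -(int64_t)sketch_scale_ticks(local_tsc - at_tsc, s);
}
```

With the same assumed 2 GHz scale (shift = 0, mul_frac =
0x80000000), a stamp 100 ticks behind local_tsc yields -50 ns
here rather than a huge positive value.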
Environment:
CPU: 24 cores (Intel Xeon with Invariant TSC)
Dom0: High vCPU count (24)
Feature: tsc_mode="always_emulate", timer_mode="no_delay_for_missed_ticks"
Guest: Windows 10/11 with tsc as time source