
[BUG] Potential Integer Underflow in Time Calibration Logic and Live Snapshot Revert causing DWM crashes in Windows Guests


  • To: xen-devel@xxxxxxxxxxxxx
  • From: Антон Марков <akmarkov45@xxxxxxxxx>
  • Date: Sun, 4 Jan 2026 20:29:21 +0300
  • Delivery-date: Mon, 05 Jan 2026 08:07:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Component: Xen Hypervisor (x86 / time.c)
Versions affected: Potentially 4.17-4.21 and unstable (tested on 4.18 with high vCPU density)
Description:
In high-load scenarios (24+ cores, heavy Dom0 load, and frequent VM pauses via DRAKVUF/VMI), Windows guests experience Desktop Window Manager (DWM.exe) crashes with error 0x8898009b.
The root cause appears to be an integer underflow in the time scaling logic, triggered when time calibration coincides with a snapshot revert or with RDTSC(P) instruction emulation.
Technical Analysis:
The get_s_time_fixed function (xen/arch/x86/time.c) takes at_tsc as an argument. If at_tsc is less than local_tsc, the delta is negative, and scale_delta handles it incorrectly: the result is an extremely large number instead of a backward offset. A related race exists when at_tsc is zero: the function reads the tick count via rdtsc_ordered, time calibration then runs, and local_tsc may end up larger than the tick value that was just read. The first case is guaranteed to reproduce in hvm_load_cpu_ctxt (xen/arch/x86/hvm/hvm.c), because sync_tsc is less than local_tsc once time calibration has run. The problem can also potentially occur when RDTSC(P) emulation runs concurrently with time_calibration_rendezvous_tail (xen/arch/x86/time.c).
Windows DWM, sensitive to QueryPerformanceCounter jumps, fails catastrophically when it receives an essentially infinite timestamp delta.
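
To illustrate the arithmetic, here is a minimal standalone sketch. It is a simplified model, not the actual Xen code: struct time_scale_model, scale_delta_model, stime_unsigned and stime_signed are hypothetical names, and the fixed-point layout (a shift followed by a 32.32 multiply) is an assumption made for the example. It shows how an unsigned subtraction wraps when at_tsc is behind local_tsc, and how a sign-aware variant could scale the magnitude and subtract it instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TSC-to-nanosecond scale:
 * ns = ((ticks << shift) * mul_frac) >> 32 */
struct time_scale_model {
    int shift;
    uint32_t mul_frac;
};

/* Scale an unsigned tick delta to nanoseconds. The 64x32-bit multiply is
 * split into two halves so no 128-bit type is needed. */
static uint64_t scale_delta_model(uint64_t delta,
                                  const struct time_scale_model *s)
{
    uint64_t hi, lo;

    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;

    hi = delta >> 32;
    lo = delta & 0xffffffffu;
    return hi * s->mul_frac + ((lo * s->mul_frac) >> 32);
}

/* Sign-unaware path: if at_tsc < local_tsc the subtraction wraps around
 * and the scaled result is an enormous forward offset. */
static int64_t stime_unsigned(uint64_t at_tsc, uint64_t local_tsc,
                              int64_t local_stime,
                              const struct time_scale_model *s)
{
    uint64_t delta = at_tsc - local_tsc; /* wraps when at_tsc is behind */

    /* Add in unsigned arithmetic to avoid signed-overflow undefined
     * behaviour; the final cast wraps on common compilers. */
    return (int64_t)((uint64_t)local_stime + scale_delta_model(delta, s));
}

/* Sign-aware variant (one possible approach): scale the magnitude and
 * subtract it when the requested TSC is behind the calibration stamp. */
static int64_t stime_signed(uint64_t at_tsc, uint64_t local_tsc,
                            int64_t local_stime,
                            const struct time_scale_model *s)
{
    if (at_tsc >= local_tsc)
        return local_stime + (int64_t)scale_delta_model(at_tsc - local_tsc, s);

    return local_stime - (int64_t)scale_delta_model(local_tsc - at_tsc, s);
}
```

With an assumed scale of 0.5 ns per tick (shift = 0, mul_frac = 0x80000000) and at_tsc 1000 ticks behind local_tsc, stime_signed returns a value 500 ns before local_stime, while the wrapped delta in stime_unsigned scales to roughly 2^63 ns.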

Steps to Reproduce:

      Set up a host with a high core count (e.g., 24+ cores).
        
      Run a high density of Windows 10 DomUs (e.g., 20 domains with 4 vCPUs each).
        
      Apply heavy load on Dom0 (e.g., DRAKVUF monitoring).
        
      Frequently pause/resume or revert snapshots of the DomUs.
        
      Observe dwm.exe crashes in Guests with MILERR_QPC_TIME_WENT_BACKWARD (0x8898009b).

Currently, the lack of sign-awareness in the delta scaling path allows a nanosecond-scale race condition to turn into a multi-millennium time jump.

Environment:

      CPU: 24 cores (Intel Xeon with Invariant TSC)

      Dom0: High vCPU count (24)
        
      Feature: tsc_mode="always_emulate", timer_mode="no_delay_for_missed_ticks"
        
      Guest: Windows 10/11 with tsc as time source


 

