Re: [BUG] Potential Integer Underflow in Time Calibration Logic and Live Snapshot Revert causing DWM crashes in Windows Guests

On 04.01.2026 18:29, Антон Марков wrote:
> Component: Xen Hypervisor (x86 / time.c)
> Versions affected: Potentially 4.17-4.21 and unstable (tested on 4.18
> with high vCPU density)
>
> Description:
> In high-load scenarios (24+ cores, heavy Dom0 load, and frequent VM
> pauses via DRAKVUF/VMI), Windows guests experience Desktop Window
> Manager (DWM.exe) crashes with error 0x8898009b.
> The root cause is an integer underflow in the time scaling logic
> when time calibration occurs simultaneously with a snapshot
> revert or RDTSC(P) instruction emulation.
>
> Technical Analysis:
> The get_s_time_fixed function (xen/arch/x86/time.c) accepts at_tsc as
> an argument. If it is less than local_tsc, a negative delta is
> produced, which scale_delta handles incorrectly. (Alternatively, if
> at_tsc is zero, the TSC is read via rdtsc_ordered; if time calibration
> runs after that read, local_tsc may become larger than the value read.)
> The result is an extremely large number instead of a backward offset.
> This is guaranteed to be reproducible in hvm_load_cpu_ctxt
> (xen/arch/x86/hvm/hvm.c), as sync_tsc will be less than local_tsc after
> time calibration.

Indeed, this will need fixing.

> This can also
> potentially occur during RDTSC(P) emulation simultaneously with
> time_calibration_rendezvous_tail (xen/arch/x86/time.c).
> Windows DWM, sensitive to QueryPerformanceCounter jumps, fails
> catastrophically when it receives an essentially infinite timestamp delta.
>
> Steps to Reproduce:
>
> 1. Set up a host with a high core count (e.g., 24+ cores).
> 2. Run a high density of Windows 10 DomUs (20 domains with 4 vCPUs
>    each).
> 3. Apply heavy load on Dom0 (e.g., DRAKVUF monitoring).
> 4. Frequently pause/resume or revert snapshots of the DomUs.
> 5. Observe dwm.exe crashes in guests with
>    MILERR_QPC_TIME_WENT_BACKWARD (0x8898009b).
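The size of the bogus jump can be worked out directly. Assuming the delta is held in an unsigned 64-bit variable before scaling, and taking an assumed 3 GHz TSC (about 1/3 ns per tick) purely as an example, a TSC sample that is k ticks behind local_tsc gives:

```latex
\begin{aligned}
k &= \mathit{local\_tsc} - \mathit{at\_tsc} > 0 \\
\Delta &= (\mathit{at\_tsc} - \mathit{local\_tsc}) \bmod 2^{64} = 2^{64} - k \\
t &\approx \frac{\Delta}{3}\ \text{ns} \approx \frac{1.84 \times 10^{19}}{3}\ \text{ns}
   \approx 6.1 \times 10^{18}\ \text{ns} \approx 195\ \text{years}
\end{aligned}
```

So instead of stepping back by roughly k/3 ns, the computed system time jumps forward by centuries; the exact figure depends on the host's TSC frequency and scale parameters.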
>
> Currently, the lack of sign-awareness in the delta scaling path allows a
> nanosecond-scale race condition to turn into a multi-millennium time jump.

Just to mention: I think scale_delta() was never intended to be called
with negative delta values. Hence my plan is to deal with those call
sites which may encounter negative deltas. I hope to get to this
tomorrow.

Thanks for the report,
Jan