
Re: [BUG] Potential Integer Underflow in Time Calibration Logic and Live Snapshot Revert causing DWM crashes in Windows Guests


  • To: Антон Марков <akmarkov45@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 5 Jan 2026 15:50:46 +0100
  • Cc: xen-devel@xxxxxxxxxxxxx
  • Delivery-date: Mon, 05 Jan 2026 14:51:02 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04.01.2026 18:29, Антон Марков wrote:
> Component: Xen Hypervisor (x86 / time.c)
> Versions affected: potentially 4.17 to 4.21 and unstable (tested on 4.18
> with high vCPU density)
> Description:
> In high-load scenarios (24+ cores, heavy Dom0 load, and frequent VM
> pauses via DRAKVUF/VMI), Windows guests experience Desktop Window
> Manager (DWM.exe) crashes with error 0x8898009b.
> The root cause is an integer underflow in the time scaling logic that
> occurs when time calibration runs concurrently with a snapshot revert
> or with RDTSC(P) instruction emulation.
> Technical Analysis:
> The get_s_time_fixed() function (xen/arch/x86/time.c) takes at_tsc as
> an argument. If at_tsc is less than local_tsc, the subtraction produces
> a negative delta, which scale_delta() does not handle correctly because
> it treats the delta as unsigned. If at_tsc is zero, the TSC is instead
> read via rdtsc_ordered(), and a race is possible: time calibration can
> run after that read and advance local_tsc beyond the sampled tick value.
> In either case the result is an extremely large number instead of a
> small backward offset. This is guaranteed to be reproducible in
> hvm_load_cpu_ctxt() (xen/arch/x86/hvm/hvm.c), as sync_tsc will be less
> than local_tsc after time calibration.

Indeed, this will need fixing.

> This can also potentially occur when RDTSC(P) emulation runs
> concurrently with time_calibration_rendezvous_tail() (xen/arch/x86/time.c).
> Windows DWM, which is sensitive to QueryPerformanceCounter jumps, fails
> catastrophically when it receives an essentially infinite timestamp delta.
> 
> Steps to Reproduce:
>
> 1. Set up a host with a high core count (e.g., 24+ cores).
> 2. Run a high density of Windows 10 DomUs (20 domains with 4 vCPUs
>    each).
> 3. Apply heavy load on Dom0 (e.g., DRAKVUF monitoring).
> 4. Frequently pause/resume or revert snapshots of the DomUs.
> 5. Observe dwm.exe crashes in the guests with
>    MILERR_QPC_TIME_WENT_BACKWARD (0x8898009b).
> 
> Currently, the lack of sign-awareness in the delta scaling path allows a 
> nanosecond-scale race condition to turn into a multi-millennium time jump.

Just to mention: I think scale_delta() was never intended to be called
with negative delta values. Hence my plan is to deal with those call sites
which may encounter negative deltas. I hope to get to this tomorrow.

Thanks for the report,
Jan



 

