Hello, everybody:
 
   In my institute, hundreds of computers run Xen virtual machines. Every
virtual machine independently syncs its clock with the NTP servers by running
ntpdate once an hour (we set echo 1 > /proc/sys/xen/independent_wallclock in
both DomU and Dom0). Roughly once every one or two weeks, one virtual machine
develops a time sync error. The misbehaving VM is different every time (and so
is the physical machine), but the error is always the same: the VM's clock jumps
ahead 36 minutes, and then its timer stops until 36 minutes later. Every time
this happens, the applications fail.
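One way to catch the next incident as it happens is to compare how far the wall clock advances against a monotonic clock over the same interval. This is a minimal sketch, assuming Python 3.6+ is available in the guest (for time.monotonic and f-strings). classify_step, watch and TOLERANCE_S are hypothetical names, and the 5-second tolerance is an arbitrary choice that leaves room for the small corrections the hourly ntpdate run makes. If the stall also freezes the guest's monotonic clock, this comparison cannot see it.

```python
import time

# Hypothetical tolerance (seconds): how far wall-clock progress may differ
# from monotonic progress between two samples before we flag it.
TOLERANCE_S = 5.0


def classify_step(mono_delta: float, wall_delta: float,
                  tolerance: float = TOLERANCE_S) -> str:
    """Compare elapsed wall-clock time against elapsed monotonic time."""
    skew = wall_delta - mono_delta
    if skew > tolerance:
        return "jump-forward"          # wall clock gained time, e.g. +36 min
    if skew < -tolerance:
        return "stall-or-jump-back"    # wall clock froze or went backwards
    return "ok"


def watch(interval: float = 1.0, rounds: int = 10) -> None:
    """Sample both clocks every `interval` seconds and print anomalies."""
    prev_mono, prev_wall = time.monotonic(), time.time()
    for _ in range(rounds):
        time.sleep(interval)
        mono, wall = time.monotonic(), time.time()
        verdict = classify_step(mono - prev_mono, wall - prev_wall)
        if verdict != "ok":
            print(f"{time.ctime(wall)}: {verdict}; "
                  f"wall clock advanced {wall - prev_wall:.1f} s, "
                  f"monotonic clock advanced {mono - prev_mono:.1f} s")
        prev_mono, prev_wall = mono, wall
```

For example, watch(interval=1.0, rounds=3600) samples about once a second for roughly an hour and prints a line only when the two clocks disagree by more than the tolerance.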
   The dmesg log in Dom0 and the output of xm dmesg show nothing abnormal. The
dmesg log of the misbehaving VM is listed below:
 
May  5 17:56:49  kernel: Badness in tcp_verify_wq at net/ipv4/tcp_ipv4.c:221
May  5 17:56:49  kernel:  [<c044c34d>] tcp_verify_wq+0x239/0x27b
May  5 17:56:49  kernel:  [<c0441db3>] tcp_ack+0x65/0x187f
May  5 17:56:49  kernel:  [<c045fff9>] ipt_do_table+0x1e7/0x322
May  5 17:56:49  kernel:  [<c0460108>] ipt_do_table+0x2f6/0x32
......
May  5 17:56:50  kernel:  [<c012d8c4>] autoremove_wake_function+0x0/0x3d
May  5 17:56:50  kernel:  [<c015d932>] vfs_write+0x8a/0xdd
May  5 17:56:50  kernel:  [<c015dea1>] sys_write+0x3f/0x6
May  5 17:56:50  kernel:  [<c0104c8d>] syscall_call+0x7/0xb
(The warning above is due to a lack of memory, and the kernel kills some processes.)
May  5 17:57:01  /usr/sbin/cron[14544]:
(At this point the clock jumps ahead 36 minutes, and then the timer stops. The timer runs again 36 minutes later.)
May  5 18:33:05  /usr/sbin/cron[14598]:
May  5 18:33:05  sshd[2790]:
May  5 18:33:05  sshd[14659]:
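To look for the same signature in other VMs' logs, a minimal sketch can scan classic syslog timestamps and report consecutive entries that are further apart than a threshold. Classic syslog stamps carry no year, so one must be supplied; the year 2009 below is an assumption based on the date of this message. parse_syslog_time, find_gaps and the 30-minute threshold are hypothetical choices, and %b assumes English month abbreviations. A gap only shows that nothing was logged for a while; it is consistent with, but does not prove, a clock jump or stall.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: report silences longer than this.
GAP_THRESHOLD = timedelta(minutes=30)


def parse_syslog_time(line: str, year: int) -> datetime:
    """Parse the leading 'Mon D HH:MM:SS' stamp of a classic syslog line.

    Classic syslog stamps carry no year, so the caller must supply one.
    """
    stamp = " ".join(line.split()[:3])  # "May  5 17:57:01" -> "May 5 17:57:01"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, year, threshold=GAP_THRESHOLD):
    """Yield (earlier, later, gap) for consecutive stamps further apart than threshold."""
    stamps = [parse_syslog_time(line, year) for line in lines if line.strip()]
    for earlier, later in zip(stamps, stamps[1:]):
        gap = later - earlier
        if gap > threshold:
            yield earlier, later, gap


# Lines copied from the VM log above; the year 2009 is an assumption.
SAMPLE_LOG = [
    "May  5 17:56:50  kernel:  [<c0104c8d>] syscall_call+0x7/0xb",
    "May  5 17:57:01  /usr/sbin/cron[14544]:",
    "May  5 18:33:05  /usr/sbin/cron[14598]:",
    "May  5 18:33:05  sshd[2790]:",
]

for earlier, later, gap in find_gaps(SAMPLE_LOG, year=2009):
    print(earlier, "->", later, gap)  # 2009-05-05 17:57:01 -> 2009-05-05 18:33:05 0:36:04
```

On the excerpt above it reports a single silence of 36 minutes 4 seconds, matching the jump-and-stall described.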
 
   The Xen version is xen-3.2.0-16718-14-0.4, and the Linux version is
SUSE-2.6.16.60-0.21. Each physical machine has an Intel Xeon E5405 CPU and 2 GB
of memory, and runs 4 DomU VMs. Although the CPU is 64-bit, both Xen and Linux
are 32-bit builds.
   36 minutes is 2,160,000,000 microseconds, and 2,160,000,000 = 0x80BEFC00,
which exceeds the largest signed 32-bit value (2,147,483,647). On a 32-bit
system, could this be caused by an overflow of some time-keeping variable?
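The arithmetic behind this question can be checked directly. In milliseconds, 36 minutes is only 2,160,000, far from any 32-bit limit; in microseconds it is 2,160,000,000, just past 2^31 - 1. Equivalently, a signed 32-bit count of microseconds overflows after 2^31 µs, about 35.8 minutes. This sketch only shows that the numbers line up; it does not identify which Xen or kernel variable, if any, is involved. to_int32 is a hypothetical helper that mimics two's-complement wraparound.

```python
# Worked arithmetic for the 36-minute coincidence (no kernel code involved).
INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MOD = 2**32      # number of distinct 32-bit patterns


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's-complement int."""
    value %= UINT32_MOD
    return value - UINT32_MOD if value > INT32_MAX else value


thirty_six_min_us = 36 * 60 * 1_000_000               # 36 minutes in microseconds
print(thirty_six_min_us, hex(thirty_six_min_us))      # 2160000000 0x80befc00
print(thirty_six_min_us > INT32_MAX)                  # True
print(to_int32(thirty_six_min_us))                    # -2134967296
print(round(2**31 / 1_000_000 / 60, 2))               # 35.79 (minutes until overflow)
```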
   For various reasons, I cannot upgrade Xen or Linux to the latest version.
   Could anybody help me solve this time sync error? Thank you very much for
your help!
 
Xiang Zhang
National Research Center for Intelligent Computing Systems
Institute of Computing Technology
Chinese Academy of Sciences
Jun 18th, 2009