
[Xen-devel] time-related problems in recent Xen


I have seen the same problem described by Niels Toedmann a few days ago 
(see http://thread.gmane.org/gmane.comp.emulators.xen.devel/7054 )
It appears that during high load some time-related system calls break.

I was able to make a reproducible test case with the 2.0.5 live CD.
Unfortunately I haven't yet been able to test 2.0.6 since I don't have a spare
machine to install that on. I'll wait for the live CD which will hopefully be
released soon.

The easiest way to trigger this is to run a CPU-intensive program in the
background. I've used SETI@Home but I've seen the same problem during other
high loads as well (however, a simple Python "while 1: pass" isn't enough).
Under such load, the following programs often break:

* Mailman (IOError in time.sleep())
* Zope (misc time-related errors, or just hangs)
* ncftp (works, but download speed is always reported as 0KB/sec)
* wget (crashes with "calc_rate: Assertion `msecs >= 0' failed")
* apt-get (crashes with time-related Perl error messages)
* ssh, ssh-keygen (refuses to start, "PRNG not seeded" error)
* top (displays "nan" in the %CPU column)

Try this:
- boot the 2.0.5 live CD in text mode
- ifup eth0 (assuming you have a DHCP server) in domain 0
  (no need to boot other domains, but you can reproduce this in them as well)
- wget ftp://alien.ssl.berkeley.edu/pub/setiathome-3.08.i686-pc-linux-gnu.tar
- untar and run setiathome (in the background)
- now try some of these:
   - wget the same file again -> crashes
   - in python, run the following:
      import time
      time.sleep(1) -> IOError (not every time though)
   - in the shell, run "sleep 1" -> sleeps forever 
   - try to ssh to another machine -> "PRNG not seeded"
- kill setiathome, the programs start working again (not always though)
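
The Python check in the steps above can be automated with a small script.
This is only a rough sketch: the function name check_clock and its
parameters are made up for illustration. The check for the clock going
backwards is an added assumption about a possible cause (suggested by the
wget assertion), not something confirmed in this report. Note that if
sleep hangs forever, as "sleep 1" does in the shell, the script hangs too.

```python
import time

def check_clock(iterations=100, nap=0.01, sleep=time.sleep, now=time.time):
    """Call sleep() repeatedly and count anything that looks wrong.

    Returns a dict counting sleep errors and backwards clock steps.
    The sleep/now parameters exist only so the check can be tested
    with fake functions; by default the real time module is used.
    """
    results = {"sleep_errors": 0, "backward_steps": 0, "iterations": 0}
    previous = now()
    for _ in range(iterations):
        try:
            sleep(nap)
        except (IOError, OSError):
            # The kind of failure Mailman reported from time.sleep().
            results["sleep_errors"] += 1
        current = now()
        if current < previous:
            # Assumption: wall-clock time stepping backwards would
            # explain negative elapsed times such as wget's msecs < 0.
            results["backward_steps"] += 1
        previous = current
        results["iterations"] += 1
    return results

if __name__ == "__main__":
    print(check_clock())
```

Run it once on an idle system and once with setiathome in the background;
any non-zero count in the second run points at the problem described here.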

There are no new messages in dmesg or /var/log/* after what's caused by ifup.
However, after starting up setiathome, the process "python /usr/sbin/xend
start" starts eating lots of CPU, and goes on doing that even if I kill
setiathome. I don't know whether this is normal behavior or not. According to
top it takes 99.7% CPU, but ps reports a more modest 15% to 25%.

I'm still new to Xen and I'll answer any questions you may have. Also I'm
planning to retest with the 2.0.6 live CD when it is released.

As I said SETI is an easy program to test with but I've seen the problem
occasionally without SETI as well on a server with only official Debian 
binaries (mysql, exim, apache2, spamd, stunnel, php4).

Here's my /proc/cpuinfo on the liveCD test machine:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 6
cpu MHz         : 733.372
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : yes
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 mmx fxsr
bogomips        : 1464.72

However, I've seen this on some server machines as well.
Here's the cpuinfo for one of them:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 2995.006
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : yes
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts
acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips        : 5976.88


*** Osma Suominen / MB Concert Ky ***
