Hello,
I have seen the same problem described by Niels Toedmann a few days ago
(see http://thread.gmane.org/gmane.comp.emulators.xen.devel/7054).
It appears that during high load some time-related system calls break.
I was able to make a reproducible test case with the 2.0.5 live CD.
Unfortunately I haven't yet been able to test 2.0.6 since I don't have a spare
machine to install that on. I'll wait for the live CD which will hopefully be
released soon.
The easiest way to trigger this is to run a CPU-intensive program in the
background. I've used SETI@Home but I've seen the same problem during other
high loads as well (however, a simple Python "while 1: pass" isn't enough).
This often breaks the following programs:
* Mailman (IOError in time.sleep())
* Zope (misc time-related errors, or just hangs)
* ncftp (works, but download speed is always reported as 0KB/sec)
* wget (crashes with "acalc_rate: Assertion `msecs >= 0' failed")
* apt-get (crashes with time-related Perl error messages)
* ssh, ssh-keygen (refuses to start, "PRNG not seeded" error)
* top (displays "nan" in the %CPU column)
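Several of these symptoms (wget's "msecs >= 0" assertion, "nan" in top) would
be consistent with the wall clock briefly stepping backwards, although that is
only a guess on my part and not a confirmed diagnosis. A minimal sketch of a
check for it, assuming Python 3; the function names are made up for
illustration:

```python
import time


def find_backward_steps(samples):
    """Return (index, delta) pairs where a timestamp is smaller than the one before it."""
    steps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta < 0:
            steps.append((i, delta))
    return steps


def sample_wall_clock(count=100000):
    """Read the wall clock (time.time()) in a tight loop and return the readings."""
    return [time.time() for _ in range(count)]


if __name__ == "__main__":
    jumps = find_backward_steps(sample_wall_clock())
    print("backward steps:", len(jumps))
    # Show only the first few jumps to keep the output short.
    for index, delta in jumps[:10]:
        print("  sample %d: %.6f s" % (index, delta))
```

Running this once on an idle system and once with the CPU-intensive program
in the background should show whether the clock readings differ between the
two cases.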
Try this:
- boot the 2.0.5 live CD in text mode
- ifup eth0 (assuming you have a DHCP server) in domain 0
  (no need to boot other domains, but you can reproduce this in them as well)
- wget ftp://alien.ssl.berkeley.edu/pub/setiathome-3.08.i686-pc-linux-gnu.tar
- untar and run setiathome (in the background)
- now try some of these:
  - wget the same file again -> crashes
  - in python, run the following:
      import time
      time.sleep(1) -> IOError (not every time though)
  - in the shell, run "sleep 1" -> sleeps forever
  - try to ssh to another machine -> "PRNG not seeded"
- kill setiathome, the programs start working again (not always though)
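Since the failures are intermittent, the sleep checks above could be repeated
automatically. The following is a rough sketch, assuming Python 3 and a
"sleep" binary on the PATH; the function names are hypothetical. Note that the
timeout used to detect a hung "sleep 1" itself relies on the system clock, so
it may not be trustworthy while the problem is occurring.

```python
import subprocess
import time


def probe_python_sleep(seconds=1.0, attempts=5):
    """Call time.sleep() repeatedly; return a list of (elapsed, error) tuples.

    error is None on success, or the repr of the exception that was raised.
    """
    results = []
    for _ in range(attempts):
        start = time.time()
        error = None
        try:
            time.sleep(seconds)
        except (IOError, OSError) as exc:
            error = repr(exc)
        results.append((time.time() - start, error))
    return results


def probe_shell_sleep(seconds=1, timeout=10):
    """Run the sleep binary; return True if it exits in time, False if it hangs."""
    try:
        subprocess.run(["sleep", str(seconds)], timeout=timeout, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    for elapsed, error in probe_python_sleep(1.0, attempts=3):
        print("time.sleep(1): elapsed=%.3f s error=%s" % (elapsed, error))
    print("shell sleep finished:", probe_shell_sleep(1))
```

An elapsed time far from one second, a reported error, or a "False" from the
shell probe would each correspond to one of the failures listed above.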
There are no new messages in dmesg or /var/log/* after what's caused by ifup.
However, after starting up setiathome, the process "python /usr/sbin/xend
start" starts eating lots of CPU, and goes on doing that even if I kill
setiathome. I don't know whether this is normal behavior or not. According to
top it takes 99.7% CPU, but ps reports a more modest 15% to 25%.
I'm still new to Xen and I'll answer any questions you may have. Also I'm
planning to retest with the 2.0.6 live CD when it is released.
As I said, SETI is an easy program to test with, but I've seen the problem
occasionally without SETI as well on a server with only official Debian
binaries (mysql, exim, apache2, spamd, stunnel, php4).
Here's my /proc/cpuinfo on the liveCD test machine:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 733.372
cache size : 256 KB
fdiv_bug : no
hlt_bug : yes
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 mmx fxsr sse
bogomips : 1464.72
However, I've seen this on some server machines as well.
Here's the cpuinfo for one of them:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 1
cpu MHz : 2995.006
cache size : 1024 KB
fdiv_bug : no
hlt_bug : yes
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips : 5976.88
-Osma
--
*** Osma Suominen / MB Concert Ky ***
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel