Hi all,
There is a problem in Xen at present: when a fatal error happens to a VM,
for example its qemu-dm process dies, xend should take care of it and
not leave it behind as a defunct (zombie) process.
The qemu-dm process can die because of a misconfiguration or for some
other reason.
At the moment xend does not handle the dead process, so it remains
defunct, and xm list shows the VM that the process belonged to as:
vserv1134 5 6144 1 ------ 0.0
This patch changes xend so that when such a fatal error happens (e.g. the
qemu-dm process is killed), it logs an error message, destroys the
domain and cleans up the process (no zombies).
The cause is in the xend daemon. xend forks a process and runs the
qemu-dm program in it (qemu-dm is launched from a thread). When qemu-dm
is killed directly, xend never gets a chance to call the wait() function
to collect the zombie child, because xend has no way of knowing whether
the qemu-dm child is alive or has been killed.
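To illustrate the point about wait(): a parent can collect an exited
child without blocking by calling waitpid() with WNOHANG. The following
is only a minimal standalone sketch, not xend code. The names
reap_if_exited and spawn_and_reap are hypothetical, and it is written so
that it also runs on a current Python.

```python
import os
import time


def reap_if_exited(pid):
    """Try to collect child `pid` without blocking (hypothetical helper).

    Returns the exit code if the child has exited, the negated signal
    number if it was killed by a signal, or None if the child is still
    running, is not our child, or has already been collected.
    """
    try:
        reaped, status = os.waitpid(pid, os.WNOHANG)
    except OSError:
        # ECHILD: not a child of this process, or already reaped.
        return None
    if reaped == 0:
        # Child exists but has not changed state yet.
        return None
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def spawn_and_reap(exit_code=3):
    """Fork a child that exits at once, then poll until it is reaped."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately
    while True:
        code = reap_if_exited(pid)
        if code is not None:
            return code
        time.sleep(0.01)
```

Once waitpid() has returned the status, the kernel drops the process
table entry, so no defunct process is left behind.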
For this reason, I added some code to xend that checks the status of
each HVM device model (DM) every 30 seconds.
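As an illustration of such a periodic check, here is a rough sketch of
how a zombie can be detected by polling. It is not the code in the patch
(the patch shells out to ps). It assumes Linux and reads
/proc/<pid>/stat instead, and process_state, is_zombie,
wait_until_zombie and spawn_unreaped_child are hypothetical names.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state of `pid`, or None if it is gone.

    Assumes a Linux /proc filesystem.
    """
    try:
        with open("/proc/%d/stat" % pid) as f:
            data = f.read()
    except (IOError, OSError):
        return None
    # The second field (command name) is parenthesised and may contain
    # spaces, so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def is_zombie(pid):
    """True if `pid` exists and is in state Z (defunct)."""
    return process_state(pid) == "Z"


def wait_until_zombie(pid, timeout=30.0, interval=1.0):
    """Poll every `interval` seconds until `pid` is a zombie or timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if is_zombie(pid):
            return True
        time.sleep(interval)
    return is_zombie(pid)


def spawn_unreaped_child():
    """Fork a child that exits at once and is deliberately not reaped."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    return pid
```

A child created with spawn_unreaped_child() shows up as a zombie until
the parent calls os.waitpid() on it, after which process_state() returns
None.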
The patch is based on the open source Xen 3.2.1 source code.
Please review it.
Thanks.
Xiaowei
--- xen-3.2.1/tools/python/xen/xend/server/SrvServer.py.org 2008-05-21 13:53:08.000000000 +0800
+++ xen-3.2.1/tools/python/xen/xend/server/SrvServer.py 2008-05-21 13:58:56.000000000 +0800
@@ -44,6 +44,7 @@
import re
import time
import signal
+import os
from threading import Thread
from xen.web.httpserver import HttpServer, UnixHttpServer
@@ -148,14 +149,28 @@
# Reaching this point means we can auto start domains
try:
- xenddomain().autostart_domains()
+ dom = xenddomain()
+ dom.autostart_domains()
except Exception, e:
log.exception("Failed while autostarting domains")
# loop to keep main thread alive until it receives a SIGTERM
self.running = True
while self.running:
- time.sleep(100000000)
+ # loop to destroy those hvm domains whose DM has died unexpectedly.
+ for item in dom.domains.values():
+ if item.info.is_hvm():
+ device_model_pid = item.gatherDom(('image/device-model-pid', str))
+ dm_stat_cmd = "ps -o stat --no-headers -p"+device_model_pid
+ dm_stat = os.popen(dm_stat_cmd).readline().rstrip()
+ log.info("status of the command is:" + dm_stat + "end of output")
+ if dm_stat == 'Z':
+ log.info("status of the command is:" + dm_stat + "end of output")
+ log.warn("Device Model for " + str(item) + " was killed unexpectedly")
+ item.destroy()
+ else:
+ continue
+ time.sleep(30)
if self.reloadingConfig:
log.info("Restarting all XML-RPC and Xen-API servers...")