xen-devel
Re: [Xen-devel] xend failure, restart doesn't work
OK, here's some more info:
Currently there are multiple domains up and running perfectly. I'm on
dom0:
wilde root 0 # xm list
(111, 'Connection refused')
Error: Error connecting to xend, is xend running?
check if it's running:
wilde root 0 # ps |grep xend
root 13347 0.0 0.3 1436 404 pts/0 D+ 11:57 0:00 grep xend
wilde root 0 # ps |grep xfrd
root 26122 0.0 0.3 3252 464 ? S Mar16 0:00 xfrd
cleared the logs:
wilde root 0 # cat /var/log/xend.log /var/log/xend-debug.log
wilde root 0 #
stopping xend (just to make sure):
wilde root 0 # /etc/init.d/xend stop
wilde root 0 # cat /var/log/xend.log /var/log/xend-debug.log
wilde root 0 #
starting xend:
wilde root 0 # /etc/init.d/xend start
.........
wilde root 3 #
(exit status 3, after a time-out of about 5 seconds)
see what is running:
wilde root 0 # ps |grep xend
root 13469 0.0 0.3 1436 472 pts/0 R+ 12:03 0:00 grep xend
wilde root 0 # ps |grep xfr
root 13420 0.0 0.6 3048 792 ? S 12:02 0:00 xfrd
root 13471 0.0 0.3 1436 472 pts/0 R+ 12:04 0:00 grep xfr
wilde root 0 # ps |grep xcs
root 13477 0.0 0.3 1436 472 pts/0 R+ 12:04 0:00 grep xcs
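The three checks above could be folded into one helper. This is only a sketch: the function name running_daemons is made up for illustration, and it assumes one process per line with the command at the end, as in the output shown here. It skips grep lines so the search does not match itself.

```python
def running_daemons(ps_output, names=("xend", "xfrd", "xcs")):
    """Return the names from `names` that appear as a command in ps output.

    Lines that belong to a grep process are ignored, so a search for
    "xend" is not counted as xend itself.
    """
    found = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields or "grep" in fields:
            continue
        for name in names:
            # Accept the bare command name or a path ending in it,
            # e.g. "/usr/sbin/xend".
            if any(f == name or f.endswith("/" + name) for f in fields):
                found.add(name)
    return sorted(found)


# Sample taken from the ps output in this message.
sample = """root 13420 0.0 0.6 3048 792 ? S 12:02 0:00 xfrd
root 13471 0.0 0.3 1436 472 pts/0 R+ 12:04 0:00 grep xfr
root 13477 0.0 0.3 1436 472 pts/0 R+ 12:04 0:00 grep xcs"""

print(running_daemons(sample))
```

On a live system the input could come from something like subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout, assuming a ps that accepts those options.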
So only xfrd has started. Let's see what is in the logs:
wilde root 1 # cat /var/log/xend.log
[2005-03-25 12:02:45 xend] INFO (SrvDaemon:610) Xend Daemon started
( i wish :S )
wilde root 0 # cat /var/log/xend-debug.log
network start bridge=xen-br0 netdev=eth0 antispoof=yes
Traceback (most recent call last):
  File "/usr/sbin/xend", line 121, in ?
    sys.exit(main())
  File "/usr/sbin/xend", line 107, in main
    return daemon.start()
  File "/usr/lib/python/xen/xend/server/SrvDaemon.py", line 525, in start
    self.run()
  File "/usr/lib/python/xen/xend/server/SrvDaemon.py", line 615, in run
    SrvServer.create(bridge=1)
  File "/usr/lib/python/xen/xend/server/SrvServer.py", line 47, in create
    xend = SrvRoot()
  File "/usr/lib/python/xen/xend/server/SrvRoot.py", line 29, in __init__
    self.get(name)
  File "/usr/lib/python/xen/xend/server/SrvDir.py", line 69, in get
    val = val.getobj()
  File "/usr/lib/python/xen/xend/server/SrvDir.py", line 39, in getobj
    self.obj = klassobj()
  File "/usr/lib/python/xen/xend/server/SrvDomainDir.py", line 25, in __init__
    self.xd = XendDomain.instance()
  File "/usr/lib/python/xen/xend/XendDomain.py", line 798, in instance
    inst = XendDomain()
  File "/usr/lib/python/xen/xend/XendDomain.py", line 65, in __init__
    self.initial_refresh()
  File "/usr/lib/python/xen/xend/XendDomain.py", line 153, in initial_refresh
    d_dom = self._new_domain(config, doms[domid])
  File "/usr/lib/python/xen/xend/XendDomain.py", line 188, in _new_domain
    deferred = XendDomainInfo.vm_recreate(savedinfo, info)
  File "/usr/lib/python/xen/xend/XendDomainInfo.py", line 218, in vm_recreate
    d = vm.construct(config)
  File "/usr/lib/python/xen/xend/XendDomainInfo.py", line 453, in construct
    self.construct_image()
  File "/usr/lib/python/xen/xend/XendDomainInfo.py", line 480, in construct_image
    image_handler(self, image)
  File "/usr/lib/python/xen/xend/XendDomainInfo.py", line 1065, in vm_image_linux
    vm.create_domain("linux", kernel, ramdisk, cmdline)
  File "/usr/lib/python/xen/xend/XendDomainInfo.py", line 757, in create_domain
    self.create_channel()
  File "/usr/lib/python/xen/xend/XendDomainInfo.py", line 782, in create_channel
    remote_port=remote)
  File "/usr/lib/python/xen/xend/server/SrvDaemon.py", line 660, in createDomChannel
    remote_port=remote_port)
  File "/usr/lib/python/xen/xend/server/channel.py", line 59, in domChannel
    remote_port=remote_port)
  File "/usr/lib/python/xen/xend/server/channel.py", line 229, in __init__
    remote_port=remote_port)
  File "/usr/lib/python/xen/xend/server/channel.py", line 113, in createPort
    remote_port=int(remote_port))
xen.lowlevel.xu.PortError: Failed to map domain control interface
So there is the problem, I guess. But I don't know what it means or how
I should fix it. Any ideas?
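For anyone who wants the failing call and the exception out of xend-debug.log without reading the whole traceback, here is a rough sketch. The function name summarize_traceback and both regular expressions are my own assumptions, not part of xend. It expects the exception line to look like "Some.Name.Error: message", and it tolerates frame lines that a mail client has wrapped after "in".

```python
import re

# A frame header; \s+ also matches a line break, so wrapped frames still parse.
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in\s+(?P<func>\S+)')
# An unindented "DottedName...Error: message" (or ...Exception) line.
EXC_RE = re.compile(
    r'^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<message>.*)$',
    re.MULTILINE)


def summarize_traceback(log_text):
    """Summarize the last traceback in log_text, or return None if none is found."""
    start = log_text.rfind("Traceback (most recent call last):")
    if start == -1:
        return None
    body = log_text[start:]
    frames = list(FRAME_RE.finditer(body))
    error = EXC_RE.search(body)
    if not frames or error is None:
        return None
    innermost = frames[-1]  # the last frame listed is where the error was raised
    return {
        "exception": error.group("type"),
        "message": error.group("message").strip(),
        "file": innermost.group("path"),
        "line": int(innermost.group("line")),
        "function": innermost.group("func"),
    }


# The last two frames of the traceback above, wrapped the way the mail shows them.
sample = '''Traceback (most recent call last):
File "/usr/lib/python/xen/xend/server/channel.py", line 229, in
__init__
remote_port=remote_port)
File "/usr/lib/python/xen/xend/server/channel.py", line 113, in
createPort
remote_port=int(remote_port))
xen.lowlevel.xu.PortError: Failed to map domain control interface
'''

summary = summarize_traceback(sample)
print(summary["function"], "raised", summary["exception"])
```

This only locates the failure (createPort in channel.py); it says nothing about why the control interface could not be mapped.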
Ian Pratt wrote:
> > Hi,
> > Every now and then xend seems to fail; the domains keep running but
> > control is lost completely.
> > A restart does not seem possible. The logfile says xend restarts, but
> > the only thing that restarts is xfrd, binding on port 8002.
> > The only way to get control back seems to be to reboot the machine,
> > rebooting all of the client domains too, which is a *bad thing* for our
> > clients.
> > Is there any way to get control back?
> > Is anyone working on the problem? Is there a patch, or even a
> > known cause?
> > Are there any steps I can take (or NOT take) to prevent this
> > from happening?
> You could try 'xend stop' and then kill 'xcs' manually, then
> 'xend start'.
> You'll need to give us more help to debug the actual problem you're
> experiencing.
> Ian
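The sequence Ian suggests could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, 'killall xcs' is my stand-in for "kill xcs manually", and xend is assumed to be on the PATH. Whether this clears the PortError above is not established; it is only the sequence proposed as something to try.

```python
import subprocess


def restart_steps(xend="xend", xcs_name="xcs"):
    """Return the commands for: stop xend, kill xcs, start xend."""
    return [[xend, "stop"], ["killall", xcs_name], [xend, "start"]]


def restart_control_plane(run=subprocess.call, steps=None):
    """Run each step in order and return a list of (command, exit_status).

    A non-zero status does not abort the sequence: killall, for example,
    reports failure when no xcs process exists, which is harmless here.
    """
    results = []
    for cmd in steps or restart_steps():
        results.append((cmd, run(cmd)))
    return results


# Dry run: a stub runner prints the commands instead of executing them.
for cmd, _status in restart_control_plane(run=lambda cmd: 0):
    print(" ".join(cmd))
```

Calling restart_control_plane() with no arguments would execute the commands for real, so it needs to run as root on dom0.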