[Xen-users] Production problem

Hello,

I’m a Linux administrator in charge of the Xen environment in a large organization.

We have about 300 VMs spread across 25 clusters (RHCS) running Xen.

Last night I had an unexpected reboot on a particular virtual machine and I can’t figure out what happened.

Xend.log says:

[2010-08-12 18:58:39 xend 14729] ERROR (xmlrpclib2:184) (16, 'Device or resource busy')
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/util/xmlrpclib2.py", line 162, in _marshaled_dispatch
    response = self._dispatch(method, params)
  File "/usr/lib64/python2.4/SimpleXMLRPCServer.py", line 406, in _dispatch
    return func(*params)
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/XMLRPCServer.py", line 54, in domain
    return fixup_sxpr(info.sxpr())
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1319, in sxpr
    for config in self.getDeviceConfigurations(cls):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1250, in getDeviceConfigurations
    return self.getDeviceController(deviceClass).configurations()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 236, in configurations
    return map(self.configuration, self.deviceIDs())
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/vfbif.py", line 39, in configuration
    r = DevController.configuration(self, devid)
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 244, in configuration
    backdomid = xstransact.Read(self.devicePath(devid), "backend-id")
  File "/usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 297, in Read
    return complete(path, lambda t: t.read(*args))
  File "/usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 351, in complete
    t = xstransact(path)
  File "/usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 20, in __init__
    self.transaction = xshandle().transaction_start()
Error: (16, 'Device or resource busy')
[2010-08-12 18:58:39 xend.XendDomainInfo 14729] DEBUG (XendDomainInfo:1036) XendDomainInfo.handleShutdownWatch
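For reference, the (16, 'Device or resource busy') pair in the log has the shape of an errno number plus its message text. A minimal sketch of looking up the symbolic name with the Python standard library follows; the helper name describe_errno is only illustrative, and treating the value as a plain OS errno is an assumption about how xend reports the xenstore failure.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an OS error number.

    Assumption: the number logged by xend is an ordinary errno value.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    message = os.strerror(code)  # wording depends on the platform's C library
    return name, message


if __name__ == "__main__":
    name, message = describe_errno(16)
    print("%s: %s" % (name, message))
```

This only names the error (EBUSY); it does not say which resource was busy when transaction_start() was called.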

The environment looks like this:

Physical hosts: two Dell R710 servers, 144 GB of memory, 8 Intel Xeon X5460 3.16 GHz cores (2 quad-core CPUs), running RHEL 5.3 with kernel 2.6.18-128.1.1.el5xen x86_64, 2 bonded LAN NICs (bnx2), 2 bonded heartbeat NICs (e1000e), plus one administration NIC. To connect to our SAN we use 2 redundant QLogic ISP2432-based 4Gb HBAs attached to an EMC CLARiiON CX4 storage array. Multipathing is done through EMC PowerPath.

VM: RHEL 5.3, kernel 2.6.18-128.1.1.el5xen x86_64, 2 GB of memory, 2 CPUs. The VM is LV-backed.

This VM had been up and running for months. After the reboot it is running fine again.

There were 2 other VMs running in this same cluster, using the same storage, same OS, same configurations. Those weren’t affected at all.

I couldn’t find anything in the log files other than the reboot itself.

Can anyone decipher what the error above is trying to tell me?
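In case it helps anyone repeat the search, here is a rough sketch of how xend.log can be filtered around the time of the reboot. It assumes the line layout seen in the excerpt above ([date time logger pid] LEVEL (module:line) message); the function name entries_near, the five-minute window, and the choice of levels are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Assumed xend.log layout: [YYYY-MM-DD HH:MM:SS logger pid] LEVEL (module:line) message
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (\d+)\] (\w+) \((.*?)\) ?(.*)$"
)


def entries_near(lines, when, window_minutes=5, levels=("ERROR", "WARNING")):
    """Return (timestamp, level, origin, message) tuples close to `when`."""
    lo = when - timedelta(minutes=window_minutes)
    hi = when + timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # traceback lines and blanks do not start with a timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if lo <= ts <= hi and m.group(4) in levels:
            hits.append((ts, m.group(4), m.group(5), m.group(6)))
    return hits


# Two lines copied from the log excerpt in this message.
SAMPLE = [
    "[2010-08-12 18:58:39 xend 14729] ERROR (xmlrpclib2:184) (16, 'Device or resource busy')",
    "[2010-08-12 18:58:39 xend.XendDomainInfo 14729] DEBUG (XendDomainInfo:1036) XendDomainInfo.handleShutdownWatch",
]

if __name__ == "__main__":
    for hit in entries_near(SAMPLE, datetime(2010, 8, 12, 19, 0, 0)):
        print(hit)
```

For a real run, pass the open log file object in place of SAMPLE; the same filter could also be pointed at other logs if their timestamp format is adjusted in LINE_RE.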

Any ideas?

Thanks in advance.

Regards,

Emerson Ribeiro

55 11 4344-8905

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
