WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Xen locks down on specific server after 1-3 days
From: Silviu Paragina <silviu@xxxxxxxxxxx>
Date: Mon, 31 Aug 2009 11:18:21 +0300

I've tried quite a few things. Googling turns up some results, but they don't seem to be related.

I have two servers, one for testing and one for production. I needed Xen on the production one so I could run Windows on top of Linux. Unfortunately the test server's hardware differs from the production one: the test machine has a Xeon X3210 (4 cores) and 2 GB RAM, while the production machine has a Xeon 3040 (2 cores) and 2 GB RAM. So there is quite a difference. Since the server locked up without any domUs running, I won't post the Windows domain config, which seems irrelevant in this case.

Here is the story:
The production server runs Ubuntu 8.04 LTS, so I thought I should stick with it. I did an apt-get for the Xen packages from the backports repository (Xen 3.3.0). Everything worked perfectly on the test machine. On the network side I made a config similar to network-bridge (actually a stripped-down network-bridge), with the sole exception that it doesn't replace the physical interface (i.e. all virtual machines are attached to a bridge, and nothing else). I did an apt-get for Xen on the production server, rebooted into Xen, then started uploading the Windows image from the test machine (no configs yet). Having a slow connection, I left it uploading over the weekend (this was on a Friday).

Two days later (Sunday) I couldn't ssh into the machine. The response wasn't a timeout but a "Connection closed" message after waiting a long time (longer than the usual timeout when the machine is down). Pings still got replies (the only thing that seemed to work), and all the other services (VPN/HTTP) behaved the same way: the connection seemed to be established, but the services themselves didn't respond.
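The symptom above (pings answered, TCP connections accepted, but no service actually answering) can be checked from another machine with a small probe. This is a minimal Python sketch, not part of the original setup; it relies on the fact that an SSH server sends its identification string (e.g. "SSH-2.0-...") as soon as the connection is established, so it only applies to banner-first protocols like SSH, not to HTTP, where the client speaks first. The function name, the result labels and the default timeout are illustrative.

```python
import socket

def probe_banner(host, port, timeout=10.0):
    """Classify a banner-first TCP service (such as sshd).

    Returns one of:
      "refused"         - nothing listening on the port (RST)
      "connect-timeout" - no TCP handshake at all (host down or filtered)
      "no-banner"       - handshake completed, but the daemon never spoke
      "closed"          - peer closed the connection without sending data
      "ok"              - a banner was received
      "error"           - any other socket error
    """
    try:
        # create_connection also applies `timeout` to later recv() calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"
    except OSError:
        return "error"
    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:
            # The kernel completed the handshake, but user space never answered.
            return "no-banner"
        except OSError:
            return "error"
    return "ok" if data else "closed"
```

Run periodically against port 22 of the server from another host, a result of "no-banner" or "closed" while pings still succeed would match the state described in this message.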

After a reset I noticed the logs were full of "BUG: soft lockup - CPU#0 stuck for 11s! [sshd:..]" messages (see the attached log file). At the time I hoped it was a fluke (even though my logic was yelling otherwise). The config file from that time is attached as xend-config.sxp (plus a copy without comments).

After this incident I went ahead and installed Windows directly on the machine. All seemed fine until another 2 days (or so) passed, and it locked up again, this time without any log entries.

After a few more lockups (without any log entries) and desperate config changes (memory-related, see xend-config.sxp.diff), I tried building the Xen 3.3.2 packages. I did an apt-get source xen-hypervisor-3.3, replaced the source in the package with the one from xen.org (3.3.2), removed the Ubuntu patches and built it. Unfortunately the kernel source package seemed a bit too complex for me to follow, so I went with the stock Ubuntu (Xen) kernel. It booted and everything seemed fine on the test machine (it ran without a lockup over a weekend, Friday to Monday), so I deployed it on the production server, and this time it locked up after 3 days (actually about 2 days and 16 hours).

Probably irrelevant, but still: yesterday something weird came up on the test machine, and only on the test machine. Whenever I shut down the Windows guest, the VM gets stuck in state "s" (whether I use xm shutdown <machine> or shut it down from inside Windows).
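For reference, the letters in the State column of `xm list` can be decoded as below. This is a small sketch; the flag meanings (r running, b blocked, p paused, s shutdown, c crashed, d dying) are the ones documented for `xm list`, while the helper itself is hypothetical. A domain showing "s" has begun shutting down but has not yet been cleaned up.

```python
# Meaning of each flag letter in the State column of `xm list`.
XM_STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}

def decode_xm_state(state):
    """Turn a State string such as '---s--' into a list of flag names."""
    # Dashes are placeholders for unset flags; unknown letters are reported as-is.
    return [XM_STATE_FLAGS.get(ch, "unknown:" + ch) for ch in state if ch != "-"]

for sample in ("r-----", "-b----", "---s--"):
    print(sample, decode_xm_state(sample))
```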

Now I'm here. I'll try forcing dom0 onto one CPU and Windows onto the other. But this should only be a temporary solution, because dom0 runs some services and sometimes needs processing power, and the same goes for Windows.
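To confirm that such a split is actually in effect (for example after setting `(dom0-cpus 1)` in xend-config.sxp, or pinning with `xm vcpu-pin`), the CPU Affinity column of `xm vcpu-list` can be compared for the two domains. A minimal sketch follows; the sample output, the guest name `win2008` and the helper names are hypothetical, and the exact column layout may differ between Xen versions.

```python
def expand_affinity(spec, n_pcpus):
    """Expand 'any cpu', '1', '0-1' or '0,2-3' into a set of physical CPU numbers."""
    spec = spec.strip()
    if spec == "any cpu":
        return set(range(n_pcpus))
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

def domain_affinity(vcpu_list_text, domain, n_pcpus):
    """Union of the affinities of all vCPUs that belong to `domain`."""
    allowed = set()
    for line in vcpu_list_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7 or fields[0] != domain:
            continue
        # Columns: Name ID VCPU CPU State Time(s) CPU-Affinity ("any cpu" has a space)
        allowed |= expand_affinity(" ".join(fields[6:]), n_pcpus)
    return allowed

# Hypothetical output after pinning dom0 to pCPU 0 and the guest to pCPU 1.
SAMPLE = """\
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                             0     0     0   r--     812.3 0
win2008                              1     0     1   -b-    4021.9 1
"""

n = 2  # the production box has a dual-core Xeon 3040
shared = domain_affinity(SAMPLE, "Domain-0", n) & domain_affinity(SAMPLE, "win2008", n)
print("shared pCPUs:", sorted(shared) or "none")
```

An empty intersection means the two domains can never be scheduled on the same physical CPU; "any cpu" on either side would make them overlap again.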

Right now I'm not sure whether I should:
- try compiling directly from the xen.org sources (3.3? 3.4?)
- try compiling the kernel from xen.org
- downgrade to 3.2, which is in the standard Ubuntu 8.04 repository (not backports), provided 3.2 can run Windows 2008 without problems (judging from the changelog, 3.3 seemed to be the first version that could run Windows Server 2008)
- upgrade to Ubuntu Jaunty


Any help or suggestions are appreciated; I've been trying things for 2-3 weeks now :(

Cheers,
Silviu

[89997.855760] BUG: soft lockup - CPU#0 stuck for 11s! [sshd:28512]
[89997.855817]
[89997.855819] Pid: 28512, comm: sshd Not tainted (2.6.24-24-xen #1)
[89997.855821] EIP: 0061:[ipv6:_spin_lock+0x7/0x10] EFLAGS: 00200286 CPU: 0
[89997.855828] EIP is at _spin_lock+0x7/0x10
[89997.855830] EAX: c1c3314c EBX: 00000000 ECX: c1c33140 EDX: 000008b0
[89997.855832] ESI: 5e68a067 EDI: 00000000 EBP: c0477158 ESP: e219fde4
[89997.855835]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[89997.855839] CR0: 8005003b CR2: b7916000 CR3: 2af1f000 CR4: 00002660
[89997.855843] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[89997.855845] DR6: ffff0ff0 DR7: 00000400
[89997.855848]  [__do_fault+0x3b8/0x6b0] __do_fault+0x3b8/0x6b0
[89997.855866]  [__do_softirq+0x92/0x130] __do_softirq+0x92/0x130
[89997.855871]  [handle_mm_fault+0x223/0x1360] handle_mm_fault+0x223/0x1360
[89997.855876]  [get_unused_fd_flags+0x52/0xd0] get_unused_fd_flags+0x52/0xd0
[89997.855882]  [apparmor_inode_permission+0x47/0x70] apparmor_inode_permission+0x47/0x70
[89997.855889]  [prio_tree_insert+0x1f/0x240] prio_tree_insert+0x1f/0x240
[89997.855896]  [vma_prio_tree_insert+0x1f/0x50] vma_prio_tree_insert+0x1f/0x50
[89997.855900]  [vma_link+0x9b/0x100] vma_link+0x9b/0x100
[89997.855906]  [do_page_fault+0x35e/0xe70] do_page_fault+0x35e/0xe70
[89997.855913]  [default_llseek+0x6b/0xc0] default_llseek+0x6b/0xc0
[89997.855916]  [default_llseek+0x0/0xc0] default_llseek+0x0/0xc0
[89997.855919]  [usbcore:copy_to_user+0x30/0x510] copy_to_user+0x30/0x60
[89997.855923]  [sys_llseek+0x93/0xb0] sys_llseek+0x93/0xb0
[89997.855927]  [do_page_fault+0x0/0xe70] do_page_fault+0x0/0xe70
[89997.855930]  [error_code+0x35/0x40] error_code+0x35/0x40
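Since the lockups only show up after a day or more, it can help to pull the soft-lockup lines out of the kernel log (e.g. /var/log/kern.log) and convert the printk timestamp, which counts seconds since boot, into hours. A minimal Python sketch, assuming the message format shown in the log above; the function name and result keys are hypothetical. For the first line of this log, 89997.86 s is about 25 hours after boot.

```python
import re
from collections import Counter

# Matches lines such as:
# [89997.855760] BUG: soft lockup - CPU#0 stuck for 11s! [sshd:28512]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

def summarize_lockups(lines):
    """Summarize soft-lockup reports found in kernel log lines."""
    first_ts = None
    longest = 0
    hits = Counter()
    for line in lines:
        m = LOCKUP_RE.search(line)  # search() also copes with syslog prefixes
        if m is None:
            continue
        ts = float(m.group("ts"))
        first_ts = ts if first_ts is None else min(first_ts, ts)
        longest = max(longest, int(m.group("secs")))
        hits[(int(m.group("cpu")), m.group("comm"))] += 1
    return {
        "first_hours_after_boot": None if first_ts is None else first_ts / 3600.0,
        "longest_stall_s": longest,
        "by_cpu_and_process": hits,
    }

SAMPLE = [
    "[89997.855760] BUG: soft lockup - CPU#0 stuck for 11s! [sshd:28512]",
    "[89997.855817]",
]
print(summarize_lockups(SAMPLE))
```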
# -*- sh -*-

#
# Xend configuration file.
#

# This example configuration is appropriate for an installation that 
# utilizes a bridged network configuration. Access to xend via http
# is disabled.  

# Commented out entries show the default for that entry, unless otherwise
# specified.

#(logfile /var/log/xen/xend.log)
#(loglevel DEBUG)


# The Xen-API server configuration.
#
# This value configures the ports, interfaces, and access controls for the
# Xen-API server.  Each entry in the list starts with either unix, a port
# number, or an address:port pair.  If this is "unix", then a Unix domain socket is
# opened, and this entry applies to that.  If it is a port, then Xend will
# listen on all interfaces on that TCP port, and if it is an address:port
# pair, then Xend will listen on the specified port, using the interface with
# the specified address.
#
# The subsequent string configures the user-based access control for the
# listener in question.  This can be one of "none" or "pam", indicating either
# that users should be allowed access unconditionally, or that the local
# Pluggable Authentication Modules configuration should be used.  If this
# string is missing or empty, then "pam" is used.
#
# The final string gives the host-based access control for that listener. If
# this is missing or empty, then all connections are accepted.  Otherwise,
# this should be a space-separated sequence of regular expressions; any host
# with a fully-qualified domain name or an IP address that matches one of
# these regular expressions will be accepted.
#
# Example: listen on TCP port 9363 on all interfaces, accepting connections
# only from machines in example.com or localhost, and allow access through
# the unix domain socket unconditionally:
#
#   (xen-api-server ((9363 pam '^localhost$ example\\.com$')
#                    (unix none)))
#
# Optionally, the TCP Xen-API server can use SSL by specifying the private
# key and certificate location:
#
#                    (9367 pam '' /etc/xen/xen-api.key /etc/xen/xen-api.crt)
#
# Default:
#   (xen-api-server ((unix)))


#(xend-http-server no)
#(xend-unix-server no)
#(xend-tcp-xmlrpc-server no)
#(xend-unix-xmlrpc-server yes)
#(xend-relocation-server no)
(xend-relocation-server yes)
#(xend-relocation-ssl-server no)

#(xend-unix-path /var/lib/xend/xend-socket)


# Address and port xend should use for the legacy TCP XMLRPC interface, 
# if xend-tcp-xmlrpc-server is set.
#(xend-tcp-xmlrpc-server-address 'localhost')
#(xend-tcp-xmlrpc-server-port 8006)

# SSL key and certificate to use for the legacy TCP XMLRPC interface.
# Setting these will mean that this port serves only SSL connections as
# opposed to plaintext ones.
#(xend-tcp-xmlrpc-server-ssl-key-file  /etc/xen/xmlrpc.key)
#(xend-tcp-xmlrpc-server-ssl-cert-file /etc/xen/xmlrpc.crt)


# Port xend should use for the HTTP interface, if xend-http-server is set.
#(xend-port            8000)

# Port xend should use for the relocation interface, if xend-relocation-server
# is set.
#(xend-relocation-port 8002)

# Port xend should use for the ssl relocation interface, if
# xend-relocation-ssl-server is set.
#(xend-relocation-ssl-port 8003)

# SSL key and certificate to use for the ssl relocation interface, if
# xend-relocation-ssl-server is set.
#(xend-relocation-server-ssl-key-file  /etc/xen/xmlrpc.key)
#(xend-relocation-server-ssl-cert-file  /etc/xen/xmlrpc.crt)

# Whether to use ssl as default when relocating.
#(xend-relocation-ssl no)

# Address xend should listen on for HTTP connections, if xend-http-server is
# set.
# Specifying 'localhost' prevents remote connections.
# Specifying the empty string '' (the default) allows all connections.
#(xend-address '')
#(xend-address localhost)

# Address xend should listen on for relocation-socket connections, if
# xend-relocation-server is set.
# Meaning and default as for xend-address above.
#(xend-relocation-address '')

# The hosts allowed to talk to the relocation port.  If this is empty (the
# default), then all connections are allowed (assuming that the connection
# arrives on a port and interface on which we are listening; see
# xend-relocation-port and xend-relocation-address above).  Otherwise, this
# should be a space-separated sequence of regular expressions.  Any host with
# a fully-qualified domain name or an IP address that matches one of these
# regular expressions will be accepted.
#
# For example:
#  (xend-relocation-hosts-allow '^localhost$ ^.*\\.example\\.org$')
#
#(xend-relocation-hosts-allow '')
(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$')

# The limit (in kilobytes) on the size of the console buffer
#(console-limit 1024)

##
# To bridge network traffic, like this:
#
# dom0: ----------------- bridge -> real eth0 -> the network
#                            |
# domU: fake eth0 -> vifN.0 -+
#
# use
#
# (network-script network-bridge)
#
# Your default ethernet device is used as the outgoing interface, by default. 
# To use a different one (e.g. eth1) use
#
# (network-script 'network-bridge netdev=eth1')
#
# The bridge is named xenbr0, by default.  To rename the bridge, use
#
# (network-script 'network-bridge bridge=<name>')
#
# It is possible to use the network-bridge script in more complicated
# scenarios, such as having two outgoing interfaces, with two bridges, and
# two fake interfaces per guest domain.  To do things like this, write
# yourself a wrapper script, and call network-bridge from it, as appropriate.
#
#(network-script network-bridge bridge=xenbr)
(network-script network-bridge2 bridge=xenbr)

# The script used to control virtual interfaces.  This can be overridden on a
# per-vif basis when creating a domain or configuring a new vif.  The
# vif-bridge script is designed for use with the network-bridge script, or
# similar configurations.
#
# If you have overridden the bridge name using
# (network-script 'network-bridge bridge=<name>') then you may wish to do the
# same here.  The bridge name can also be set when creating a domain or
# configuring a new vif, but a value specified here would act as a default.
#
# If you are using only one bridge, the vif-bridge script will discover that,
# so there is no need to specify it explicitly.
#
(vif-script vif-bridge bridge=xenbr)


## Use the following if network traffic is routed, as an alternative to the
# settings for bridged networking given above.
#(network-script network-route)
#(vif-script     vif-route)


## Use the following if network traffic is routed with NAT, as an alternative
# to the settings for bridged networking given above.
#(network-script network-nat)
#(vif-script     vif-nat)

# dom0-min-mem is the lowest permissible memory level (in MB) for dom0.
# This is a minimum both for auto-ballooning (as enabled by
# enable-dom0-ballooning below) and for xm mem-set when applied to dom0.
(dom0-min-mem 196)

# Whether to enable auto-ballooning of dom0 to allow domUs to be created.
# If enable-dom0-ballooning = no, dom0 will never balloon out.
(enable-dom0-ballooning yes)

# In SMP system, dom0 will use dom0-cpus # of CPUS
# If dom0-cpus = 0, dom0 will take all cpus available
(dom0-cpus 0)

# Whether to enable core-dumps when domains crash.
#(enable-dump no)

# The tool used for initiating virtual TPM migration
#(external-migration-tool '')

# The interface for VNC servers to listen on. Defaults
# to 127.0.0.1. To restore old 'listen everywhere' behaviour
# set this to 0.0.0.0
(vnc-listen 'x.x.x.x')

# The default password for VNC console on HVM domain.
# Empty string is no authentication.
(vncpasswd 'XXXXXXX')

# The VNC server can be told to negotiate a TLS session
# to encrypt all traffic, and provide x509 cert to
# clients enabling them to verify server identity. The
# GTK-VNC widget, virt-viewer, virt-manager and VeNCrypt
# all support the VNC extension for TLS used in QEMU. The
# TightVNC/RealVNC/UltraVNC clients do not.
#
# To enable this create x509 certificates / keys in the
# directory /etc/xen/vnc
#
#  ca-cert.pem       - The CA certificate
#  server-cert.pem   - The Server certificate signed by the CA
#  server-key.pem    - The server private key
#
# and then uncomment this next line
# (vnc-tls 1)

# The certificate dir can be pointed elsewhere..
#
# (vnc-x509-cert-dir /etc/xen/vnc)

# The server can be told to request & validate an x509
# certificate from the client. Only clients with a cert
# signed by the trusted CA will be able to connect. This
# is more secure than password auth alone. Passwd auth can
# be used at the same time if desired. To enable client cert
# checking uncomment this:
#
# (vnc-x509-verify 1)

# The default keymap to use for the VM's virtual keyboard
# when not specified in VM's configuration
#(keymap 'en-us')

# Script to run when the label of a resource has changed.
#(resource-label-change-script '')

# Rotation count of qemu-dm log file.
#(qemu-dm-logrotate-count 10)

# Path where persistent domain configuration is stored.
# Default is /var/lib/xend/domains/
#(xend-domains-path /var/lib/xend/domains)
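The config comments describe xend-relocation-hosts-allow (and the host-based part of xen-api-server) as a space-separated list of regular expressions: a host is accepted if its fully-qualified name or IP address matches any of them, and an empty value allows all connections. The Python sketch below models that rule to check the value used in this config; it is an illustration based on that description, not xend's own code, and it assumes the doubled backslash in the .sxp file reaches the regex engine as a single `\.`.

```python
import re

def host_allowed(host, allow_spec):
    """Return True if `host` (FQDN or IP) matches any regex in the space-separated allow_spec."""
    patterns = allow_spec.split()
    if not patterns:
        # An empty setting allows all connections, per the config comments.
        return True
    return any(re.match(p, host) for p in patterns)

# Value of xend-relocation-hosts-allow in the attached config, with `\\.` unescaped to `\.`.
SPEC = r"^localhost$ ^localhost\.localdomain$"

for host in ("localhost", "localhost.localdomain", "192.0.2.10"):
    print(host, host_allowed(host, SPEC))
```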
 # dom0-min-mem is the lowest permissible memory level (in MB) for dom0.
 # This is a minimum both for auto-ballooning (as enabled by
 # enable-dom0-ballooning below) and for xm mem-set when applied to dom0.
-(dom0-min-mem 196)
+(dom0-min-mem 900)

 # Whether to enable auto-ballooning of dom0 to allow domUs to be created.
 # If enable-dom0-ballooning = no, dom0 will never balloon out.
-(enable-dom0-ballooning yes)
+(enable-dom0-ballooning no)

 # In SMP system, dom0 will use dom0-cpus # of CPUS
 # If dom0-cpus = 0, dom0 will take all cpus available
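The diff above raises dom0-min-mem from 196 to 900 MB and turns dom0 ballooning off on a 2 GB machine. As a rough back-of-the-envelope sketch of what that leaves for the Windows guest: with ballooning enabled, xend may shrink dom0 down to dom0-min-mem; with it disabled, dom0 "will never balloon out" and keeps whatever it currently holds. The helper below encodes only that rule; the hypervisor overhead and dom0's current allocation are hypothetical example figures, not values measured on this machine.

```python
def domu_headroom_mb(total_mb, dom0_current_mb, dom0_min_mem_mb, ballooning, overhead_mb):
    """Rough upper bound on memory available to new domUs, in MB."""
    if ballooning:
        # xend may balloon dom0 down, but never below dom0-min-mem
        # (and never "up" to it if dom0 is already smaller).
        dom0_floor = min(dom0_current_mb, dom0_min_mem_mb)
    else:
        # With enable-dom0-ballooning no, dom0 keeps its current allocation.
        dom0_floor = dom0_current_mb
    return max(0, total_mb - dom0_floor - overhead_mb)

TOTAL = 2048     # 2 GB RAM, as on both servers
OVERHEAD = 64    # hypothetical hypervisor/bookkeeping overhead
DOM0_NOW = 2000  # hypothetical: dom0 holding almost all RAM

print(domu_headroom_mb(TOTAL, DOM0_NOW, 196, True, OVERHEAD))   # original settings
print(domu_headroom_mb(TOTAL, DOM0_NOW, 900, True, OVERHEAD))   # min-mem raised, ballooning on
print(domu_headroom_mb(TOTAL, DOM0_NOW, 900, False, OVERHEAD))  # as in the diff
```

With these example figures the three calls give 1788, 1084 and 0 MB: once ballooning is off, whether a guest fits depends entirely on how much memory dom0 holds, not on dom0-min-mem.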
(xend-relocation-server yes)
(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$')
(network-script network-bridge2 bridge=xenbr)
(vif-script vif-bridge bridge=xenbr)
(dom0-min-mem 196)
(enable-dom0-ballooning yes)
(dom0-cpus 0)
(vnc-listen 'x.x.x.x')
(vncpasswd 'XXXXXXX')
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users