[Xen-users] XEN server stalling .. problem spotted - solution required

To: xen-users <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] XEN server stalling .. problem spotted - solution required
From: Gareth Bult <gareth@xxxxxxxxxxxxx>
Date: Wed, 9 Jan 2008 18:20:30 +0000 (GMT)
OK, I've been chasing this for many days. I have a server running 10 instances that periodically freezes, then sometimes "comes back".

I tried many things to spot the problem and finally found it by accident.
It's a little frustrating, as typically the Dom0 and one (or two) instances "go" while the rest carry on, and there is diddly squat in the way of log entries or error messages.

I'm now running 'watch "cat /proc/meminfo"' in the Dom0.
I watch the Dirty figure rise, and occasionally fall.
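If you'd rather log just the two relevant fields than eyeball the full watch output, here is a minimal Python sketch (my own assumption, not part of the original investigation) that parses /proc/meminfo and prints Dirty and Writeback once a second with a timestamp, so the moment the output stalls is easy to spot afterwards. parse_meminfo and monitor are hypothetical helper names.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value} (values are kB where a unit is shown)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def monitor(path="/proc/meminfo", interval=1.0, samples=None):
    """Print Dirty and Writeback every `interval` seconds; run forever if `samples` is None."""
    count = 0
    while samples is None or count < samples:
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(f"{time.strftime('%H:%M:%S')} "
              f"Dirty={info.get('Dirty', 0)} kB "
              f"Writeback={info.get('Writeback', 0)} kB", flush=True)
        count += 1
        # Only sleep if another reading is still due.
        if samples is None or count < samples:
            time.sleep(interval)
```

Calling monitor() in the Dom0 runs until interrupted; monitor(samples=5) takes five readings and stops. Like watch, it may itself stall when the Dom0 freezes, but the last timestamp printed shows roughly when that happened.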

In a guest instance, run the following (it's just an easy way to reproduce the problem quickly):
dd if=/dev/zero of=/tmp/bigfile bs=1M count=1000

Watch "Dirty" rise, and at some point you'll see "Writeback" cut in.
All looks good.

Give it a few seconds and your watch of /proc/meminfo will freeze.
On my system "Dirty" will be reading about 500M at this point and "Writeback" will have dropped to (almost) zero.
Running "xm list" in another session will confirm that you have a major problem (it hangs).

For some reason pdflush is not doing its job!
Run "sync" in another shell and the machine instantly jumps back to life!

I'm running a stock Ubuntu Xen 3.1 kernel.
File-backed Xen instances, typically 5 GB with 1 GB swap.
Dual-socket, dual-core 2.8 GHz Xeons (4 cores in total) with 6 GB RAM.
Twin 500 GB SATA HDDs (software RAID1).

To my way of thinking (!), when it runs short of memory it should force a sync (or similar), but it doesn't; it just sits there. If I wait for the dirty_expire_centisecs interval to elapse I may get some life back: some instances survive and some will have hung.
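The dirty_expire_centisecs setting lives alongside the other writeback tunables under /proc/sys/vm. Below is a small Python sketch for dumping them before and after a freeze, assuming a 2.6-era kernel where these four files exist. read_vm_dirty_settings and centisecs_to_seconds are hypothetical helpers, and no particular default values are assumed; the sketch just reports whatever your kernel has.

```python
import os

# Writeback tunables found under /proc/sys/vm on 2.6-era Linux kernels.
DIRTY_TUNABLES = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_dirty_settings(base="/proc/sys/vm"):
    """Return {name: int} for each dirty_* tunable present under `base`; missing files are skipped."""
    settings = {}
    for name in DIRTY_TUNABLES:
        path = os.path.join(base, name)
        if os.path.exists(path):
            with open(path) as f:
                settings[name] = int(f.read().split()[0])
    return settings


def centisecs_to_seconds(value):
    """The *_centisecs tunables are expressed in hundredths of a second."""
    return value / 100.0
```

For example, a dirty_expire_centisecs value of 3000 would mean dirty data becomes eligible for writeback after centisecs_to_seconds(3000) == 30.0 seconds.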

Here's a working /proc/meminfo:

MemTotal:       860160 kB
MemFree:         22340 kB
Buffers:         49372 kB
Cached:         498416 kB
SwapCached:      15096 kB
Active:          92452 kB
Inactive:       491840 kB
SwapTotal:     4194288 kB
SwapFree:      4136916 kB
Dirty:            3684 kB
Writeback:           0 kB
AnonPages:       29104 kB
Mapped:          13840 kB
Slab:            45088 kB
SReclaimable:    25304 kB
SUnreclaim:      19784 kB
PageTables:       2440 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4624368 kB
Committed_AS:   362012 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      3144 kB
VmallocChunk: 34359735183 kB

Here's one where "xm list" hangs but my watch is still updating the /proc/meminfo display:

MemTotal:       860160 kB
MemFree:         13756 kB
Buffers:         53656 kB
Cached:         502420 kB
SwapCached:      14812 kB
Active:          84356 kB
Inactive:       507624 kB
SwapTotal:     4194288 kB
SwapFree:      4136900 kB
Dirty:          213096 kB
Writeback:           0 kB
AnonPages:       28832 kB
Mapped:          13924 kB
Slab:            45988 kB
SReclaimable:    25728 kB
SUnreclaim:      20260 kB
PageTables:       2456 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4624368 kB
Committed_AS:   361796 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      3144 kB
VmallocChunk: 34359735183 kB

Here's a frozen one:

MemTotal:       860160 kB
MemFree:         15840 kB
Buffers:          2208 kB
Cached:         533048 kB
SwapCached:       7956 kB
Active:          49992 kB
Inactive:       519916 kB
SwapTotal:     4194288 kB
SwapFree:      4136916 kB
Dirty:          505112 kB
Writeback:        3456 kB
AnonPages:       34676 kB
Mapped:          14436 kB
Slab:            64508 kB
SReclaimable:    18624 kB
SUnreclaim:      45884 kB
PageTables:       2588 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4624368 kB
Committed_AS:   368064 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      3144 kB
VmallocChunk: 34359735183 kB
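To compare the three snapshots directly, the Python sketch below takes the MemTotal, Dirty and Writeback figures copied from them and expresses Dirty as a percentage of MemTotal. That works out to roughly 0.4% when working, 24.8% when "xm list" hangs and 58.7% when frozen. Treat this as a rough proxy only. MemTotal here is presumably the Dom0's own allocation rather than the machine's full 6 GB, and the kernel's dirty_ratio threshold may use a somewhat different base depending on the kernel version. dirty_percent is a hypothetical helper.

```python
# Values in kB, copied from the three /proc/meminfo snapshots above.
SNAPSHOTS = {
    "working":  {"MemTotal": 860160, "Dirty": 3684,   "Writeback": 0},
    "xm hangs": {"MemTotal": 860160, "Dirty": 213096, "Writeback": 0},
    "frozen":   {"MemTotal": 860160, "Dirty": 505112, "Writeback": 3456},
}


def dirty_percent(snapshot):
    """Dirty memory as a percentage of MemTotal (a rough proxy, not the kernel's exact limit base)."""
    return 100.0 * snapshot["Dirty"] / snapshot["MemTotal"]


for label, snap in SNAPSHOTS.items():
    print(f"{label:9s} Dirty={snap['Dirty']:7d} kB "
          f"({dirty_percent(snap):5.1f}% of MemTotal) "
          f"Writeback={snap['Writeback']} kB")
```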

Help!!!

Gareth.