
[Xen-devel] Troubles with 2.6-booting and xend


I ran Xen 2.0 on a dual AMD Palomino machine and it worked fine for 17 days,
then it started crashing once a day. I upgraded it to 2.0.1, but the
problem didn't go away. The server itself still works fine; it just doesn't
stay up under Xen. So I moved on and bought three more servers, one by
one. The first one was a catastrophe, because 1.5 GB of its 2 GB of memory
was already fscked up, so we can leave it out of the count.
Then I set up the second server, a dual Intel Pentium III 733 MHz. It was
known to be a very stable production machine, and it still is.
I reinstalled everything on this server, because the b0rk3n memory in the
previous one might have messed everything up, so a fresh, clean build
sounded better. I took the xen-2.0.1 sources and built everything they need
on Debian Sarge, and everything went fine. I made some minor changes to the
linux-2.6.9-xen0 kernel configuration because I needed software RAID
support and some device drivers, and I removed some drivers, such as
PCMCIA, from the configuration. The rebuild and the installation went fine.
But then the trouble started. I rebooted the server (remotely) and it
didn't come back up. The next day I went to the console to see what was
wrong, and the last thing it printed was that Xen could not allocate the
console ttyS, or something like that. Then it rebooted.
(Actually, I think the problem was similar to that IBM Xserver case with
the Xeon that reboots while booting.)
I had selected all kernels to be built when I built the system, so I
switched from 2.6.9 to 2.4.27 and, whoops, it worked. This machine is
quite small and nothing beyond Linux 2.4 was needed, so I left it alone.
It's still up and running 5 virtual domains.
I used the same configuration for both the 2.4.27-xen0 and 2.6.9-xen0
kernels. (Yes, I configured them both separately.)
By then I had already purchased a third server to replace the one whose
memory was mostly broken. This one is a standard Intel Pentium 4 HT,
3 GHz, with 2 GB of memory and the Intel i865G chipset, integrated gigabit
Ethernet, an added 3Com 10/100 NIC, and two 120 GB SATA drives mirrored
using the Linux kernel's built-in software RAID support.
This one also worked fine when I installed Debian Sarge, then fetched the
Xen source and built everything.
But when the build was done, just as with the dual P3 above, I rebooted and
the same thing happened. Today I went to see what was wrong: Xen complained
about the console and ttyS, it went by very fast, and the machine rebooted
a second later.
Here the situation was more complicated, because this machine uses SATA
drives and there was no suitable SATA driver for the Intel ICH5 chipset in
the 2.4.27 kernel, so I couldn't compile a 2.4 kernel with the same
features the 2.6 kernel supports. Instead I booted a standard 2.6.9 without
Xen and worked out another plan. I upgraded to xen-testing, which uses the
2.6.10 kernel instead of 2.6.9, and as I expected, it worked.
I successfully booted the machine with 2.6.10-xen0, but I saw that it still
complained about the console. So that was just some unrelated error, not
the reason 2.6.9 kept rebooting.
But now I'm having the biggest problem so far.
When I start xend, it starts just as it did before, but the network stops
responding, even to ping.
I had this problem with the first replacement server as well (the one with
the broken memory). I believed the memory was the cause of the bug, but
since this machine does the same thing, that can't be it.
So I just start xend, and the network goes dead.
The machine itself keeps running, the local console works and virtual
domains can be started, but the network is simply dead.
I tried stopping xend from the console earlier today, but the network
didn't start working again until I rebooted the whole machine.
Then I set up a cron job that reboots the machine every 15 minutes, and
that worked, but when I got home and started working on the problem, I
fscked up. I just changed /etc/xen/xend-config.sxp a little, setting
xend-address to 'localhost', then:
dolphin:/etc/xen# /etc/init.d/xend start
dolphin:/etc/xen# Read from remote host dolphin: Connection timed out
Connection to dolphin closed.
game over.
No response. I'm sure it rebooted, because there are two NICs: the 10/100
one uses a public IP and stopped responding immediately, although the SSH
connection only timed out several minutes later.
The gigabit Ethernet is connected to the private LAN, and it did answer
pings, but all TCP traffic was dead; I couldn't connect over SSH from
another server on that LAN, etc.
When the cron reboot happened, the LAN interface stopped responding too,
and it never came back up. But I'm pretty sure the machine did boot (I
tested that when I was at the server vault).
I can't give you any logs or configuration files at the moment because the
server is unreachable, but I will if they help solve this problem.

So it seems there is some unexpected fsckup with the 2.6 kernels, and
perhaps with xend. I haven't yet dug into what xend does when it starts.

But I've already spent 2 weeks (except Xmas ;>) on this problem and
I really don't know what is wrong. All suggestions are welcome.

Thank you.

Sami Louko <proton@xxxxxxxxx>

Xen-devel mailing list


