[Xen-users] Domain Crash and Xend can't restart

To: Xen-users <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] Domain Crash and Xend can't restart
From: Mike Lemoine <mlemoine@xxxxxxxxxxx>
Date: Thu, 26 Oct 2006 10:15:59 -0600
I have a single VM (of 11) that has a recurring problem.  This image has
moved from machine to machine, with the problem following it.  This image
has been rebuilt from scratch, and the problem recurred.  Something in the
behaviour of this VM appears to make it crash and to leave xend unable to
manage any domain afterwards.

The problem presents as:

Domain crashes, becomes zombie.
xm destroy will not destroy the zombie.
xm create will not start it or any other domain (Hotplug Scripts not working).

The only solution appears to be a reboot of the host machine.  Stopping and
restarting xend/xendomains does not solve the problem.
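
The zombie state described above can be spotted mechanically in `xm list` output. The following is a minimal Python sketch and not part of the original report. The sample listing, the domain names, and the helper names `parse_xm_list` and `stuck_domains` are hypothetical. It assumes the usual six-column layout, the state flags r, b, p, s, c, d, and a `Zombie-` name prefix on domains that xend could not clean up.

```python
# Hypothetical sample of `xm list` output; domain names and numbers are invented.
SAMPLE = """\
Name                              ID Mem(MiB) VCPUs State  Time(s)
Domain-0                           0     1024     2 r-----   1234.5
dev01                              3      512     1 -b----    321.0
Zombie-build01                     7        0     1 ----cd   9876.5
"""

# Meaning of each letter that can appear in the State column.
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


def parse_xm_list(text):
    """Return one dict per domain row found in `xm list` output."""
    domains = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore rows that do not have all six columns
        name, domid, _mem, _vcpus, state, _cpu_time = fields[:6]
        domains.append({
            "name": name,
            "id": int(domid),
            "state": state,
            "flags": [STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS],
        })
    return domains


def stuck_domains(domains):
    """Names of domains that are crashed, dying, or renamed to Zombie-*."""
    return [
        d["name"]
        for d in domains
        if "crashed" in d["flags"]
        or "dying" in d["flags"]
        or d["name"].startswith("Zombie-")
    ]


stuck = stuck_domains(parse_xm_list(SAMPLE))
print(stuck)
```

Run against real `xm list` output captured before the host is rebooted, this would give a record of which domain IDs were left behind at the time of the failure.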

This particular VM is our continuous build system.  It is building code
pretty much all day long, and does very heavy NFS ops.

The host machine is using Xen 3.0.2 running on FC5 2.6.17-1.2174_FC5xen0
(using the yum packages).  The guest OS is FC4 with 2.6.17-1.2174_FC5xenU.

The problem only presents itself on this VM.  It is actually an identical
copy of the other 11 VMs, all of which are development boxes using NFS.  The
issue appears to be triggered only by the volume of work the problem image
does.

One thing I need help with is knowing where to find the information
necessary to diagnose the problem.

I've attached the bit of the xend.log that involves the crash and subsequent
failed restarts.

# xm info  
host                   : pdev0
release                : 2.6.17-1.2174_FC5xen0
version                : #1 SMP Tue Aug 8 16:26:11 EDT 2006
machine                : x86_64
nr_cpus                : 2
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 1
threads_per_core       : 1
cpu_mhz                : 2390
hw_caps                : 00000000:00000000:078bfbff:e3d3fbff:00000000:00000010:00000001
total_memory           : 8128
free_memory            : 1413
xen_major              : 3
xen_minor              : 0
xen_extra              : -unstable
xen_caps               : xen-3.0-x86_64
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
cc_compile_by          : brewbuilder
cc_compile_domain      : build.redhat.com
cc_compile_date        : Tue Aug  8 15:25:03 EDT 2006
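
A few fields of this `xm info` output can be cross-checked against each other. The sketch below is a hedged Python illustration and not part of the original report. The helper name `parse_xm_info` is hypothetical, the input is a subset of the lines above, and the topology product is only a consistency check on the reported numbers. Note that the hypervisor identifies itself as 3.0-unstable, which is worth keeping in mind next to the 3.0.2 mentioned earlier.

```python
# Subset of the `xm info` output quoted above.
XM_INFO = """\
nr_cpus                : 2
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 1
threads_per_core       : 1
total_memory           : 8128
free_memory            : 1413
xen_major              : 3
xen_minor              : 0
xen_extra              : -unstable
"""


def parse_xm_info(text):
    """Split 'key : value' lines into a dict, cutting at the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


info = parse_xm_info(XM_INFO)

# Version string as the hypervisor reports it.
xen_version = "{xen_major}.{xen_minor}{xen_extra}".format(**info)

# CPU count implied by the reported topology.
topology_cpus = (
    int(info["nr_nodes"])
    * int(info["sockets_per_node"])
    * int(info["cores_per_socket"])
    * int(info["threads_per_core"])
)

# Memory already handed out to domains (in the MB units xm info reports).
used_memory = int(info["total_memory"]) - int(info["free_memory"])

print(xen_version, topology_cpus, used_memory)
```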

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 250
stepping        : 1
cpu MHz         : 2390.648
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 5978.35
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 250
stepping        : 1
cpu MHz         : 2390.648
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 5978.35
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


(The machine has been bounced since the crash, because I had to get the image
back in service, so I don't know how useful this will be.)

# xm dmesg
 __  __            _____  ___                     _        _     _
 \ \/ /___ _ __   |___ / / _ \    _   _ _ __  ___| |_ __ _| |__ | | ___
  \  // _ \ '_ \    |_ \| | | |__| | | | '_ \/ __| __/ _` | '_ \| |/ _ \
  /  \  __/ | | |  ___) | |_| |__| |_| | | | \__ \ || (_| | |_) | |  __/
 /_/\_\___|_| |_| |____(_)___/    \__,_|_| |_|___/\__\__,_|_.__/|_|\___|
                   
 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 3.0-unstable (brewbuilder@xxxxxxxxxxxxxxxx) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) Tue Aug  8 15:25:03 EDT 2006
 Latest ChangeSet: unavailable

(XEN) Command line: /boot/xen.gz-2.6.17-1.2174_FC5
(XEN) Physical RAM map:
(XEN)  0000000000000000 - 000000000009a000 (usable)
(XEN)  000000000009a000 - 00000000000a0000 (reserved)
(XEN)  00000000000d0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000fbf70000 (usable)
(XEN)  00000000fbf70000 - 00000000fbf77000 (ACPI data)
(XEN)  00000000fbf77000 - 00000000fbf80000 (ACPI NVS)
(XEN)  00000000fbf80000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec00400 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff80000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000200000000 (usable)
(XEN) System RAM: 8127MB (8322088kB)
(XEN) Xen heap: 13MB (14020kB)
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) found SMP MP-table at 000f7de0
(XEN) DMI present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 PTLTD                                 ) @ 0x00000000000f7db0
(XEN) ACPI: XSDT (v001 PTLTD     XSDT   0x06040000  LTP 0x00000000) @ 0x00000000fbf74bd4
(XEN) ACPI: FADT (v003 SUN    V20z     0x06040000 PTEC 0x000f4240) @ 0x00000000fbf76c0c
(XEN) ACPI: HPET (v001 Sun    V20z     0x06040000 PTEC 0x00000000) @ 0x00000000fbf76d00
(XEN) ACPI: MADT (v001 PTLTD     APIC   0x06040000  LTP 0x00000000) @ 0x00000000fbf76d38
(XEN) ACPI: SPCR (v001 PTLTD  $UCRTBL$ 0x06040000 PTL  0x00000001) @ 0x00000000fbf76dae
(XEN) ACPI: SSDT (v001 SUN    V20z     0x06040000  LTP 0x00000001) @ 0x00000000fbf76dfe
(XEN) ACPI: SSDT (v001 SUN    V20z     0x06040000  LTP 0x00000001) @ 0x00000000fbf76e9b
(XEN) ACPI: SRAT (v001 SUN    V20z     0x06040000 SUN  0x00000001) @ 0x00000000fbf76f38
(XEN) ACPI: DSDT (v001   Sun      V20z 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 15:5 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 15:5 APIC version 16
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x03] address[0xfd000000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 3, version 17, address 0xfd000000, GSI 24-27
(XEN) ACPI: IOAPIC (id[0x04] address[0xfd001000] gsi_base[28])
(XEN) IOAPIC[2]: apic_id 4, version 17, address 0xfd001000, GSI 28-31
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 3 I/O APICs
(XEN) ACPI: HPET id: 0x102282a0 base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Initializing CPU#0
(XEN) Detected 2390.648 MHz processor.
(XEN) CPU0: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: AMD Opteron(tm) Processor 250 stepping 01
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU1: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) AMD: Disabling C1 Clock Ramping Node #0
(XEN) AMD: Disabling C1 Clock Ramping Node #1
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: AMD Opteron(tm) Processor 250 stepping 01
(XEN) Total of 2 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=0 pin2=0
(XEN) checking TSC synchronization across 2 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 2 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000000f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   000000000e000000->0000000010000000 (2010971 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff80200000->ffffffff80619108
(XEN)  Init. ramdisk: ffffffff8061a000->ffffffff808db000
(XEN)  Phys-Mach map: ffffffff808db000->ffffffff81842ad8
(XEN)  Start info:    ffffffff81843000->ffffffff81844000
(XEN)  Page tables:   ffffffff81844000->ffffffff81855000
(XEN)  Boot stack:    ffffffff81855000->ffffffff81856000
(XEN)  TOTAL:         ffffffff80000000->ffffffff81c00000
(XEN)  ENTRY ADDRESS: ffffffff80200000
(XEN) Dom0 has maximum 2 VCPUs
(XEN) Initrd len 0x2c1000, start at 0xffffffff8061a000
(XEN) Scrubbing Free RAM:
............................................................................
......done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
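
As a sanity check on the boot log, the "System RAM" figure can be reproduced from the physical RAM map by summing the ranges marked usable. This is a minimal Python sketch and not part of the original report. The regular expression and the helper name `usable_kib` are assumptions about the line layout shown above.

```python
import re

# Physical RAM map lines copied from the `xm dmesg` output above.
RAM_MAP = """\
(XEN)  0000000000000000 - 000000000009a000 (usable)
(XEN)  000000000009a000 - 00000000000a0000 (reserved)
(XEN)  00000000000d0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000fbf70000 (usable)
(XEN)  00000000fbf70000 - 00000000fbf77000 (ACPI data)
(XEN)  00000000fbf77000 - 00000000fbf80000 (ACPI NVS)
(XEN)  00000000fbf80000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec00400 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff80000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000200000000 (usable)
"""

# One map entry: start address, end address, and the region type in parentheses.
ENTRY = re.compile(r"\(XEN\)\s+([0-9a-f]{16}) - ([0-9a-f]{16}) \((.+)\)")


def usable_kib(text):
    """Sum the sizes of all 'usable' ranges and return the total in KiB."""
    total_bytes = 0
    for match in ENTRY.finditer(text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if match.group(3) == "usable":
            total_bytes += end - start
    return total_bytes // 1024


total_kib = usable_kib(RAM_MAP)
print(total_kib, total_kib // 1024)
```

The three usable ranges add up to 8322088 kB, or 8127 MB after integer division, which matches the "System RAM: 8127MB (8322088kB)" line in the log.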

---

Any help would be appreciated.

Attachment: xend.log
Description: Binary data
