WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] CPU hangs

To: Pasi Kärkkäinen <pasik@xxxxxx>
Subject: RE: [Xen-devel] CPU hangs
From: "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 Sep 2010 12:48:55 -0500
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 09 Sep 2010 10:49:58 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
References: <EACA7CA90354A849B1315959042A052C26F4FC@xxxxxxxxxxxxxxxxxxxxx> <20100909161344.GM2804@xxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: ActQOgRiggjN8bEwTF+1Ia6NPedfXAADFVjc
Thread-topic: [Xen-devel] CPU hangs
In multi-CPU mode, it takes what appears to be a random amount of time to hang the whole host, so I make it happen faster by cutting the number of CPUs down to 1. When I do this, I can usually get it to happen in under an hour. I believe a Windows HVM guest must be running, but I can't say that with 100% certainty at this time. I don't believe the serial port printing shown in the stack trace is what is hanging; I added the serial port so that I could debug the problem. I think the issue is with the shadow page tables. Of interest may be the fact that these messages are being printed as well:
 
>    (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects
>    that CPU0 is stuck!

 
So my first inclination is to research the code that deals with VRAM tracking. It may be getting stuck in a loop, causing the crash.
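
One way to test the loop theory against a captured console log is to check whether the same gfn shows up more than once in the "cleared vram pte" messages. The following is only a rough sketch: the regular expression is inferred from the log lines quoted in this thread, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from lines such as:
#   (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
VRAM_RE = re.compile(r"multi\.c:\d+:d(\d+) gfn ([0-9a-f]+) \(mfn ([0-9a-f]+)\)")

def summarize_vram_clears(log_text):
    """Return (entries, repeated) for the VRAM pte messages in log_text.

    entries  -- list of (domain_id, gfn, mfn) tuples in log order
    repeated -- sorted list of gfns that appear more than once
    """
    entries = [(int(dom), int(gfn, 16), int(mfn, 16))
               for dom, gfn, mfn in VRAM_RE.findall(log_text)]
    counts = Counter(gfn for _, gfn, _ in entries)
    repeated = sorted(gfn for gfn, n in counts.items() if n > 1)
    return entries, repeated

# Last three messages before the watchdog fired, copied from the log.
sample = """\
(XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
(XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
(XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects
"""

entries, repeated = summarize_vram_clears(sample)
for dom, gfn, mfn in entries:
    print(f"d{dom} gfn={gfn:#x} mfn={mfn:#x}")
print("repeated gfns:", [hex(g) for g in repeated])
```

If the repeated list stays empty over a full log, the messages look more like a single pass over the VRAM range than a loop over the same entries.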
 
 
menuentry "Boot Entry 3: debug cpu1" {
    saved_entry=2
    save_env saved_entry
    set root=(NxVG-NxDisk1)
    multiboot   /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle  crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
    module      /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug nmi_watchdog=1
    module      /initrd.img-2.6.32-orc
}
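
To double-check which debugging options are really on the hypervisor line, the options can be split into a dictionary. This is a small sketch with a hypothetical helper name; it only splits on whitespace and the first "=" and does not interpret the values the way Xen does.

```python
def parse_cmdline(cmdline):
    """Split a boot command line into {option: value}.

    Bare flags such as 'watchdog' map to True; 'key=value' options keep
    the value as a string (splitting only on the first '=').
    """
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts[key] = value if sep else True
    return opts

# Options copied from the multiboot line of the entry above.
xen_opts = parse_cmdline(
    "dom0_mem=1024MB cpufreq=xen cpuidle crashkernel=128M@16M "
    "vga=text-80x60,keep sync_console noreboot watchdog "
    "com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1"
)

for name in ("watchdog", "sync_console", "maxcpus", "com1"):
    print(name, "=", xen_opts.get(name))
```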


From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
Sent: Thu 9/9/2010 12:13 PM
To: Roger Cruz
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] CPU hangs

On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
>    I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
>    finally got something useful to start tracking.  Before I do, I always
>    like to make sure that this is not something that has already been
>    reported and fixed.  Anyone know of any such CPU deadlocks and a fix?
>
>    Thanks
>

Please paste your grub.conf entry.
When does this hang happen? During startup, or during operation? After how much uptime?

ns16550 sounds like a serial port to me.

-- Pasi

>    (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects
>    that CPU0 is stuck!
>    (XEN) ----[ Xen-3.4.2  x86_64  debug=n  Tainted:    C ]----
>    (XEN) CPU:    0
>    (XEN) RIP:    e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
>    (XEN) RFLAGS: 0000000000000006   CONTEXT: hypervisor
>    (XEN) rax: 0000000000000000   rbx: ffff828c801ef260   rcx: 0000000000000001
>    (XEN) rdx: 0000000000002005   rsi: 0000000000000020   rdi: ffff828c801ef260
>    (XEN) rbp: 0000000000000020   rsp: ffff828c8024faa0   r8:  0000000000004000
>    (XEN) r9:  0000000000003fff   r10: ffff828c80268360   r11: 0000000000000400
>    (XEN) r12: ffff828c801ef2dc   r13: 0000000000000020   r14: ffff828c80267ecc
>    (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
>    (XEN) cr3: 00000000a17ea000   cr2: 0000000097a20000
>    (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>    (XEN) Xen stack trace from rsp=ffff828c8024faa0:
>    (XEN)    ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
>    (XEN)    ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
>    (XEN)    ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
>    (XEN)    0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
>    (XEN)    ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
>    (XEN)    ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
>    (XEN)    0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
>    (XEN)    ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
>    (XEN)    0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
>    (XEN)    ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
>    (XEN)    ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
>    (XEN)    0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
>    (XEN)    ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
>    (XEN)    00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
>    (XEN)    ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
>    (XEN)    ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
>    (XEN)    ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
>    (XEN)    ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
>    (XEN)    0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
>    (XEN)    ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
>    (XEN) Xen call trace:
>    (XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
>    (XEN)    [<ffff828c80127776>] __serial_putc+0x86/0x180
>    (XEN)    [<ffff828c80127e00>] serial_puts+0x90/0x120
>    (XEN)    [<ffff828c80126019>] __putstr+0x9/0xa0
>    (XEN)    [<ffff828c8012662e>] printk+0xee/0x1d0
>    (XEN)    [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
>    (XEN)    [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
>    (XEN)    [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
>    (XEN)    [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
>    (XEN)    [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
>    (XEN)    [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
>    (XEN)    [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
>    (XEN)    [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
>    (XEN)    [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
>    (XEN)    [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
>    (XEN)    [<ffff828c80127500>] ns16550_poll+0x0/0xa0
>    (XEN)    [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
>    (XEN)    [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
>    (XEN)    [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
>    (XEN)    [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
>    (XEN)    [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
>    (XEN)
>    (XEN)
>    (XEN) ****************************************
>    (XEN) Panic on CPU 0:
>    (XEN) FATAL TRAP: vector = 2 (nmi)
>    (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>    (XEN) ****************************************
>    (XEN)
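
For anyone comparing several of these dumps, the call trace lines can be pulled apart mechanically. The sketch below assumes the "[<address>] symbol+0xoffset/0xsize" layout seen in the trace above; the function name is hypothetical, and the parser cannot tell whether an entry is a real frame or just a leftover value on the stack.

```python
import re

# Layout assumed from lines such as:
#   (XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_call_trace(text):
    """Return a list of (address, symbol, offset, size), innermost entry first."""
    return [(int(addr, 16), sym, int(off, 16), int(size, 16))
            for addr, sym, off, size in FRAME_RE.findall(text)]

# A few entries copied from the trace above.
sample_trace = """\
(XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
(XEN)    [<ffff828c80127776>] __serial_putc+0x86/0x180
(XEN)    [<ffff828c8012662e>] printk+0xee/0x1d0
(XEN)    [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
(XEN)    [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
"""

frames = parse_call_trace(sample_trace)
for addr, sym, off, size in frames:
    # addr - off is where the symbol starts.
    print(f"{sym:<24} start={addr - off:#x} offset={off:#x} of {size:#x}")
```

Grouping dumps by the first shadow-code entry below printk (here shadow_set_l1e) may show whether the hang is always reached through the same path.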

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel


