xen-devel
Re: [Xen-devel] CPU hangs
On Thu, Sep 09, 2010 at 12:48:55PM -0500, Roger Cruz wrote:
> In multicpu mode, it takes what appears to be a random amount of time to
> hang the whole host. So I make it happen faster by cutting down the # of
> CPUs to 1. When I do this, I usually can get it to happen in < 1hr. I
> believe a Windows HVM must be running but can't say that with 100%
> certainty at this time. I don't believe the serial port printing in the
> stack trace is what is hanging. I added a serial port to be able to
> debug the problem. I think the issue is with the shadow page table. Of
> interest may be the fact that these messages are being printed as well
>
> > (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
>
> So my first inclination is to go research the area dealing with VRAM
> tracking. It may be getting into a loop, causing the crash.
>
>
> menuentry "Boot Entry 3: debug cpu1" {
> saved_entry=2
> save_env saved_entry
> set root=(NxVG-NxDisk1)
> multiboot /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle
> crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog
> com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
> module /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro
> console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug
> nmi_watchdog=1
> module /initrd.img-2.6.32-orc
> }
>
Have you tried changing the cpufreq/cpuidle settings?
How about the watchdog?

Also, if you're using Xen 3.4.2, I believe you'll lose the dom0_mem=1024M
parameter due to the grub2 bug, so make sure to add a dummy=dummy parameter
before dom0_mem.
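
For example, the multiboot line from your entry would become something like
the one below. This is only a sketch: it assumes the grub2 bug swallows the
first argument after the image name, dummy=dummy is an arbitrary placeholder
that Xen is expected to ignore, and everything else is copied unchanged from
your entry.

```
multiboot /xen.gz dummy=dummy dom0_mem=1024MB cpufreq=xen cpuidle crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
```

After booting, checking the command line reported in "xm dmesg" should show
whether dom0_mem actually reached Xen.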
-- Pasi
> --------------------------------------------------------------------------
>
> From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
> Sent: Thu 9/9/2010 12:13 PM
> To: Roger Cruz
> Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] CPU hangs
>
> On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
> > I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
> > finally got something useful to start tracking. Before I do, I always
> > like to make sure that this is not something that has already been
> > reported and fixed. Anyone know of any such CPU deadlocks and a fix?
> >
> > Thanks
> >
>
> Please paste your grub.conf entry.
> When does this hang happen? During startup, or during operation? After how
> much uptime?
>
> ns16550 sounds like a serial port to me..
>
> -- Pasi
>
> > (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
> > (XEN) ----[ Xen-3.4.2 x86_64 debug=n Tainted: C ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> > (XEN) RFLAGS: 0000000000000006 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff828c801ef260 rcx: 0000000000000001
> > (XEN) rdx: 0000000000002005 rsi: 0000000000000020 rdi: ffff828c801ef260
> > (XEN) rbp: 0000000000000020 rsp: ffff828c8024faa0 r8: 0000000000004000
> > (XEN) r9: 0000000000003fff r10: ffff828c80268360 r11: 0000000000000400
> > (XEN) r12: ffff828c801ef2dc r13: 0000000000000020 r14: ffff828c80267ecc
> > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 00000000a17ea000 cr2: 0000000097a20000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen stack trace from rsp=ffff828c8024faa0:
> > (XEN) ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
> > (XEN) ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
> > (XEN) ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
> > (XEN) 0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
> > (XEN) ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
> > (XEN) ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
> > (XEN) 0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
> > (XEN) ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
> > (XEN) 0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
> > (XEN) ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
> > (XEN) ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
> > (XEN) 0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
> > (XEN) ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
> > (XEN) 00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
> > (XEN) ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
> > (XEN) ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
> > (XEN) ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
> > (XEN) ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
> > (XEN) ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
> > (XEN) Xen call trace:
> > (XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> > (XEN) [<ffff828c80127776>] __serial_putc+0x86/0x180
> > (XEN) [<ffff828c80127e00>] serial_puts+0x90/0x120
> > (XEN) [<ffff828c80126019>] __putstr+0x9/0xa0
> > (XEN) [<ffff828c8012662e>] printk+0xee/0x1d0
> > (XEN) [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
> > (XEN) [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
> > (XEN) [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
> > (XEN) [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
> > (XEN) [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
> > (XEN) [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
> > (XEN) [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
> > (XEN) [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
> > (XEN) [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
> > (XEN) [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
> > (XEN) [<ffff828c80127500>] ns16550_poll+0x0/0xa0
> > (XEN) [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
> > (XEN) [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
> > (XEN) [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
> > (XEN) [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
> > (XEN) [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) FATAL TRAP: vector = 2 (nmi)
> > (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> > (XEN) ****************************************
> > (XEN)
>
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
>