Re: [Xen-devel] Regression, host crash with 4.5rc1



On 11/17/2014 23:54, Jan Beulich wrote:
> On 17.11.14 at 20:21, <sflist@xxxxxxxxx> wrote:
>> Okay, I did a bisection and was not able to correlate the above error
>> message with the problem I'm seeing. Not saying it's not related, but I
>> had plenty of successful test runs in the presence of that error.
>>
>> Took me about a week (sometimes it takes as much as 6 hours to produce
>> the error), but bisect narrowed it down to this commit:
>>
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9a727a813e9b25003e433b3dc3fa47e621f9e238
>>
>> What do you think?
>
> Thanks for narrowing this, even if this change didn't show any other
> bad effects so far (and it's been widely tested by now), and even if
> problems here would generally be expected to surface independent
> of the use of PCI pass-through. But a hang (rather than a crash)
> would indeed be the most natural result of something being wrong
> here. To double check the result, could you, in an up-to-date tree,
> simply make x86's arch_skip_send_event_check() return 0
> unconditionally?

Made this change and the host was happy.
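
For readers following along, here is a minimal standalone sketch of what that diagnostic change amounts to. It is not the Xen source. It assumes, based on the function name and the MWAIT discussion in this thread, that the check lets a caller skip sending an event-check IPI to a CPU that is idling in MWAIT. Apart from arch_skip_send_event_check() itself, every name here (cpu_in_mwait, skip_check_optimised, raise_event, ipis_sent) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical stand-in for per-CPU "currently idling in MWAIT" state. */
static bool cpu_in_mwait[NR_CPUS];

/* Counts the IPIs this model would have sent (illustration only). */
static unsigned int ipis_sent;

/* Assumed optimised behaviour: skip the IPI when the target is in MWAIT. */
static bool skip_check_optimised(unsigned int cpu)
{
    return cpu_in_mwait[cpu];
}

/* Diagnostic variant requested in the thread: never skip the IPI. */
static bool arch_skip_send_event_check(unsigned int cpu)
{
    (void)cpu;
    return 0;
}

/* Hypothetical caller: wake the target CPU unless the check says to skip. */
static void raise_event(unsigned int cpu, bool (*skip)(unsigned int))
{
    assert(cpu < NR_CPUS);
    if (!skip(cpu))
        ipis_sent++;
}
```

With the diagnostic variant, raise_event() always sends the IPI, so a lost wakeup on the skip path can no longer leave a CPU idle with work pending. That would be consistent with the hang disappearing, though it does not by itself prove where the fault lies.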

> Plus, without said adjustment, first just disable the
> MWAIT CPU idle driver ("mwait-idle=0") and then, if that didn't make
> a difference, use of C states altogether ("cpuidle=0"). If any of this
> does make a difference, limiting use of C states without fully
> excluding their use may need to be the next step.

Will do this next.
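
A sketch of how those two runs might be set up, as separate boots with one option each on the Xen (hypervisor) command line. This assumes a GRUB 2 system where /etc/default/grub supplies the Xen command line through GRUB_CMDLINE_XEN_DEFAULT. The serial console options shown are placeholders, not settings taken from this thread.

```text
# Run 1: keep C states, but disable the MWAIT idle driver
GRUB_CMDLINE_XEN_DEFAULT="console=com1 com1=115200,8n1 mwait-idle=0"

# Run 2 (only if run 1 made no difference): disable C states altogether
GRUB_CMDLINE_XEN_DEFAULT="console=com1 com1=115200,8n1 cpuidle=0"
```

After each change the GRUB configuration has to be regenerated and the host rebooted. If either run avoids the hang, capping the deepest allowed C state with Xen's max_cstate= option, rather than disabling C states outright, would be one way to do the "limiting" step mentioned above.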

> Another thing - now that serial logging appears to be working for
> you, did you try whether the host, once hung, still reacts to serial
> input (perhaps force input to go to Xen right at boot via the
> "conswitch=" option)? If so, 'd' debug-key output would likely be
> the piece of most interest.

Here you go. This was captured with a checkout of 9a727a81 (because it was handy). Let me know if you'd rather see the results from 4.5-rc2, or any other Xen debugging info:

(XEN) 'd' pressed -> dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v2): ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    0010:[<fffff8000281e2c1>]
(XEN) RFLAGS: 0000000000000002   CONTEXT: hvm guest
(XEN) rax: 00003acd4939f3e7   rbx: 00003acd493a0cce   rcx: 000000000000ffff
(XEN) rdx: 00003acd00000000   rsi: 0000000000000000   rdi: 0000000000000057
(XEN) rbp: 000000000000645c   rsp: fffff880033edf90   r8: fffff880033edff0
(XEN) r9:  0000000000000000   r10: fffff880033ee040   r11: 0000000342934690
(XEN) r12: fffff880033ee3c8   r13: 0000000000001000   r14: 0000000000000000
(XEN) r15: 0000000000000058   cr0: 0000000080050031   cr4: 00000000000006f8
(XEN) cr3: 0000000066aca000   cr2: fffff98002680000
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU1 host state: ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d08012a9a1>] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff8300a943e000   rcx: 0000000000000001
(XEN) rdx: ffff830c3dc70000   rsi: 0000000000000004   rdi: ffff830c3dc7a088
(XEN) rbp: ffff830c3dc77ec8   rsp: ffff830c3dc77e40   r8: ffff830c3dc7a0a0
(XEN) r9:  0000000000000000   r10: fffff88002fd82a0   r11: fffff88002fe2d70
(XEN) r12: 0000151cc8b48756   r13: ffff8300a943e000   r14: ffff830c3dc7a088
(XEN) r15: 0000000001c9c380   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000c18962000   cr2: 00000000ff331aa0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830c3dc77e40:
(XEN)    ffff82d080126ec5 ffff82d080321280 ffff830c3dc7a0a0 0000000100c77e78
(XEN)    ffff830c3dc7a080 ffff82d0801b5277 ffff8300a943e000 fffff88002fe2d70
(XEN)    ffff8300a943e000 0000000001c9c380 ffff82d0801e0f00 ffff830c3dc77f08
(XEN)    ffff82d0802f8080 ffff82d0802f8000 ffffffffffffffff ffff830c3dc70000
(XEN)    0000000000000001 ffff830c3dc77ef8 ffff82d08012a1b3 ffff8300a943e000
(XEN)    fffff88002fe2d70 000036d08fbeebe8 000000000000000f ffff830c3dc77f08
(XEN)    ffff82d08012a20b 000000000000000f ffff82d0801e3d2a 0000000000000001
(XEN)    000000000000000f 000036d08fbeebe8 fffff88002fe2d70 000000000000000f
(XEN)    fffff88002fd8180 fffff88002fe2d70 fffff88002fd82a0 000034711df61755
(XEN)    fffff88002fd82a0 0000000000000002 fffff88002fd81c0 0000000000000400
(XEN)    0000000000000000 fffff88002fe2eb0 0000beef0000beef fffff8000298520c
(XEN)    000000bf0000beef 0000000000000046 fffff88002fe2c20 000000000000beef
(XEN)    c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef
(XEN)    c2c2c2c200000001 ffff8300a943e000 0000003bbd958e00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN)    [<ffff82d08012a9a1>] _spin_unlock_irq+0x30/0x31
(XEN)    [<ffff82d08012a1b3>] __do_softirq+0x81/0x8c
(XEN)    [<ffff82d08012a20b>] do_softirq+0x13/0x15
(XEN)    [<ffff82d0801e3d2a>] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v5): ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    0010:[<fffff8000298520c>]
(XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
(XEN) rax: 0000000000000002   rbx: fffff88002fd8180   rcx: fffff88002fd81c0
(XEN) rdx: 0000000000000400   rsi: 0000000000000000   rdi: fffff88002fe2eb0
(XEN) rbp: 000000000000000f   rsp: fffff88002fe2c20   r8: fffff88002fd82a0
(XEN) r9:  000034711df61755   r10: fffff88002fd82a0   r11: fffff88002fe2d70
(XEN) r12: fffff88002fe2d70   r13: 000036d08fbeebe8   r14: 000000000000000f
(XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000006f8
(XEN) cr3: 0000000000187000   cr2: 00000000ff331aa0
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0000   cs: 0010
(XEN)
(XEN) *** Dumping CPU2 guest state (d1v4): ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    0010:[<fffff8000298520e>]
(XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
(XEN) rax: 0000000000000002   rbx: fffff88002fa2180   rcx: fffff88002fa21c0
(XEN) rdx: 0000000000000400   rsi: 0000000000000000   rdi: fffff88002faceb0
(XEN) rbp: 000000000000000f   rsp: fffff88002facc20   r8: fffff88002fa22a0
(XEN) r9:  000034edd4ec417c   r10: fffff88002fa22a0   r11: fffff88002facd70
(XEN) r12: fffff88002facd70   r13: 000036d08fe55d56   r14: 000000000000000f
(XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000006f8
(XEN) cr3: 0000000000187000   cr2: 00000000776ebfb8
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0000   cs: 0010
(XEN)
(XEN) *** Dumping CPU3 host state: ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82d0801d4bb4>] enable_intr_window+0xe4/0xed
(XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor
(XEN) rax: 00000000b6a065fe   rbx: 000000000000d202   rcx: 0000000000000000
(XEN) rdx: 0000000000000004   rsi: 000000000000d202   rdi: ffff8300a942f000
(XEN) rbp: ffff830c3dca7e98   rsp: ffff830c3dca7e68   r8: ffff8300a942f000
(XEN) r9:  00000000ffffffff   r10: ffff830c3f68d650   r11: fffff80000ba6d30
(XEN) r12: ffff8300a942f000   r13: 000000000000d202   r14: 00000000000000d2
(XEN) r15: 0000000001c9c380   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000c1b519000   cr2: 0000000004af6354
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830c3dca7e68:
(XEN)    ffff830c3dca7e98 ffff82d0801bdff8 ffff830c3dca7e98 ffff8300a942f000
(XEN)    ffff8300a942f000 000000000000d202 ffff830c3dca7f08 ffff82d0801d4ea4
(XEN)    ffff82d0802f8000 000000d2ffffffff ffff830c3dca0000 0000000000000001
(XEN)    ffff830c3dca7ef8 ffff82d08012a1b3 ffff8300a942f000 ffff8300a942f000
(XEN)    0000151cdcad2c52 ffff8300a942f000 ffff830c3dcaa088 0000000001c9c380
(XEN)    ffff830c3dca7e28 ffff82d0801e3c86 0000000000000001 000000000000000f
(XEN)    000036d08fbabfb5 fffff80000ba6d30 000000000000000f fffff80002a47e80
(XEN)    fffff80000ba6d30 fffff80002a47fa0 0000362abb330cfb fffff80002a47fa0
(XEN)    0000000000000002 fffff80002a47ec0 0000000000000400 0000000000000000
(XEN)    fffff80000ba6e70 0000beef0000beef fffff8000298520c 000000bf0000beef
(XEN)    0000000000000046 fffff80000ba6be0 000000000000beef 000000000000beef
(XEN)    000000000000beef 000000000000beef 000000000000beef 0000000000000003
(XEN)    ffff8300a942f000 0000003bbd988e00 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0801d4bb4>] enable_intr_window+0xe4/0xed
(XEN)    [<ffff82d0801d4ea4>] vmx_intr_assist+0x28c/0x51c
(XEN)    [<ffff82d0801e3c86>] vmx_asm_vmexit_handler+0x46/0xc0
(XEN)
(XEN) *** Dumping CPU3 guest state (d1v0): ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    0010:[<fffff8000298520c>]
(XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
(XEN) rax: 0000000000000002   rbx: fffff80002a47e80   rcx: fffff80002a47ec0
(XEN) rdx: 0000000000000400   rsi: 0000000000000000   rdi: fffff80000ba6e70
(XEN) rbp: 000000000000000f   rsp: fffff80000ba6be0   r8: fffff80002a47fa0
(XEN) r9:  0000362abb330cfb   r10: fffff80002a47fa0   r11: fffff80000ba6d30
(XEN) r12: fffff80000ba6d30   r13: 000036d08fbabfb5   r14: 000000000000000f
(XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000006f8
(XEN) cr3: 0000000000187000   cr2: 0000000004af6354
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU4 host state: ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    4
(XEN) RIP:    e008:[<ffff82d08012a9a1>] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82d080321280   rcx: 0000000000000004
(XEN) rdx: ffff830c3dc90000   rsi: ffff8300a942e000   rdi: ffff830c3dc9c088
(XEN) rbp: ffff830c3dc97d58   rsp: ffff830c3dc97d10   r8: fffff88002e402a0
(XEN) r9:  000032242fda2dba   r10: fffff88002e402a0   r11: fffff88002e4ad70
(XEN) r12: ffff8300a942e000   r13: ffff830c3dc9c088   r14: ffff82d080321280
(XEN) r15: ffff830c3dc9c088   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000c1b505000   cr2: fffff8a00033e03e
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830c3dc97d10:
(XEN)    ffff82d080128f45 0000000000000006 ffff830c3dc97d38 ffff82d08012aa17
(XEN)    0000000000000000 ffff8300a942e000 0000000000000028 0000000000000000
(XEN)    0000000000000000 ffff830c3dc97d88 ffff82d080129079 0000000000000246
(XEN)    ffff830c3dc97d88 ffff82d08012a80f ffff830c3dc97f18 ffff830c3dc97f08
(XEN)    ffff82d0801dda32 ffff8300a942e000 ffff8300a942e000 ffff8300a942e000
(XEN)    ffff830c3dc9c088 ffff830c3dc97e08 0000000000000000 ffff830c3dc97de8
(XEN)    ffff830c3dc90000 ffff830c3dc9c0a0 ffff8300a942e000 0000151cebb9d23b
(XEN)    ffff8300a942e000 ffff830c3dc9c088 0000000001c9c380 0000000000000292
(XEN)    ffff830c3dc97e28 ffff82d08012a80f ffff8300a942e000 ffff830c3dc97e98
(XEN)    ffff82d0801caea5 ffff830c3dc97ec8 ffff8300a942e508 ffff830c3dc97e58
(XEN)    ffff82d0801c89a3 ffff830c3dc97e78 ffff830c3dc97e88 ffff82d0801b5277
(XEN)    ffff8300a942e000 0000151cebb9d23b ffff8300a942e000 ffff830c3dc97f08
(XEN)    ffff82d0801e0f69 ffff830c3dc97f18 ffff8300a942e000 ffff830c3dc97f08
(XEN)    ffff82d0801ddba6 ffff830c3dc90000 0000000000000001 ffff830c3dc97ef8
(XEN)    ffff82d08012a1b3 ffff8300a942e000 ffff8300a942e000 fffff88002e4ad70
(XEN)    000036d08fbab094 000000000000000f 0000000000000001 000000000000000f
(XEN)    ffff82d0801e3c81 0000000000000001 000000000000000f 000036d08fbab094
(XEN)    fffff88002e4ad70 000000000000000f fffff88002e40180 fffff88002e4ad70
(XEN)    fffff88002e402a0 000032242fda2dba fffff88002e402a0 0000000000000002
(XEN)    fffff88002e401c0 0000000000000400 0000000000000000 fffff88002e4aeb0
(XEN) Xen call trace:
(XEN)    [<ffff82d08012a9a1>] _spin_unlock_irq+0x30/0x31
(XEN)    [<ffff82d080129079>] do_sched_op_compat+0x26/0xa1
(XEN)    [<ffff82d0801dda32>] vmx_vmexit_handler+0x1845/0x195e
(XEN)    [<ffff82d0801e3c81>] vmx_asm_vmexit_handler+0x41/0xc0
(XEN)
(XEN) *** Dumping CPU4 guest state (d1v1): ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    4
(XEN) RIP:    0010:[<fffff8000298520c>]
(XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
(XEN) rax: 0000000000000002   rbx: fffff88002e40180   rcx: fffff88002e401c0
(XEN) rdx: 0000000000000400   rsi: 0000000000000000   rdi: fffff88002e4aeb0
(XEN) rbp: 000000000000000f   rsp: fffff88002e4ac20   r8: fffff88002e402a0
(XEN) r9:  000032242fda2dba   r10: fffff88002e402a0   r11: fffff88002e4ad70
(XEN) r12: fffff88002e4ad70   r13: 000036d08fbab094   r14: 000000000000000f
(XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000006f8
(XEN) cr3: 0000000000187000   cr2: fffff8a00033e03e
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU5 host state: ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    5
(XEN) RIP:    e008:[<ffff82d08012a9a1>] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff8300a942c000   rcx: 0000000000000005
(XEN) rdx: ffff830c3dc80000   rsi: 0000000000000005   rdi: ffff830c3dc8e088
(XEN) rbp: ffff830c3dc87ec8   rsp: ffff830c3dc87e40   r8: ffff830c3dc8e0a0
(XEN) r9:  0000000000000000   r10: fffff88002f2c2a0   r11: fffff88002f36d70
(XEN) r12: 0000151cfdff3536   r13: ffff8300a942c000   r14: ffff830c3dc8e088
(XEN) r15: 0000000001c9c380   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000c19cb4000   cr2: 0000000001550320
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830c3dc87e40:
(XEN)    ffff82d080126ec5 ffff82d080321280 ffff830c3dc8e0a0 0000000500c87e78
(XEN)    ffff830c3dc8e080 ffff82d0801b5277 ffff8300a942c000 fffff88002f36d70
(XEN)    ffff8300a942c000 0000000001c9c380 ffff82d0801e0f00 ffff830c3dc87f08
(XEN)    ffff82d0802f8280 ffff82d0802f8000 ffffffffffffffff ffff830c3dc80000
(XEN)    0000000000000001 ffff830c3dc87ef8 ffff82d08012a1b3 ffff8300a942c000
(XEN)    fffff88002f36d70 000036d08fbb1ba4 000000000000000f ffff830c3dc87f08
(XEN)    ffff82d08012a20b 000000000000000f ffff82d0801e3d2a 0000000000000001
(XEN)    000000000000000f 000036d08fbb1ba4 fffff88002f36d70 000000000000000f
(XEN)    fffff88002f2c180 fffff88002f36d70 fffff88002f2c2a0 000035b9461016e5
(XEN)    fffff88002f2c2a0 0000000000000002 fffff88002f2c1c0 0000000000000400
(XEN)    0000000000000000 fffff88002f36eb0 0000beef0000beef fffff8000298520c
(XEN)    000000bf0000beef 0000000000000046 fffff88002f36c20 000000000000beef
(XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN)    0000000000000005 ffff8300a942c000 0000003bbd96ce00 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08012a9a1>] _spin_unlock_irq+0x30/0x31
(XEN)    [<ffff82d08012a1b3>] __do_softirq+0x81/0x8c
(XEN)    [<ffff82d08012a20b>] do_softirq+0x13/0x15
(XEN)    [<ffff82d0801e3d2a>] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU5 guest state (d1v3): ***
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    5
(XEN) RIP:    0010:[<fffff8000298520c>]
(XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
(XEN) rax: 0000000000000002   rbx: fffff88002f2c180   rcx: fffff88002f2c1c0
(XEN) rdx: 0000000000000400   rsi: 0000000000000000   rdi: fffff88002f36eb0
(XEN) rbp: 000000000000000f   rsp: fffff88002f36c20   r8: fffff88002f2c2a0
(XEN) r9:  000035b9461016e5   r10: fffff88002f2c2a0   r11: fffff88002f36d70
(XEN) r12: fffff88002f36d70   r13: 000036d08fbb1ba4   r14: 000000000000000f
(XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000006f8
(XEN) cr3: 0000000000187000   cr2: 0000000001550320
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010

Thanks!

Steve
