
Re: [Xen-devel] [BUG] XEN crash and double fault when doing cpu online/offline




On 1/8/20 3:50 PM, Jürgen Groß wrote:
On 08.01.20 06:50, Tao Xu wrote:
Hi,

When I use xen-hptool cpu-offline/cpu-online to take the CPUs of one socket offline and bring them back online with the following script:

for((j=48;j<=95;j++));
do
   xen-hptool cpu-offline $j
done

for((j=48;j<=95;j++));
do
   xen-hptool cpu-online $j
done

Xen crashes when the CPUs are brought back online. I am using upstream Xen (0dd92688) and have been trying for many days; it still crashes. If I only offline/online CPUs 48~59, Xen does not crash. The bug can be reproduced whenever most of the CPUs in a socket are taken offline and then brought back online. Interestingly, when the script is changed as follows:

for((j=48;j<=95;j++));
do
   xen-hptool cpu-offline $j
   xen-hptool cpu-online $j
done

Xen does not crash either. Is there a bug in sched_credit2?
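
For easier comparison, the two sequences above can be folded into one parameterised script. This is only a sketch: the function names and the HPTOOL variable are made up for illustration, and only the `xen-hptool cpu-offline` / `xen-hptool cpu-online` invocations come from the scripts above. Setting HPTOOL=echo gives a dry run that just prints the commands.

```shell
# Hypothetical wrapper variable; defaults to the real tool.
HPTOOL="${HPTOOL:-xen-hptool}"

# Offline every CPU in [first, last], then online them all again
# (the sequence that crashes for 48..95 in this report).
batch_cycle() {
    first=$1
    last=$2
    j=$first
    while [ "$j" -le "$last" ]; do
        "$HPTOOL" cpu-offline "$j"
        j=$((j + 1))
    done
    j=$first
    while [ "$j" -le "$last" ]; do
        "$HPTOOL" cpu-online "$j"
        j=$((j + 1))
    done
}

# Offline and immediately re-online one CPU at a time
# (the sequence that does not crash in this report).
single_cycle() {
    first=$1
    last=$2
    j=$first
    while [ "$j" -le "$last" ]; do
        "$HPTOOL" cpu-offline "$j"
        "$HPTOOL" cpu-online "$j"
        j=$((j + 1))
    done
}
```

For example, `batch_cycle 48 95` corresponds to the first script and `single_cycle 48 95` to the second one.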

The crash messages are as follows:

(XEN) Adding cpu 77 to runqueue 1
(XEN) Adding cpu 78 to runqueue 1
(XEN) Adding cpu 79 to runqueue 1
(XEN) Adding cpu 80 to runqueue 1
(X(ENXE) N) *** DOUBLE FAULT ***
(XEN) Assertion 'debug->cpu == smp_processor_id()' failed at spinlock.c:88
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) Debugging connection not set up.
(XEN) CPU:    48
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080240bfc>] _spin_unlock+0x40/0x42

So the original problem causes a double fault, but spinlock debugging
causes a subsequent panic.

Can you please retry the tests with the attached patch? It should
result in diagnostic data related to the real problem.


Juergen

Hi Juergen,

After applying your patch, the spinlock assertion still triggers. Also, the address ffff82d0bffce880 is not in xen-syms.
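
One possible way to check an address against the symbol table is sketched below. It assumes `nm -n xen-syms` prints address-sorted lines of the form "16-digit lowercase hex address, type, name"; the helper name `nearest_symbol` is made up for illustration. It prints the last symbol at or below the given address, so an address far beyond that symbol is most likely outside the image.

```shell
# Read `nm -n`-style lines on stdin and print the last symbol whose
# address is at or below the address given as $1.
# Addresses are compared as fixed-width lowercase hex strings, which
# avoids 64-bit overflow in shell or awk arithmetic.
nearest_symbol() {
    LC_ALL=C awk -v addr="$1" '
        NF == 3 && ($1 "") <= (addr "") { best = $3 " (" $1 ")" }
        END {
            if (best != "")
                print best
            else
                print "no symbol at or below " addr
        }
    '
}
```

Usage would be something like `nm -n xen-syms | nearest_symbol ffff82d0bffce880`.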

(XEN) Adding cpu 78 to runqueue 1
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    49
(XEN) RIP:    e008:[<ffff82d0bffce880>] ffff82d0bffce880
(XEN) RFLAGS: 0000000000010012   CONTEXT: hypervisor
(XEN) rax: 0000000000000018   rbx: 00000adda6074720   rcx: ffffffff8100130a
(XEN) rdx: ffffc90041114e40   rsi: 000000000000003b   rdi: 0000000000000008
(XEN) rbp: 000000000000003b   rsp: ffffc90041114e28   r8:  00000adda5f86678
(XEN) r9:  00000040bb3e6121   r10: 00000040bb2f1ee1   r11: 0000000000000212
(XEN) r12: ffff88fcdbcd7140   r13: ffff88fcdbcde438   r14: ffff88fcdbcde478
(XEN) r15: ffff88fcdbcde4b8   cr0: 0000000080050033   cr4: 00000000003426e0
(XEN) cr3: 0000002391e02000   cr2: ffffc90041114e18
(XEN) fsb: 0000000000000000   gsb: ffff88fcdbcc0000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0bffce880> (ffff82d0bffce880):
(XEN) 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
(XEN) Current stack base ffffc90041110000 differs from expected ffff837e77190000
(XEN) Valid stack range: ffffc90041116000-ffffc90041118000, sp=ffffc90041114e28, tss.rsp0=ffff837e77197fa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 49:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN)( XEN) *** DOUBLE FAULT ***
(XEN) Assertion 'atomic_read(&spin_debug) > 0 || debug->cpu == smp_processor_id()' failed at spinlock.c:88
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) Debugging connection not set up.
(XEN) CPU:    52
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    50
(XEN) RIP:    e008:[<ffff82d080240c06>] _spin_unlock+0x4a/0x4c
(XEN) RFLAGS: 0000000000050002   CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff837e77017fff   rbx: 0000000000040046   rcx: 0000000000000000
(XEN) rdx: 0000000000000034   rsi: 0000000000040046   rdi: ffff82d080819860
(XEN) rbp: ffff837e77010d38   rsp: ffff837e77010d38   r8:  0000000000000000
(XEN) r9:  0000000000000004   r10: 0000000000000001   r11: 0000000000000000
(XEN) r12: ffff82d08044d284   r13: 0000000000000010   r14: ffff82d08044d284
(XEN) r15: ffff82d0808197e0   cr0: 0000000080050033   cr4: 00000000003426e0
(XEN) cr3: 000000200e60a000   cr2: ffffc9004007cbb8
(XEN) fsb: 0000000000000000   gsb: ffff88fcdae40000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d080240c06> (_spin_unlock+0x4a/0x4c):
(XEN) 7f 00 00 3b 50 c1 74 dc <0f> 0b 55 48 89 e5 e8 ab ff ff ff fb 5d c3 55 48
(XEN) Xen stack trace from rsp=ffff837e77010d38:
(XEN)    ffff837e77010d50 ffff82d080240c21 0000000000000020 ffff837e77010da8
(XEN)    ffff82d080252eb8 0000000d8f512778 0000000000040046 ffff82d080819860
(XEN)    0000001000000000 0000000000000006 ffff82d08044d27e ffff82d08093e700
(XEN)    0000000000040086 ffff837e77010e58 ffff837e77010db8 ffff82d08024fe4b
(XEN)    ffff837e77010dd8 ffff82d08024fe87 0000000000000000 ffff83201081d3a0
(XEN)    ffff837e77010e40 ffff82d08024feec 44e4a2a937cfbed7 a22ad7391a19609e
(XEN)    d4f7a456dec5cb24 ffff837e77010e20 ffff82d080240b77 ffff82d080819718
(XEN)    ffff82d0804564bf ffff83201081d3a0 ffff837e77010e98 0000000000040086
(XEN)    ffff82d08093e714 ffff837e77010e88 ffff82d0802503f9 ffff82d08044d27e
(XEN)    ffff82d08093e700 ffff837e77010f58 0000000000000032 0000000000000000
(XEN)    ffff837e77017fff 0000000000000000 ffff837e77010ee0 ffff82d080250511
(XEN)    ffff837e00000008 ffff837e77010ef0 ffff837e77010eb0 ffff837e77017fff
(XEN)    0000000000040046 0000000000000032 ffff82d080819701 0000000000000000
(XEN)    0000000000000000 ffff837e77010f48 ffff82d080382f2a ffff82d080389c66
(XEN)    ffff82d080389c72 ffff82d080389c66 ffff82d080389c72 ffff82d080389c66
(XEN)    ffff82d080389c72 ffff82d080389c66 ffff82d080389c72 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007c8188fef087 ffff82d080389cc7
(XEN)    ffff88fcc9036f00 ffffc9004007cde8 000000000002ad80 ffff88fcc97d5d00
(XEN)    0000000000000002 ffffc9004007cc50 0000000000000286 0000000000000014
(XEN)    0000000000000400 0000000000000014 0000000000000017 ffffffff810012eb
(XEN) Xen call trace:
(XEN)    [<ffff82d080240c06>] R _spin_unlock+0x4a/0x4c
(XEN)    [<ffff82d080240c21>] F _spin_unlock_irqrestore+0xd/0x24
(XEN)    [<ffff82d080252eb8>] F serial_puts+0x131/0x141
(XEN)    [<ffff82d08024fe4b>] F console_serial_puts+0x28/0x2a
(XEN)    [<ffff82d08024fe87>] F drivers/char/console.c#__putstr+0x3a/0x8b
(XEN)    [<ffff82d08024feec>] F drivers/char/console.c#printk_start_of_line+0x14/0x17b
(XEN)    [<ffff82d0802503f9>] F drivers/char/console.c#vprintk_common+0x8d/0x158
(XEN)    [<ffff82d080250511>] F printk+0x4d/0x4f
(XEN)    [<ffff82d080382f2a>] F do_double_fault+0x2b/0x82
(XEN)    [<ffff82d080389cc7>] F double_fault+0x107/0x110
(XEN)
(XEN) RIP:    e008:[<ffff82d0bffcba00>](XEN)
(XEN) ****************************************
 ffff82d0bffcba00(XEN) Panic on CPU 50:

(XEN) RFLAGS: 0000000000010006 (XEN) Assertion 'atomic_read(&spin_debug) > 0 || debug->cpu == smp_processor_id()' failed at spinlock.c:88
CONTEXT: hypervisor(XEN) ****************************************
(XEN)

(XEN) rax: 0000000000000020   rbx: ffff88fcdb52ad80   rcx: ffffffff8100140a
(XEN) Reboot in five seconds...
(XEN) rdx: ffff88fcc98e6628   rsi: ffffc900408f8d24   rdi: 0000000000000004
(XEN) Debugging connection not set up.
(XEN) rbp: ffff88fcbe39cb80   rsp: ffffc900408f8d08   r8:  ffff88fcca4068f0
(XEN) r9:  ffff88fcca4069a0   r10: 0000000000000000   r11: 0000000000000206
(XEN) r12: 0000000000000004   r13: ffffc900408f8d80   r14: ffff88fcbe39d2fc
(XEN) r15: ffff88fcdb52ad80   cr0: 0000000080050033   cr4: 00000000003426e0
(XEN) cr3: 000000200e60a000   cr2: ffffc900408f8cf8
(XEN) fsb: 0000000000000000   gsb: ffff88fcdb100000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0bffcba00> (ffff82d0bffcba00):
(XEN) 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
(XEN) Current stack base ffffc900408f8000 differs from expected ffff837e77038000
(XEN) Valid stack range: ffffc900408fe000-ffffc90040900000, sp=ffffc900408f8d08, tss.rsp0=ffff837e7703ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 52:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<0000000067b4cb2d>] 0000000067b4cb2d
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830059027a60   rcx: 0000000067c50000
(XEN) rdx: 0000000000000000   rsi: 00000000003526e0   rdi: ffff830059027a40
(XEN) rbp: ffff830059027b68   rsp: ffff8300590279a0   r8:  ffff830059027a60
(XEN) r9:  ffff830059027a40   r10: 0000000067b4e1b8   r11: 0101010101010101
(XEN) r12: 00000000fffffffe   r13: 0000000000000000   r14: 0000000000000065
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 000000203fe4e000   cr2: 0000000067c50010
(XEN) fsb: 0000000000000000   gsb: ffff88fcdb280000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <0000000067b4cb2d> (0000000067b4cb2d):
(XEN) 6b c0 10 48 8b 4c 24 20 <48> 8b 44 01 10 48 89 44 24 28 48 8b 44 24 28 48
(XEN) Xen stack trace from rsp=ffff8300590279a0:
(XEN)    ffff8300590279b0 ffff8300590279c8 ffff82d080240cd5 ffff82d0802510eb
(XEN)    0000000067c50000 ffff830059027a00 0000000000000206 0000000067b4bf3c
(XEN)    ffff830059027a60 ffff82d0808197e0 ffff830059027aa8 0000000000000000
(XEN)    000000203fe4e000 0000000067b4b590 ffff830059027ae0 00000000000000f1
(XEN)    ffff830059027a30 ffff82d080240ba5 ffff830059027a98 0000000067aeb54b
(XEN)    ffff82d080389845 ffff832010000424 ffff830059027c68 0000000400000000
(XEN)    00000000000fa000 67c5000000000200 0000000000000000 0000000067aeb8d7
(XEN)    0000000000000000 ffff830059027fff 0000000000000000 00007cffa6fd8537
(XEN)    0000000000000000 0000000067aeb6ae 00000000000000fb ffff82d080808aa0
(XEN)    00000000003526e0 ffff830059027b20 0000000000000000 0000000067aeb476
(XEN)    ffff830000000000 ffff830059027b40 0000000059014000 0000000000000000
(XEN)    ffff830059027b30 ffff82d0803867c4 0000000000000000 ffff82d080386ac8
(XEN)    0000000000000000 00000000fffffffe ffff830059027b68 ffff82d080386a99
(XEN)    0000000059014000 000000000000e010 0000000000000000 00000000000000fb
(XEN)    ffffffffffffffff ffff830059027bb8 ffff82d0802a4964 0000138880389851
(XEN)    000082d080389845 0000000000000000 0000000000000000 00000000000000fb
(XEN)    ffff830059027c68 00000000000000fb 0000000000000000 ffff830059027bc8
(XEN)    ffff82d0802a4a91 ffff830059027be0 ffff82d080240a08 0000000000000000
(XEN)    ffff830059027bf0 ffff82d0802a5136 ffff830059027c58 ffff82d0802858bf
(XEN)    ffff82d080389845 ffff82d080389851 0000000000000000 8000000080389851
(XEN) Xen call trace:
(XEN)    [<0000000067b4cb2d>] R 0000000067b4cb2d
(XEN)    [<ffff8300590279b0>] S ffff8300590279b0
(XEN)    [<ffff82d0802a4964>] F machine_restart+0x168/0x28a
(XEN)    [<ffff82d0802a4a91>] F send_IPI_mask+0/0xc
(XEN)    [<ffff82d080240a08>] F smp_call_function_interrupt+0xa8/0xac
(XEN)    [<ffff82d0802a5136>] F call_function_interrupt+0x20/0x34
(XEN)    [<ffff82d0802858bf>] F do_IRQ+0x148/0x6d4
(XEN)    [<ffff82d0803898ba>] F common_interrupt+0x10a/0x120
(XEN)    [<ffff82d080253645>] F cpufreq_add_cpu+0xbc/0x5cf
(XEN)    [<ffff82d080253da9>] F drivers/cpufreq/cpufreq.c#cpu_callback+0x27/0x32
(XEN)    [<ffff82d0802242c0>] F notifier_call_chain+0x6b/0x96
(XEN)    [<ffff82d080200f95>] F common/cpu.c#cpu_notifier_call_chain+0x1b/0x33
(XEN)    [<ffff82d080201215>] F cpu_up+0xa8/0xe5
(XEN)    [<ffff82d0802a8185>] F cpu_up_helper+0xf/0xa5
(XEN)    [<ffff82d080205d5d>] F common/domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
(XEN)    [<ffff82d080242de5>] F common/tasklet.c#do_tasklet_work+0x76/0xa9
(XEN)    [<ffff82d0802430c6>] F do_tasklet+0x58/0x8a
(XEN)    [<ffff82d080275545>] F arch/x86/domain.c#idle_loop+0x40/0x9b
(XEN)
(XEN) Pagetable walk from 0000000067c50010:
(XEN)  L4[0x000] = 000000203fe4d063 ffffffffffffffff
(XEN)  L3[0x001] = 000000005900d063 ffffffffffffffff
(XEN)  L2[0x13e] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000067c50010
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
