
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split



George Dunlap wrote:
Andre,

Can you try again with the attached patch?
Sure. Unfortunately (or is this a good sign?) the "Migration failed" message didn't trigger; I only saw various instances of the other printk, see the attached log file. Migration is happening quite often, because Dom0 has 48 vCPUs and in the end they are squashed into fewer and fewer pCPUs. I guess that is the reason why I see it on my machine.
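(For context: the "(XEN) cpu_disable_scheduler: Migrating ..." lines in the attached log look like the output of a debug printk of roughly this shape inside the migration loop. This is a guess reconstructed from the log format only, not the actual text of the patch:

    /* Hypothetical reconstruction of the debug printk in George's patch,
     * based only on the log lines below; v is the vcpu being moved off
     * the given cpu. */
    printk("%s: Migrating d%dv%d from cpu %d\n",
           __func__, v->domain->domain_id, v->vcpu_id, cpu);
)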

Regards,
Andre.


Thanks,
 -George

On Tue, Feb 8, 2011 at 12:08 PM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
On 02/07/11 16:55, George Dunlap wrote:
Juergen,

What is supposed to happen if a domain is in cpupool0, and then all of
the cpus are taken out of cpupool0?  Is that possible?
No. Cpupool0 can't be without any cpu, as Dom0 is always a member of cpupool0.
If that's the case, then since Andre is running this immediately after
boot, he shouldn't be seeing any vcpus in the new pools; and all of
the dom0 vcpus should be migrated to cpupool0, right?  Is it possible
that the migration process isn't happening properly?

It looks like schedule.c:cpu_disable_scheduler() will try to migrate
all vcpus, and if it fails to migrate, it returns -EAGAIN so that the
tools will try again.  It's probably worth instrumenting that whole
code-path to make sure it actually happens as we expect.  Are we
certain, for example, that a hypercall continued on another cpu
will actually return the new error value properly?
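For reference, the code path in question is roughly of this shape (a heavily simplified sketch from memory, not the exact 4.1 source; locking, affinity handling and the _VPF_migrating bookkeeping are omitted):

    /* Simplified sketch of schedule.c:cpu_disable_scheduler(); only the
     * structure relevant to the discussion is shown. */
    int cpu_disable_scheduler(unsigned int cpu)
    {
        struct domain *d;
        struct vcpu *v;
        int ret = 0;

        for_each_domain ( d )
        {
            for_each_vcpu ( d, v )
            {
                if ( v->processor != cpu )
                    continue;

                /* The debug printk from the attached patch would fire here. */
                vcpu_sleep_nosync(v);
                vcpu_migrate(v);

                /*
                 * If the vcpu could not actually be moved off this cpu
                 * (e.g. it is still active in the hypervisor), report
                 * -EAGAIN so the toolstack retries the whole operation.
                 */
                if ( v->processor == cpu )
                    ret = -EAGAIN;
            }
        }
        return ret;
    }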

Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why
is the cpu's bit set in cpupool_free_cpus without checking to see if
the cpu_disable_scheduler() call actually worked?  Shouldn't that also
be inside the if() statement?
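Concretely, the change being asked about would amount to something like this (a sketch of the idea only, not the actual cpupool.c code; the per_cpu() line is illustrative bookkeeping):

    /* In cpupool_unassign_cpu_helper(): only mark the cpu as free once
     * cpu_disable_scheduler() has actually succeeded. */
    ret = cpu_disable_scheduler(cpu);
    if ( ret == 0 )
    {
        cpu_set(cpu, cpupool_free_cpus);   /* currently done unconditionally */
        per_cpu(cpupool, cpu) = NULL;      /* illustrative, for the sketch */
    }
    /* otherwise leave the cpu where it is and let the tools retry */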

 -George



--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
root@dosorca:/data/images# sh numasplit.sh
Removing CPUs from Pool 0
(XEN) cpu_disable_scheduler: Migrating d0v14 from cpu 6
(XEN) cpu_disable_scheduler: Migrating d0v26 from cpu 6
(XEN) cpu_disable_scheduler: Migrating d0v9 from cpu 7
(XEN) cpu_disable_scheduler: Migrating d0v23 from cpu 7
(XEN) cpu_disable_scheduler: Migrating d0v9 from cpu 8
(XEN) cpu_disable_scheduler: Migrating d0v19 from cpu 8
(XEN) cpu_disable_scheduler: Migrating d0v0 from cpu 9
(XEN) cpu_disable_scheduler: Migrating d0v9 from cpu 9
(XEN) cpu_disable_scheduler: Migrating d0v19 from cpu 9
(XEN) cpu_disable_scheduler: Migrating d0v0 from cpu 10
(XEN) cpu_disable_scheduler: Migrating d0v9 from cpu 10
(XEN) cpu_disable_scheduler: Migrating d0v19 from cpu 10
(XEN) cpu_disable_scheduler: Migrating d0v0 from cpu 11
(XEN) cpu_disable_scheduler: Migrating d0v9 from cpu 11
(XEN) cpu_disable_scheduler: Migrating d0v19 from cpu 11
(XEN) cpu_disable_scheduler: Migrating d0v31 from cpu 11
Rewriting config file
Creating new pool
Using config file "cpupool.test"
cpupool name:   Pool-node1
scheduler:      credit
number of cpus: 1
Populating new pool
Removing CPUs from Pool 0
(XEN) cpu_disable_scheduler: Migrating d0v44 from cpu 12
(XEN) cpu_disable_scheduler: Migrating d0v14 from cpu 13
(XEN) cpu_disable_scheduler: Migrating d0v33 from cpu 13
(XEN) cpu_disable_scheduler: Migrating d0v44 from cpu 13
(XEN) cpu_disable_scheduler: Migrating d0v10 from cpu 14
(XEN) cpu_disable_scheduler: Migrating d0v33 from cpu 14
(XEN) cpu_disable_scheduler: Migrating d0v44 from cpu 14
(XEN) cpu_disable_scheduler: Migrating d0v10 from cpu 15
(XEN) cpu_disable_scheduler: Migrating d0v33 from cpu 15
(XEN) cpu_disable_scheduler: Migrating d0v44 from cpu 15
(XEN) cpu_disable_scheduler: Migrating d0v10 from cpu 16
(XEN) cpu_disable_scheduler: Migrating d0v33 from cpu 16
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 16
(XEN) cpu_disable_scheduler: Migrating d0v10 from cpu 17
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 17
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 17
Rewriting config file
Creating new pool
Using config file "cpupool.test"
cpupool name:   Pool-node2
scheduler:      credit
number of cpus: 1
Populating new pool
Removing CPUs from Pool 0
(XEN) cpu_disable_scheduler: Migrating d0v10 from cpu 18
(XEN) cpu_disable_scheduler: Migrating d0v29 from cpu 18
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 18
(XEN) cpu_disable_scheduler: Migrating d0v29 from cpu 19
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 19
(XEN) cpu_disable_scheduler: Migrating d0v6 from cpu 20
(XEN) cpu_disable_scheduler: Migrating d0v29 from cpu 20
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 20
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 21
(XEN) cpu_disable_scheduler: Migrating d0v14 from cpu 21
(XEN) cpu_disable_scheduler: Migrating d0v29 from cpu 21
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 21
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 22
(XEN) cpu_disable_scheduler: Migrating d0v14 from cpu 22
(XEN) cpu_disable_scheduler: Migrating d0v23 from cpu 22
(XEN) cpu_disable_scheduler: Migrating d0v29 from cpu 22
(XEN) cpu_disable_scheduler: Migrating d0v41 from cpu 22
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 23
(XEN) cpu_disable_scheduler: Migrating d0v14 from cpu 23
(XEN) cpu_disable_scheduler: Migrating d0v23 from cpu 23
(XEN) cpu_disable_scheduler: Migrating d0v29 from cpu 23
Rewriting config file
Creating new pool
Using config file "cpupool.test"
cpupool name:   Pool-node3
scheduler:      credit
number of cpus: 1
Populating new pool
Removing CPUs from Pool 0
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 29
(XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 29
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 29
(XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 29
Rewriting config file
Creating new pool
Using config file "cpupool.test"
cpupool name:   Pool-node4
scheduler:      credit
number of cpus: 1
(XEN) Xen BUG at sched_credit.c:384
(XEN) ----[ Xen-4.1.0-rc3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    32
(XEN) RIP:    e008:[<ffff82c480117fa0>] csched_alloc_pdata+0x146/0x17f
(XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
(XEN) rax: ffff830434322000   rbx: ffff830a3800f1e8   rcx: 0000000000000018
(XEN) rdx: ffff82c4802d3ec0   rsi: 0000000000000002   rdi: ffff83043445e100
(XEN) rbp: ffff8304343efce8   rsp: ffff8304343efca8   r8:  0000000000000001
(XEN) r9:  ffff830a3800f1e8   r10: ffff82c480219dc0   r11: 0000000000000286
(XEN) r12: 0000000000000018   r13: ffff8310341a7d50   r14: ffff830a3800f1d0
(XEN) r15: 0000000000000018   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 0000000806aed000   cr2: 00007f50c671def5
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8304343efca8:
(XEN)    ffff8304343efcb8 ffff8310341a7d50 0000000000000282 0000000000000018
(XEN)    ffff830a3800f460 ffff8310341a7c60 0000000000000018 ffff82c4802b0880
(XEN)    ffff8304343efd58 ffff82c48011fa63 ffff82f601024d80 000000000008126c
(XEN)    ffff8300c7e42000 0000000000000000 0000080000000000 ffff82c480248b80
(XEN)    0000000000000002 0000000000000018 ffff830a3800f460 0000000000305000
(XEN)    ffff82c4802550e4 ffff82c4802b0880 ffff8304343efd78 ffff82c48010188c
(XEN)    ffff8304343efe40 0000000000000018 ffff8304343efdb8 ffff82c480101b94
(XEN)    ffff8304343efdb8 ffff82c480183562 fffffffe00000286 ffff8304343eff18
(XEN)    000000000066e004 0000000000305000 ffff8304343efef8 ffff82c4801252a1
(XEN)    ffff8304343efdd8 0000000180153c8d 0000000000000000 ffff82c4801068f8
(XEN)    0000000000000296 ffff8300c7e1e1c8 aaaaaaaaaaaaaaaa 0000000000000000
(XEN)    ffff88007d094170 ffff88007d094170 ffff8304343efef8 ffff82c480113d8a
(XEN)    ffff8304343efe78 ffff8304343efe88 0000000800000012 0000000400000004
(XEN)    00007fff00000001 0000000000000018 00000000000000b3 0000000000000072
(XEN)    00007f50c64e5960 0000000000000018 00007fff85f117c0 00007f50c6b48342
(XEN)    0000000000000001 0000000000000000 0000000000000018 0000000000000004
(XEN)    000000000066d050 000000000066e000 85f1189c00000000 0000000000000033
(XEN)    ffff8304343efed8 ffff8300c7e1e000 00007fff85f11600 0000000000305000
(XEN)    0000000000000003 0000000000000003 00007cfbcbc100c7 ffff82c480207be8
(XEN)    ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003
(XEN) Xen call trace:
(XEN)    [<ffff82c480117fa0>] csched_alloc_pdata+0x146/0x17f
(XEN)    [<ffff82c48011fa63>] schedule_cpu_switch+0x75/0x1cd
(XEN)    [<ffff82c48010188c>] cpupool_assign_cpu_locked+0x44/0x8b
(XEN)    [<ffff82c480101b94>] cpupool_do_sysctl+0x1fb/0x461
(XEN)    [<ffff82c4801252a1>] do_sysctl+0x921/0xa30
(XEN)    [<ffff82c480207be8>] syscall_enter+0xc8/0x122
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 32:
(XEN) Xen BUG at sched_credit.c:384
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

