Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support



On 16.07.19 17:45, Sergey Dyasli wrote:
On 05/07/2019 14:17, Sergey Dyasli wrote:
[2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck!
[2019-07-05 00:37:16 UTC] (XEN) [24907.514180] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
[2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU:    30
[2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP:    e008:[<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
[2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0000000000000202   CONTEXT: hypervisor
[2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0000000000000002   rbx: ffff83202782e880   rcx: 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: ffff83202782e904   rsi: ffff832027823000   rdi: ffff832027823000
[2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: ffff83403cab7d20   rsp: ffff83403cab7d00   r8:  0000000000000000
[2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9:  0000000000000000   r10: 0200200200200200   r11: 0100100100100100
[2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: ffff832027823000   r13: ffff832027823000   r14: ffff83202782e7b0
[2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: ffff83202782e880   cr0: 000000008005003b   cr4: 00000000000426e0
[2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: 00000000bd8a1000   cr2: 000000001851b798
[2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
[2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
[2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around <ffff82d0802406fc> (sched_context_switched+0xaf/0x101):
[2019-07-05 00:37:16 UTC] (XEN) [24907.990277]  00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
[2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=ffff83403cab7d00:
[2019-07-05 00:37:16 UTC] (XEN) [24908.061298]    ffff832027823000 ffff832027823000 0000000000000000 ffff83202782e880
[2019-07-05 00:37:16 UTC] (XEN) [24908.098529]    ffff83403cab7d60 ffff82d0802407c0 0000000000000082 ffff83202782e7c8
[2019-07-05 00:37:16 UTC] (XEN) [24908.135622]    000000000000001e ffff83202782e7c8 000000000000001e ffff82d080602628
[2019-07-05 00:37:16 UTC] (XEN) [24908.172671]    ffff83403cab7dc0 ffff82d080240d83 000000000000df99 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24908.210212]    ffff832027823000 000016a62dc8c6bc 000000fc00000000 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24908.247181]    ffff83202782e7c8 ffff82d080602628 ffff82d0805da460 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24908.284279]    ffff83403cab7e60 ffff82d080240ea4 00000002802aecc5 ffff832027823000
[2019-07-05 00:37:16 UTC] (XEN) [24908.321128]    ffff83202782e7b0 ffff83202782e880 ffff83403cab7e10 ffff82d080273b4e
[2019-07-05 00:37:16 UTC] (XEN) [24908.358308]    ffff83403cab7e10 ffff82d080242f7f ffff83403cab7e60 ffff82d08024663a
[2019-07-05 00:37:17 UTC] (XEN) [24908.395662]    ffff83403cab7ea0 ffff82d0802ec32a ffff8340000000ff ffff82d0805bc880
[2019-07-05 00:37:17 UTC] (XEN) [24908.432376]    ffff82d0805bb980 ffffffffffffffff ffff83403cab7fff 000000000000001e
[2019-07-05 00:37:17 UTC] (XEN) [24908.469812]    ffff83403cab7e90 ffff82d080242575 0000000000000f00 ffff82d0805bb980
[2019-07-05 00:37:17 UTC] (XEN) [24908.508373]    000000000000001e ffff82d0806026f0 ffff83403cab7ea0 ffff82d0802425ca
[2019-07-05 00:37:17 UTC] (XEN) [24908.549856]    ffff83403cab7ef0 ffff82d08027a601 ffff82d080242575 0000001e7ffde000
[2019-07-05 00:37:17 UTC] (XEN) [24908.588022]    ffff832027823000 ffff832027823000 ffff83127ffde000 ffff83203ffe5000
[2019-07-05 00:37:17 UTC] (XEN) [24908.625217]    000000000000001e ffff831204092000 ffff83403cab7d78 00000000ffffffed
[2019-07-05 00:37:17 UTC] (XEN) [24908.662932]    ffffffff81800000 0000000000000000 ffffffff81800000 0000000000000000
[2019-07-05 00:37:17 UTC] (XEN) [24908.703246]    ffffffff818f4580 ffff880039118848 00000e6a3c4b2698 00000000148900db
[2019-07-05 00:37:17 UTC] (XEN) [24908.743671]    0000000000000000 ffffffff8101e650 ffffffff8185c3e0 0000000000000000
[2019-07-05 00:37:17 UTC] (XEN) [24908.781927]    0000000000000000 0000000000000000 0000beef0000beef ffffffff81054eb2
[2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace:
[2019-07-05 00:37:17 UTC] (XEN) [24908.836789]    [<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
[2019-07-05 00:37:17 UTC] (XEN) [24908.869916]    [<ffff82d0802407c0>] schedule.c#sched_context_switch+0x72/0x151
[2019-07-05 00:37:17 UTC] (XEN) [24908.907384]    [<ffff82d080240d83>] schedule.c#sched_slave+0x2a3/0x2b2
[2019-07-05 00:37:17 UTC] (XEN) [24908.941241]    [<ffff82d080240ea4>] schedule.c#schedule+0x112/0x2a1
[2019-07-05 00:37:17 UTC] (XEN) [24908.973939]    [<ffff82d080242575>] softirq.c#__do_softirq+0x85/0x90
[2019-07-05 00:37:17 UTC] (XEN) [24909.007101]    [<ffff82d0802425ca>] do_softirq+0x13/0x15
[2019-07-05 00:37:17 UTC] (XEN) [24909.035971]    [<ffff82d08027a601>] domain.c#idle_loop+0xad/0xc0
[2019-07-05 00:37:17 UTC] (XEN) [24909.070546]
[2019-07-05 00:37:17 UTC] (XEN) [24909.080286] CPU0 @ e008:ffff82d0802431ba (stop_machine.c#stopmachine_wait_state+0x1a/0x24)
[2019-07-05 00:37:17 UTC] (XEN) [24909.122896] CPU1 @ e008:ffff82d0802406f8 (sched_context_switched+0xab/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.159518] CPU3 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.199607] CPU2 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.235773] CPU5 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.276039] CPU4 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.312371] CPU7 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.352930] CPU6 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.388928] CPU8 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.424664] CPU9 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.465376] CPU10 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.507449] CPU11 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.544703] CPU13 @ e008:ffff82d0802431f2 (stop_machine.c#stopmachine_action+0x2e/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.588884] CPU12 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.625781] CPU15 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.666649] CPU14 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.703396] CPU17 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.744089] CPU16 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.781117] CPU23 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.821692] CPU22 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.858139] CPU27 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.898704] CPU26 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24909.936069] CPU19 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24909.977291] CPU18 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.014078] CPU31 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.055692] CPU21 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.100486] CPU24 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.136824] CPU25 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.177529] CPU29 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.218420] CPU28 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.255219] CPU20 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.292152]
[2019-07-05 00:37:19 UTC] (XEN) [24910.301667] ****************************************
[2019-07-05 00:37:19 UTC] (XEN) [24910.327892] Panic on CPU 30:
[2019-07-05 00:37:19 UTC] (XEN) [24910.344165] FATAL TRAP: vector = 2 (nmi)
[2019-07-05 00:37:19 UTC] (XEN) [24910.365476] [error_code=0000]
[2019-07-05 00:37:19 UTC] (XEN) [24910.382509] ****************************************
[2019-07-05 00:37:19 UTC] (XEN) [24910.408547]
[2019-07-05 00:37:19 UTC] (XEN) [24910.418129] Reboot in five seconds...

On a closer look, the second crash happens when you try to shut down
the host ("poweroff" in my case).

And that was just another bug: the scheduler is still active while
the host is trying to enter an ACPI deep sleep state. As the non-boot
cpus are taken down via tasklets, this results in synchronization
problems once one cpu of a sched_resource is already down while the
other one is still waiting for it to finish scheduling.
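
To illustrate the hang, here is a minimal single-threaded sketch of such
a rendezvous. It is a model only, not the code from the series: the
struct, the function names and the bounded spin count are all
hypothetical, and the real wait in sched_context_switched() has no
bound, which is why the watchdog fires.

```c
#include <stdbool.h>

/* Hypothetical model of a sched_resource shared by two sibling cpus. */
struct sched_res_model {
    int pending;   /* cpus that have not yet finished the context switch */
};

/* A cpu that is still online reports that its context switch is done. */
static void cpu_finish_switch(struct sched_res_model *r)
{
    if (r->pending > 0)
        r->pending--;
}

/*
 * Bounded stand-in for the spin-wait on the siblings. Returns true if
 * the rendezvous completed, false if the waiter would still be spinning
 * (in the hypervisor the loop is unbounded, so it would never return).
 */
static bool wait_for_siblings(const struct sched_res_model *r,
                              unsigned int max_spins)
{
    for (unsigned int i = 0; i < max_spins; i++)
        if (r->pending == 0)
            return true;
    return false;
}
```

If the sibling has been taken offline before it called
cpu_finish_switch(), pending never reaches zero and the remaining cpu
keeps spinning.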

Replacing the common scheduling softirq handler with one that only
does tasklet scheduling in that case makes it work again.
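
A rough sketch of that handler swap, again as a self-contained model and
not the actual patch: every identifier below (set_softirq_handler(),
scheduler_prepare_suspend(), the counters and so on) is made up for
illustration, and the real handlers do far more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical softirq table with a single scheduling entry. */
enum { SCHEDULE_SOFTIRQ_MODEL, NR_SOFTIRQS_MODEL };

typedef void (*softirq_handler_t)(void);

static softirq_handler_t handlers[NR_SOFTIRQS_MODEL];
static int full_schedules;   /* times the full scheduler path ran */
static int tasklet_runs;     /* times pending tasklets were serviced */

static void run_pending_tasklets(void)
{
    tasklet_runs++;
}

/* Normal handler: would rendezvous with the sibling cpus of the resource. */
static void schedule_full(void)
{
    full_schedules++;
    run_pending_tasklets();
}

/* Suspend-time handler: no rendezvous, only lets tasklets make progress. */
static void schedule_tasklets_only(void)
{
    run_pending_tasklets();
}

static void set_softirq_handler(int nr, softirq_handler_t fn)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS_MODEL && fn != NULL);
    handlers[nr] = fn;
}

/* Called before cpus are taken down for the deep sleep state. */
static void scheduler_prepare_suspend(void)
{
    set_softirq_handler(SCHEDULE_SOFTIRQ_MODEL, schedule_tasklets_only);
}

/* Called at boot and again after all cpus are back online. */
static void scheduler_resume(void)
{
    set_softirq_handler(SCHEDULE_SOFTIRQ_MODEL, schedule_full);
}

static void raise_schedule_softirq(void)
{
    handlers[SCHEDULE_SOFTIRQ_MODEL]();
}
```

With the tasklet-only handler installed, the tasklets that take the
non-boot cpus down can still run, but no cpu enters the cross-cpu
synchronization any more.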


Juergen
