
Re: [Xen-devel] [BUG] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit()) (fwd)



>>> On 09.01.18 at 23:51, <martin@xxxxxxxxx> wrote:
> Hello.

I'm sorry for taking so long to get back to this.

> On Tue, 9 Jan 2018, Jan Beulich wrote:
>>>>> On 08.01.18 at 17:07, <martin@xxxxxxxxx> wrote:
>>> On Mon, 8 Jan 2018, Jan Beulich wrote:
>>>>>>> On 07.01.18 at 13:34, <martin@xxxxxxxxx> wrote:
>>>>> (XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----
>>>>
>>>> The -vgpu tag makes me wonder whether you have any patches in
>>>> your tree on top of plain 4.10.0 (or 4.10-staging). Also the debug=n
>>>> above ...
>>>
>>> 4.10.0 + 11 patches to make nvidia/vgpu work
>>> (https://github.com/xenserver/xen-4.7.pg).
>>> debug=n because of xen's modified debug build process.
>>>
>>>>> (XEN)    [<ffff82d08026ae60>] __find_next_bit+0x10/0x80
>>>>> (XEN)    [<ffff82d080253180>] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
>>>>> (XEN)    [<ffff82d0802c7c0e>] mwait-idle.c#mwait_idle+0x23e/0x340
>>>>> (XEN)    [<ffff82d08026fa56>] domain.c#idle_loop+0x86/0xc0
>>>>
>>>> ... makes this call trace unreliable. But even with a reliable call
>>>> trace, analysis of the crash would be helped if you made
>>>> available the xen-syms (or xen.efi, depending on how you boot)
>>>> somewhere.
>>>
>>> xen-syms - http://www.uschovna.cz/en/zasilka/UDP5LVE2679CGBIS-4YV/ 
>>
>> Thanks. Looks to be a race between a timer in the governor and
>> the CPUs being brought down. In general the governor is supposed
>> to be disabled in the course of CPUs being brought down, so first
>> of all I wonder whether you have some daemon in use which
>> sends management requests to the CPUfreq driver in Xen. Such a
>> daemon should of course be disabled by the system shutdown
>> scripts. Otherwise please try the attached debugging patch -
>> maybe we can see something from its output.
> 
> I suppose nothing should be running any more, because the Dom0 kernel
> has already ended (see the last two messages from the dom0 kernel). Or
> how can I check it?
> 
> Patch applied.
> - no "dbs:" in output (grep "dbs:" ...)

Okay, this sufficiently proves that nothing tries to re-enable a
governor after it was stopped for a CPU being brought down.

> - examples of shutdown output (1x OK + 2x fail):
> 
> -----------------------------------------------------
> 
> [  632.439402] ACPI: Preparing to enter system sleep state S5
> [  632.486728] reboot: Power down
> (XEN) Preparing system for ACPI S5 state.
> (XEN) Disabling non-boot CPUs ...
> (XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
> (XEN) Broke affinity for irq 140
> (XEN) cpufreq: del CPU2 (1,4,1,4)
> (XEN) Broke affinity for irq 139
> (XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
> (XEN) Broke affinity for irq 83
> (XEN) cpufreq: del CPU4 (1,10,1,10)
> (XEN) Broke affinity for irq 137
> (XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
> (XEN) cpufreq: del CPU6 (1,40,1,40)
> (XEN) Broke affinity for irq 141
> (XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
> (XEN) cpufreq: del CPU8 (1,100,1,100)
> (XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
> (XEN) cpufreq: del CPU10 (1,400,1,400)
> (XEN) cpufreq: del CPU11 (1,ffa801,1,800)
> (XEN) cpufreq: del CPU12 (1,1000,1,1000)
> (XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
> (XEN) cpufreq: del CPU14 (1,4000,1,4000)
> (XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
> (XEN) cpufreq: del CPU16 (1,ff0001,1,10000)
> (XEN) cpufreq: del CPU17 (1,fe0001,1,20000)
> (XEN) cpufreq: del CPU18 (1,fc0001,1,40000)
> (XEN) cpufreq: del CPU19 (1,f80001,1,80000)
> (XEN) cpufreq: del CPU20 (1,f00001,1,100000)
> (XEN) cpufreq: del CPU21 (1,e00001,1,200000)
> (XEN) cpufreq: del CPU22 (1,c00001,1,400000)
> (XEN) cpufreq: del CPU23 (1,800001,1,800000)
> (XEN) Broke affinity for irq 72
> (XEN) cpufreq: del CPU0 (1,1,1,1)
> (XEN) Entering ACPI S5 state.

So this, despite being the working example, already shows a very odd
arrangement: all odd-numbered CPUs _and_ all even-numbered ones from
16 up share a domain. The typical case would be for all sibling
threads and/or cores to share a domain, iirc.
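
Just for reference, this is how I'm decoding the four fields of the
debug printk ("(%u,%lx,%lu,%lx)" being hw_all, cpufreq_dom->map,
num_processors, and policy->cpus, as per the first hunk of the patch
below). A minimal stand-alone sketch, not Xen code:

/* Decode a cpumask word from the "cpufreq: del CPU..." lines above.
 * E.g. 0xffaaab expands to CPUs 0,1,3,5,7,9,11,13,15 and 16-23,
 * which is the odd arrangement just described. */
#include <stdio.h>

static void decode_map(unsigned long map)
{
    unsigned int cpu;

    printf("%#lx ->", map);
    for ( cpu = 0; cpu < 8 * sizeof(map); cpu++ )
        if ( map & (1UL << cpu) )
            printf(" %u", cpu);
    printf("\n");
}

int main(void)
{
    decode_map(0xffaaabUL); /* cpufreq_dom->map when CPU1 gets removed */
    decode_map(0x2UL);      /* policy->cpus for that same removal */
    return 0;
}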

> (XEN) cpufreq: del CPU23 (1,800001,1,800000)
> (XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    23

So at the time the debug message gets printed, policy->cpus is
still valid. With hw_all set, the next step is the call
__cpufreq_governor(policy, CPUFREQ_GOV_STOP), which
ought to lead to cpufreq_governor_dbs()'s
"case CPUFREQ_GOV_STOP", which in turn calls dbs_timer_exit().
That function kills the timer that causes the crash, and
kill_timer(), in turn, makes sure the timer being killed isn't
currently having its handler executed on any CPU. IOW I can't spot
any race, yet there quite obviously is one in your case. All I can
do at the moment is hand you an extended debugging patch.
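
To make that expected call chain explicit, here is a compilable
stand-alone stub of the stop path. Types and bodies are placeholders,
not the real Xen code; only the function names and the
CPUFREQ_GOV_STOP flow mirror the sources discussed above:

#include <stdio.h>

#define CPUFREQ_GOV_STOP 2

struct dbs_info { unsigned int cpu; int enable; };
static struct dbs_info this_dbs_info = { .cpu = 23, .enable = 1 };

/* Models kill_timer(): besides deactivating the timer, the real one
 * also waits until the timer's handler is no longer executing on any
 * CPU before returning - which is why I can't spot a window for the
 * race. */
static void kill_timer_stub(unsigned int cpu)
{
    printf("kill_timer(dbs_timer[%u])\n", cpu);
}

static void dbs_timer_exit(struct dbs_info *dbs)
{
    dbs->enable = 0;
    kill_timer_stub(dbs->cpu);
}

static int cpufreq_governor_dbs(unsigned int event)
{
    switch ( event )
    {
    case CPUFREQ_GOV_STOP:
        if ( !this_dbs_info.enable )
            break; /* already stopped */
        dbs_timer_exit(&this_dbs_info);
        break;
    }
    return 0;
}

/* Models cpufreq_del_cpu() reaching GOV_STOP for a hw_all domain via
 * __cpufreq_governor(policy, CPUFREQ_GOV_STOP). */
int main(void)
{
    return cpufreq_governor_dbs(CPUFREQ_GOV_STOP);
}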

Jan

--- unstable.orig/xen/drivers/cpufreq/cpufreq.c 2017-09-12 12:39:58.310556379 +0200
+++ unstable/xen/drivers/cpufreq/cpufreq.c      2018-03-01 16:26:55.071367162 +0100
@@ -190,6 +190,7 @@ int cpufreq_add_cpu(unsigned int cpu)
             break;
         }
     }
+printk("CPU%u: dom%u (%u,%u)\n", cpu, dom, domexist, hw_all);//temp
 
     if (!domexist) {
         cpufreq_dom = xzalloc(struct cpufreq_dom);
@@ -352,6 +353,8 @@ int cpufreq_del_cpu(unsigned int cpu)
 
     /* for HW_ALL, stop gov for each core of the _PSD domain */
     /* for SW_ALL & SW_ANY, stop gov for the 1st core of the _PSD domain */
+printk("cpufreq: del CPU%u (%u,%lx,%lu,%lx)\n", cpu,//temp
+       hw_all, cpufreq_dom->map->bits[0], perf->domain_info.num_processors, policy->cpus->bits[0]);//temp
     if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
                    perf->domain_info.num_processors))
         __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
--- unstable.orig/xen/drivers/cpufreq/cpufreq_ondemand.c        2017-09-12 12:39:58.310556379 +0200
+++ unstable/xen/drivers/cpufreq/cpufreq_ondemand.c     2018-03-01 16:21:41.829296179 +0100
@@ -179,6 +179,9 @@ static void do_dbs_timer(void *dbs)
     if (!dbs_info->enable)
         return;
 
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: check CPU%u\n", dbs_info->cpu);
+}
     dbs_check_cpu(dbs_info);
 
     set_timer(&per_cpu(dbs_timer, dbs_info->cpu),
@@ -218,6 +221,9 @@ int cpufreq_governor_dbs(struct cpufreq_
 
     switch (event) {
     case CPUFREQ_GOV_START:
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: start CPU%u [%pS]\n", cpu, __builtin_return_address(0));
+}
         if ((!cpu_online(cpu)) || (!policy->cur))
             return -EINVAL;
 
@@ -273,12 +279,18 @@ int cpufreq_governor_dbs(struct cpufreq_
         break;
 
     case CPUFREQ_GOV_STOP:
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: stop CPU%u (%u,%d)\n", cpu, this_dbs_info->cpu, 
this_dbs_info->enable);
+}
         if ( !this_dbs_info->enable )
             /* Already not enabled */
             break;
 
         dbs_timer_exit(this_dbs_info);
         dbs_enable--;
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: stopped CPU%u (%d,%u)\n", cpu, per_cpu(dbs_timer, cpu).status, 
per_cpu(dbs_timer, cpu).cpu);
+}
 
         break;
 

