
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split



Andre (and Juergen), can you try again with the attached patch?

What the patch basically does is try to make "cpu_disable_scheduler()"
do what it seems to say it does. :-)  Namely, the various
scheduler-related interrupts (both the per-cpu ticks and the master
tick) are part of the scheduler, so disable them before doing anything,
and don't enable them again until the cpu is really ready to go.

To be precise:
* cpu_disable_scheduler() disables the ticks
* schedule_cpu_switch() only enables the ticks when adding a cpu to a
pool, and does so after inserting the idle vcpu
* Modify the semantics such that {alloc,free}_pdata() don't actually
start or stop tickers
 + Call tick_{resume,suspend} in cpu_{up,down}, respectively
* Modify credit1's tick_{suspend,resume} to handle the master ticker as
well (a rough sketch of the resulting ordering is below).
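
A rough, self-contained sketch of that ordering (the struct and helpers
below are simplified stand-ins for illustration, not the real Xen
interfaces; only the sequence of operations matters):

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for a physical cpu's scheduler-related state. */
struct pcpu {
    int id;
    bool cpu_tick_active;     /* per-cpu accounting tick */
    bool master_tick_active;  /* credit1 master ticker */
    bool idle_vcpu_inserted;  /* idle vcpu present for the current pool */
};

/* Credit1 tick handling: suspend/resume cover the master ticker too. */
static void tick_suspend(struct pcpu *c)
{
    c->cpu_tick_active = false;
    c->master_tick_active = false;
    printf("cpu%d: ticks suspended\n", c->id);
}

static void tick_resume(struct pcpu *c)
{
    c->cpu_tick_active = true;
    c->master_tick_active = true;
    printf("cpu%d: ticks resumed\n", c->id);
}

/* {alloc,free}_pdata no longer start or stop any tickers. */
static void alloc_pdata(struct pcpu *c) { c->idle_vcpu_inserted = false; }
static void free_pdata(struct pcpu *c)  { (void)c; }

/* cpu_disable_scheduler: the ticks are part of the scheduler, so they
 * go quiet before anything else is torn down. */
static void cpu_disable_scheduler(struct pcpu *c)
{
    tick_suspend(c);
    /* ... migrate vcpus away, etc. ... */
}

/* schedule_cpu_switch: ticks come back only when the cpu is added to a
 * pool, and only after the idle vcpu has been inserted. */
static void schedule_cpu_switch(struct pcpu *c, bool adding_to_pool)
{
    free_pdata(c);
    alloc_pdata(c);
    c->idle_vcpu_inserted = true;   /* insert idle vcpu for the new pool */
    if (adding_to_pool)
        tick_resume(c);             /* scheduler state is ready now */
}

int main(void)
{
    struct pcpu c = { .id = 4, .cpu_tick_active = true,
                      .master_tick_active = true, .idle_vcpu_inserted = true };

    cpu_disable_scheduler(&c);      /* remove from the old pool: ticks off */
    schedule_cpu_switch(&c, true);  /* add to the new pool: ticks on last */
    return 0;
}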

With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
on one pcpu), I can perform thousands of operations successfully.

(NB this is not ready for application yet, I just wanted to check to
see if it fixes Andre's problem)

 -George

On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> Okay, I have some more data.
>
> I activated cpupool_dprintk() and included checks in sched_credit.c to
> test for weight inconsistencies. To reduce race possibilities I've
> applied my patch that always executes cpu assigning/unassigning in a
> tasklet on the cpu to be moved.
>
> Here is the result:
>
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6)
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6)
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
> (XEN) cpupool_assign_cpu(pool=0,cpu=1)
> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0
> (XEN) cpupool_assign_cpu(cpu=1) ret 0
> (XEN) cpupool_assign_cpu(pool=1,cpu=4)
> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40
> (XEN) cpupool_assign_cpu(cpu=4) ret 0
> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0:
> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1
> (XEN) Xen BUG at sched_credit.c:570
> (XEN) ----[ Xen-4.1.0-rc5-pre  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    4
> (XEN) RIP:    e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f
> (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff830839d3ec30   rcx: 0000000000000000
> (XEN) rdx: ffff830839dcff18   rsi: 000000000000000a   rdi: ffff82c4802542e8
> (XEN) rbp: ffff830839dcfe38   rsp: ffff830839dcfde8   r8:  0000000000000004
> (XEN) r9:  ffff82c480213520   r10: 00000000fffffffc   r11: 0000000000000001
> (XEN) r12: 0000000000000004   r13: ffff830839d3ec40   r14: ffff831002ad5e40
> (XEN) r15: ffff830839d66f90   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 0000001020a98000   cr2: 00007fc5e9b79d98
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff830839dcfde8:
> (XEN)    ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000
> (XEN)    0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651
> (XEN)    ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204
> (XEN)    0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e
> (XEN)    ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100
> (XEN)    ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880
> (XEN)    ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647
> (XEN)    ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20
> (XEN)    0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2
> (XEN)    00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002
> (XEN)    00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50
> (XEN)    0000000000000246 0000000000000032 0000000000000000 00000000ffffffff
> (XEN)    0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848
> (XEN)    00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033
> (XEN)    0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000004
> (XEN)    ffff830077eee000 00000043b9afd180 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4801197d7>] csched_tick+0x186/0x37f
> (XEN)    [<ffff82c480126204>] execute_timer+0x4e/0x6c
> (XEN)    [<ffff82c480126539>] timer_softirq_action+0xf6/0x239
> (XEN)    [<ffff82c480123647>] __do_softirq+0x88/0x99
> (XEN)    [<ffff82c4801236c2>] do_softirq+0x6a/0x7a
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 4:
> (XEN) Xen BUG at sched_credit.c:570
> (XEN) ****************************************
>
> As you can see, a Dom0 vcpu is becoming active on a pool 1 cpu. The
> BUG_ON triggered in csched_acct() is a logical result of this.
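>
> A minimal model of why that BUG_ON has to fire (the struct and the
> accounting loop below are simplified stand-ins, not the real
> csched_acct()): pool 1 never accounted Dom0's weight, so any positive
> contribution from it exceeds weight_left:
>
> #include <assert.h>
>
> /* Simplified stand-in for a credit1 domain's accounting data. */
> struct sdom { int weight; int active_vcpu_count; };
>
> /* weight_total is what the pool accounted for its *own* active domains;
>  * the check mirrors the BUG_ON condition quoted further down:
>  * (sdom->weight * sdom->active_vcpu_count) > weight_left. */
> static void csched_acct_model(struct sdom *active[], int n, int weight_total)
> {
>     int weight_left = weight_total;
>
>     for (int i = 0; i < n; i++) {
>         struct sdom *sdom = active[i];
>         assert(sdom->weight * sdom->active_vcpu_count <= weight_left);
>         weight_left -= sdom->weight * sdom->active_vcpu_count;
>     }
> }
>
> int main(void)
> {
>     /* Pool 1 has no domains of its own, so its accounted total is 0. */
>     int pool1_weight_total = 0;
>
>     /* A Dom0 vcpu (weight 256) wrongly shows up on a pool 1 cpu and is
>      * put on pool 1's active list without being added to its total... */
>     struct sdom dom0 = { .weight = 256, .active_vcpu_count = 1 };
>     struct sdom *active[] = { &dom0 };
>
>     /* ...so 256 * 1 > 0 and the assertion (the BUG_ON) fires. */
>     csched_acct_model(active, 1, pool1_weight_total);
>     return 0;
> }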
>
> I don't know yet how this can happen.
> Does anyone have an idea? I'll keep searching...
>
>
> Juergen
>
> On 02/15/11 08:22, Juergen Gross wrote:
>>
>> On 02/14/11 18:57, George Dunlap wrote:
>>>
>>> The good news is, I've managed to reproduce this on my local test
>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
>>> attached script. It's time to go home now, but I should be able to
>>> dig something up tomorrow.
>>>
>>> To use the script:
>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1"
>>> * You can modify elements by adding "arg=val" as arguments.
>>> * Arguments are:
>>> + dryrun={true,false}: Do the work, but don't actually execute any xl
>>> commands. Default false.
>>> + left: Number of commands to execute. Default 10.
>>> + maxcpus: Highest numerical value for a cpu. Default 7 (i.e., 0-7 is
>>> 8 cpus).
>>> + verbose={true,false}: Print what you're doing. Default true.
>>>
>>> The script sometimes attempts to remove the last cpu from cpupool0; in
>>> this case, libxl will print an error. If the script gets an error
>>> under that condition, it will ignore it; under any other condition, it
>>> will print diagnostic information.
>>>
>>> What finally crashed it for me was this command:
>>> # ./cpupool-test.sh verbose=false left=1000
>>
>> Nice!
>> With your script I finally managed to get the error, too. On my box
>> (2 sockets with 6 cores each) I had to use
>>
>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11
>>
>> to trigger it.
>> Looking for more data now...
>>
>>
>> Juergen
>>
>>>
>>> -George
>>>
>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre
>>> Przywara<andre.przywara@xxxxxxx> wrote:
>>>>
>>>> Juergen Gross wrote:
>>>>>
>>>>> On 02/10/11 15:18, Andre Przywara wrote:
>>>>>>
>>>>>> Andre Przywara wrote:
>>>>>>>
>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>>>>
>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>>>>
>>>>>>>>> Andre, George,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What seems interesting: I think the problem always occurred
>>>>>>>>> when a new cpupool was created and the first cpu was moved to it.
>>>>>>>>>
>>>>>>>>> I think my previous assumption regarding the master_ticker was
>>>>>>>>> not too bad. I think somehow the master_ticker of the new
>>>>>>>>> cpupool is becoming active before the scheduler is really
>>>>>>>>> initialized properly. This could happen if enough time is spent
>>>>>>>>> between alloc_pdata for the cpu to be moved and the critical
>>>>>>>>> section in schedule_cpu_switch().
>>>>>>>>>
>>>>>>>>> The solution should be to activate the timers only if the
>>>>>>>>> scheduler is ready for them.
>>>>>>>>>
>>>>>>>>> George, do you think the master_ticker should be stopped in
>>>>>>>>> suspend_ticker as well? I still see potential problems for
>>>>>>>>> entering deep C-States. I think I'll prepare a patch which will
>>>>>>>>> keep the master_ticker active for the C-State case and migrate
>>>>>>>>> it for the schedule_cpu_switch() case.
>>>>>>>>
>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine
>>>>>>>> without any problems.
>>>>>>>> Andre, could you give it a try?
>>>>>>>
>>>>>>> I did, but unfortunately it crashed as always. I tried twice and
>>>>>>> made sure I booted the right kernel. Sorry.
>>>>>>> The idea of a race between the timer and the state change sounded
>>>>>>> very appealing; that part had actually looked suspicious to me from
>>>>>>> the beginning.
>>>>>>>
>>>>>>> I will add some code to dump the state of all cpupools at the
>>>>>>> BUG_ON, to see which situation we are in when the bug triggers.
>>>>>>
>>>>>> OK, here is a first try at this: the patch iterates over all CPU
>>>>>> pools and outputs some data if the BUG_ON condition
>>>>>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) triggers:
>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask:
>>>>>> fffffffc003f
>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>>>>> (XEN) Xen BUG at sched_credit.c:1010
>>>>>> ....
>>>>>> The masks look proper (6 cores per node); the bug triggers when the
>>>>>> first CPU is about to be(?) inserted.
>>>>>
>>>>> Sure? I'm missing the cpu with mask 2000.
>>>>> I'll try to reproduce the problem on a larger machine here (24 cores,
>>>>> 4 numa nodes).
>>>>> Andre, can you give me your xen boot parameters? Which xen changeset
>>>>> are you running, and do you have any additional patches in use?
>>>>
>>>> The grub lines:
>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>>>>
>>>> All of my experiments use c/s 22858 as a base.
>>>> If you use an AMD Magny-Cours box for your experiments (socket C32 or
>>>> G34), you should add the following patch (removing the line):
>>>> --- a/xen/arch/x86/traps.c
>>>> +++ b/xen/arch/x86/traps.c
>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>>>>          __clear_bit(X86_FEATURE_SKINIT % 32,&c);
>>>>          __clear_bit(X86_FEATURE_WDT % 32,&c);
>>>>          __clear_bit(X86_FEATURE_LWP % 32,&c);
>>>> -        __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c);
>>>>          __clear_bit(X86_FEATURE_TOPOEXT % 32,&c);
>>>>          break;
>>>>      case 5: /* MONITOR/MWAIT */
>>>>
>>>> This is not necessary (in fact it reverts my patch c/s 22815), but it
>>>> raises the probability of triggering the bug, probably because it
>>>> increases the pressure on the Dom0 scheduler. If you cannot trigger it
>>>> with Dom0, try creating a guest with many VCPUs and squeezing it into
>>>> a small CPU-pool.
>>>>
>>>> Good luck ;-)
>>>> Andre.
>>>>
>>>> --
>>>> Andre Przywara
>>>> AMD-OSRC (Dresden)
>>>> Tel: x29712
>>>>
>>>>
>>
>>
>
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
> Fujitsu Technology Solutions      e-mail: juergen.gross@xxxxxxxxxxxxxx
> Domagkstr. 28                           Internet: ts.fujitsu.com
> D-80807 Muenchen          Company details: ts.fujitsu.com/imprint.html
>

Attachment: cpupools-tick-rearrange.diff
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

