WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date: Wed, 16 Feb 2011 13:54:41 +0000
Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
Delivery-date: Wed, 16 Feb 2011 05:55:26 -0800
In-reply-to: <4D5B9D2B.107@xxxxxxxxxxxxxx>
Andre (and Juergen), can you try again with the attached patch?

What the patch basically does is try to make "cpu_disable_scheduler()"
do what it seems to say it does. :-)  Namely, the various
scheduler-related interrupts (both per-cpu ticks and the master tick)
are part of the scheduler, so disable them before doing anything, and
don't enable them until the cpu is really ready to go again.

To be precise:
* cpu_disable_scheduler() disables ticks
* schedule_cpu_switch() only enables ticks if adding a cpu to a pool,
and does it after inserting the idle vcpu
* Modify the semantics so that {alloc,free}_pdata() don't actually start
or stop tickers
 + Call tick_{resume,suspend} in cpu_{up,down}, respectively
* Modify credit1's tick_{suspend,resume} to handle the master ticker as well.
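
As a rough illustration of the ordering described above (a toy model only, not the attached patch; every name below is hypothetical and none of this is Xen code), ticks are stopped before any scheduler state is torn down, and re-armed only after the idle vcpu is in place:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT Xen code: all names here are hypothetical. */
struct toy_cpu {
    bool in_pool;             /* cpu currently belongs to a pool */
    bool idle_vcpu_inserted;  /* idle vcpu is known to the pool's scheduler */
    bool ticks_enabled;       /* per-cpu tick (and master tick) are armed */
};

enum toy_tick_result { TOY_TICK_NONE, TOY_TICK_OK, TOY_TICK_UNSAFE };

/* What would happen if a scheduler tick were due on this cpu right now. */
enum toy_tick_result toy_tick_fire(const struct toy_cpu *c)
{
    if (!c->ticks_enabled)
        return TOY_TICK_NONE;
    /* A tick is only safe once the scheduler state is complete. */
    return (c->in_pool && c->idle_vcpu_inserted) ? TOY_TICK_OK
                                                 : TOY_TICK_UNSAFE;
}

/* Removal: stop the ticks first, then tear down the scheduler state. */
void toy_cpu_remove(struct toy_cpu *c)
{
    c->ticks_enabled = false;
    c->idle_vcpu_inserted = false;
    c->in_pool = false;
}

/* Addition: build the scheduler state first, enable the ticks last. */
void toy_cpu_add(struct toy_cpu *c)
{
    assert(!c->ticks_enabled);  /* ticks must stay off while state is built */
    c->in_pool = true;
    c->idle_vcpu_inserted = true;
    c->ticks_enabled = true;
}
```

In this model a tick can never observe a half-initialized cpu, because toy_tick_fire() returns TOY_TICK_NONE at every intermediate step of both functions. A cpu with ticks armed before the idle vcpu is inserted would report TOY_TICK_UNSAFE instead.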

With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
on one pcpu), I can perform thousands of operations successfully.

(NB: this is not ready for application yet; I just wanted to check
whether it fixes Andre's problem.)

 -George

On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> Okay, I have some more data.
>
> I activated cpupool_dprintk() and included checks in sched_credit.c to
> test for weight inconsistencies. To reduce race possibilities I've added
> my patch to execute cpu assigning/unassigning always in a tasklet on the
> cpu to be moved.
>
> Here is the result:
>
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6)
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6)
> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
> (XEN) cpupool_assign_cpu(pool=0,cpu=1)
> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0
> (XEN) cpupool_assign_cpu(cpu=1) ret 0
> (XEN) cpupool_assign_cpu(pool=1,cpu=4)
> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40
> (XEN) cpupool_assign_cpu(cpu=4) ret 0
> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0:
> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1
> (XEN) Xen BUG at sched_credit.c:570
> (XEN) ----[ Xen-4.1.0-rc5-pre  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    4
> (XEN) RIP:    e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f
> (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff830839d3ec30   rcx: 0000000000000000
> (XEN) rdx: ffff830839dcff18   rsi: 000000000000000a   rdi: ffff82c4802542e8
> (XEN) rbp: ffff830839dcfe38   rsp: ffff830839dcfde8   r8:  0000000000000004
> (XEN) r9:  ffff82c480213520   r10: 00000000fffffffc   r11: 0000000000000001
> (XEN) r12: 0000000000000004   r13: ffff830839d3ec40   r14: ffff831002ad5e40
> (XEN) r15: ffff830839d66f90   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 0000001020a98000   cr2: 00007fc5e9b79d98
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff830839dcfde8:
> (XEN)    ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000
> (XEN)    0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651
> (XEN)    ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204
> (XEN)    0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e
> (XEN)    ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100
> (XEN)    ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880
> (XEN)    ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647
> (XEN)    ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20
> (XEN)    0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2
> (XEN)    00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002
> (XEN)    00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50
> (XEN)    0000000000000246 0000000000000032 0000000000000000 00000000ffffffff
> (XEN)    0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848
> (XEN)    00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033
> (XEN)    0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000004
> (XEN)    ffff830077eee000 00000043b9afd180 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4801197d7>] csched_tick+0x186/0x37f
> (XEN)    [<ffff82c480126204>] execute_timer+0x4e/0x6c
> (XEN)    [<ffff82c480126539>] timer_softirq_action+0xf6/0x239
> (XEN)    [<ffff82c480123647>] __do_softirq+0x88/0x99
> (XEN)    [<ffff82c4801236c2>] do_softirq+0x6a/0x7a
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 4:
> (XEN) Xen BUG at sched_credit.c:570
> (XEN) ****************************************
>
> As you can see, a Dom0 vcpu is becoming active on a pool 1 cpu. The BUG_ON
> triggered in csched_acct() is a logical result of this.
>
> How this can happen I don't know yet.
> Anyone any idea? I'll keep searching...
>
>
> Juergen
>
> On 02/15/11 08:22, Juergen Gross wrote:
>>
>> On 02/14/11 18:57, George Dunlap wrote:
>>>
>>> The good news is, I've managed to reproduce this on my local test
>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
>>> attached script. It's time to go home now, but I should be able to
>>> dig something up tomorrow.
>>>
>>> To use the script:
>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1"
>>> * You can modify elements by adding "arg=val" as arguments.
>>> * Arguments are:
>>> + dryrun={true,false} Do the work, but don't actually execute any xl
>>> arguments. Default false.
>>> + left: Number of commands to execute. Default 10.
>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is
>>> 8 cpus).
>>> + verbose={true,false} Print what you're doing. Default is true.
>>>
>>> The script sometimes attempts to remove the last cpu from cpupool0; in
>>> this case, libxl will print an error. If the script gets an error
>>> under that condition, it will ignore it; under any other condition, it
>>> will print diagnostic information.
>>>
>>> What finally crashed it for me was this command:
>>> # ./cpupool-test.sh verbose=false left=1000
>>
>> Nice!
>> With your script I finally managed to get the error, too. On my box
>> (2 sockets with 6 cores each) I had to use
>>
>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11
>>
>> to trigger it.
>> Looking for more data now...
>>
>>
>> Juergen
>>
>>>
>>> -George
>>>
>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara
>>> <andre.przywara@xxxxxxx> wrote:
>>>>
>>>> Juergen Gross wrote:
>>>>>
>>>>> On 02/10/11 15:18, Andre Przywara wrote:
>>>>>>
>>>>>> Andre Przywara wrote:
>>>>>>>
>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>>>>
>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>>>>
>>>>>>>>> Andre, George,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What seems to be interesting: I think the problem did always occur
>>>>>>>>> when a new cpupool was created and the first cpu was moved to it.
>>>>>>>>>
>>>>>>>>> I think my previous assumption regarding the master_ticker was not
>>>>>>>>> too bad. I think somehow the master_ticker of the new cpupool is
>>>>>>>>> becoming active before the scheduler is really initialized properly.
>>>>>>>>> This could happen if enough time is spent between alloc_pdata for
>>>>>>>>> the cpu to be moved and the critical section in
>>>>>>>>> schedule_cpu_switch().
>>>>>>>>>
>>>>>>>>> The solution should be to activate the timers only if the scheduler
>>>>>>>>> is ready for them.
>>>>>>>>>
>>>>>>>>> George, do you think the master_ticker should be stopped in
>>>>>>>>> suspend_ticker as well? I still see potential problems for entering
>>>>>>>>> deep C-States. I think I'll prepare a patch which will keep the
>>>>>>>>> master_ticker active for the C-State case and migrate it for the
>>>>>>>>> schedule_cpu_switch() case.
>>>>>>>>
>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine without
>>>>>>>> any problems.
>>>>>>>> Andre, could you give it a try?
>>>>>>>
>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made sure
>>>>>>> I booted the right kernel. Sorry.
>>>>>>> The idea with the race between the timer and the state changing
>>>>>>> sounded very appealing, actually that was suspicious to me from the
>>>>>>> beginning.
>>>>>>>
>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON
>>>>>>> to see in which situation we are when the bug triggers.
>>>>>>
>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools
>>>>>> and outputs some data if the
>>>>>> BUG_ON((sdom->weight * sdom->active_vcpu_count) > weight_left)
>>>>>> condition triggers:
>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>>>>> (XEN) Xen BUG at sched_credit.c:1010
>>>>>> ....
>>>>>> The masks look proper (6 cores per node), the bug triggers when the
>>>>>> first CPU is about to be(?) inserted.
>>>>>
>>>>> Sure? I'm missing the cpu with mask 2000.
>>>>> I'll try to reproduce the problem on a larger machine here (24 cores,
>>>>> 4 numa nodes).
>>>>> Andre, can you give me your xen boot parameters? Which xen changeset
>>>>> are you running, and do you have any additional patches in use?
>>>>
>>>> The grub lines:
>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>>>>
>>>> All of my experiments use c/s 22858 as a base.
>>>> If you use an AMD Magny-Cours box for your experiments (socket C32 or
>>>> G34), you should add the following patch (removing the line)
>>>> --- a/xen/arch/x86/traps.c
>>>> +++ b/xen/arch/x86/traps.c
>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c);
>>>> __clear_bit(X86_FEATURE_WDT % 32,&c);
>>>> __clear_bit(X86_FEATURE_LWP % 32,&c);
>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c);
>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c);
>>>> break;
>>>> case 5: /* MONITOR/MWAIT */
>>>>
>>>> This is not necessary (in fact that reverts my patch c/s 22815), but
>>>> raises the probability of triggering the bug, probably because it
>>>> increases the pressure on the Dom0 scheduler. If you cannot trigger it
>>>> with Dom0, try to create a guest with many VCPUs and squeeze it into a
>>>> small CPU-pool.
>>>>
>>>> Good luck ;-)
>>>> Andre.
>>>>
>>>> --
>>>> Andre Przywara
>>>> AMD-OSRC (Dresden)
>>>> Tel: x29712
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>>
>>
>
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
> Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
> Domagkstr. 28                           Internet: ts.fujitsu.com
> D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
>
>
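
For reference, the check behind the quoted crash can be looked at in isolation. This is only a minimal sketch, assuming that the "weight 0" printed for cpu 4 is the weight total the accounting starts from. The names toy_sdom and toy_weight_check_fails are hypothetical; only the condition itself is taken from the BUG_ON quoted in the thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-domain scheduler data shown in the log. */
struct toy_sdom {
    unsigned int weight;             /* "sdom->weight" in the log */
    unsigned int active_vcpu_count;  /* "sdom->active_vcpu_count" in the log */
};

/*
 * Same condition as the quoted BUG_ON: returns true when the domain claims
 * more weight than is left to hand out, i.e. when the BUG would fire.
 */
bool toy_weight_check_fails(const struct toy_sdom *sdom,
                            unsigned int weight_left)
{
    return (sdom->weight * sdom->active_vcpu_count) > weight_left;
}
```

With the values from the log (weight 256, one active vcpu, and a weight total of 0 for the pool that cpu 4 was just added to), the condition is true. That is consistent with Juergen's reading that a Dom0 vcpu became active on a pool 1 cpu.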

Attachment: cpupools-tick-rearrange.diff
Description: Text Data
