
[Xen-devel] Re: kernel BUG at arch/x86/xen/mmu.c:1872



2011/4/10 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> Hi Konrad & Jeremy:
>
>             I think we have finally located the missing patch for this commit.
>             We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=c97f681f138039425c87f35ea46a92385d81e70e
>             which works.
>
>             We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=221c64dbf860d37f841f40893bddf8d804aa55bd
>             on which the server crashed.
>
>              Later I found the comments for this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
>
>             So it looks like this fix is not applied to 2.6.32.36. Could you
> take a look at this?
>
>             Many thanks.
>
> =====================================================
>>Hi Konrad & Jeremy:
>>
>>     I'd like to open this BUG in a new thread, since the old thread is too
>> long to read easily.
>>
>>     We recently wanted to upgrade our kernel to 2.6.32, but unfortunately
>> we ran into a kernel crash bug.
>> Our test case is simple: start 24 win2003 HVMs on our physical machine and
>> reboot each HVM every 15 minutes. The kernel crashes within half an hour
>> (that is, when the VMs start for the second time).
>>
>>Our tests went much further.
>>We tested different kernel versions:
>>2.6.32.10
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=d945b014ac5df9592c478bf9486d97e8914aab59
>>2.6.32.11
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27f948a3bf365a5bc3d56119637a177d41147815
>>2.6.32.12
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ba739f9abd3f659b907a824af1161926b420a2ce
>>2.6.32.13
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=f6fe6583b77a49b569eef1b66c3d761eec2e561b
>>2.6.32.15
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27ed1b0e0dae5f1d5da5c76451bc84cb529128bd
>>2.6.32.21
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=69e50db231723596ed8ef9275d0068d6697f466a
>>
>>There are basically three different results.
>>
>>i1) grant table issue
>>The host still functions, but xm dmesg shows abnormal log entries.
>>Please refer to the attached grant table log.
>>
>>i2) kernel crash at a different place
>>The host dies during the test; after reboot, we see nothing abnormal in
>> /var/log/messages.
>>
>>i3) kernel BUG at arch/x86/xen/mmu.c:1872
>>The host dies during the test; after reboot, we see the crash log in messages.
>> Refer to the attached log of 2.6.32.36.
>>
>>The test results fall into two groups:
>>
>>1) 2.6.32.10
>>30 machines were involved in the test: three had issue (i1), two had issue
>> (i2), and *none* had issue (i3).
>>The other machines have run the tests successfully so far, for more than 8 hours.
>>
>>2) 2.6.32.11 or later versions
>>Each version was tested on 10 machines, and all machines crashed in
>> less than half an hour.
>>
>>Conclusion:
>>1) The grant table issue exists in all kernel versions.
>>2) The kernel crash at a different place may exist in all kernel versions, but
>> it does not happen frequently (2 out of 30).
>>3) The major difference we observe is issue i3); from the tests, it looks
>> like it was introduced between versions
>>2.6.32.10 and 2.6.32.11.
>>
>>Hope this helps to locate the bug.
>>Many thanks.
>>
>>
>

Hi,

Sorry, this mmu-related BUG has been troubling me for a very long
time... I really want to "kill" this BUG, but my knowledge of kernel
hacking and Xen is very limited.

While waiting for Jeremy or Konrad or others ...

Many thanks for spending time tracking down this mmu-related BUG.  I
have backported the commit from
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
to the 2.6.32.36 PVOPS kernel; the patch is attached.  I do not know
whether I backported it correctly or whether it affects anything else.
I am currently testing the 2.6.32.36 PVOPS kernel with this patch
applied and with CONFIG_DEBUG_PAGEALLOC unset.  I am now running
testcrash.sh for 1000 loops, as I was unable to reproduce mmu BUG 1872
in 100 loops of testcrash.sh.  Please note that with
CONFIG_DEBUG_PAGEALLOC unset, I can easily reproduce mmu BUG 1872 in
fewer than 50 testcrash.sh loops on PVOPS kernels 2.6.32.24 to
2.6.32.36.  I will now test with this backported patch to see whether I
can still reproduce the BUG.
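
For reference, the version sweep quoted above narrows the regression to a
window between two stable releases. A minimal sketch of that reasoning
follows. It assumes the crashes reported for 2.6.32.11 and later are all
issue i3, which the quoted conclusion suggests but does not state for every
version. The names RESULTS, version_key and regression_window are
hypothetical and exist only for this illustration.

```python
# Outcomes as reported in the quoted message: True means the
# "kernel BUG at arch/x86/xen/mmu.c:1872" crash (issue i3) was observed.
# Assumption: every crash on 2.6.32.11 and later was issue i3.
RESULTS = [
    ("2.6.32.10", False),
    ("2.6.32.11", True),
    ("2.6.32.12", True),
    ("2.6.32.13", True),
    ("2.6.32.15", True),
    ("2.6.32.21", True),
]


def version_key(version):
    """Turn '2.6.32.10' into (2, 6, 32, 10) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last_good, first_bad) from a list of (version, crashed) pairs.

    Either element is None if no good or no bad version was tested.
    """
    last_good = None
    for version, crashed in sorted(results, key=lambda r: version_key(r[0])):
        if crashed:
            return last_good, version
        last_good = version
    return last_good, None


if __name__ == "__main__":
    print(regression_window(RESULTS))
```

With the reported data this gives 2.6.32.10 as the last good version and
2.6.32.11 as the first bad one, which matches the window in which the two
tested commits (c97f681f... good, 221c64db... bad) were found.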

Kindest regards,
Giam Teck Choon

Attachment: vmalloc__eagerly_clear_ptes_on_vunmap.patch
Description: Text Data

Attachment: testcrash.sh
Description: Bourne shell script

