
Re: [Xen-devel] PATastic fun



On 25.02.2013 10:10, Stefan Bader wrote:
> On 25.02.2013 04:15, Liu, Jinsong wrote:
>> Konrad Rzeszutek Wilk wrote:
>>> On Fri, Feb 22, 2013 at 02:54:16PM +0100, Stefan Bader wrote:
>>>> Hi Konrad,
>>>
>>> Hey Stefan,
>>>>
>>>> here is another one from the hm-what? department:
>>>
>>> Heh - the really good-bug-hunting one. Let's also include Jinsong as
>>> he has been tracking a similar one with mcelog.
>>>>
>>>> Colin discovered that running the attached program with the fork
>>>> active (e.g. "./mmap-example -f 0x10000", the address can be that or
>>>> iomem) triggers the following weird messages:
>>>>
>>>> [ 6824.453724] mmap-example:3481 map pfn expected mapping type write-back for [mem 0x00010000-0x00010fff], got uncached-minus
>>>> [ 6824.453776] ------------[ cut here ]------------
>>>> [ 6824.453796] WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0()
>>>> ...
>>>> [ 6824.453920] Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu
>>>> [ 6824.453926] Call Trace:
>>>> [ 6824.453944]  [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 6824.453954]  [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20
>>>> [ 6824.453963]  [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0
>>>> [ 6824.453975]  [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100
>>>> [ 6824.453985]  [<ffffffff81157459>] unmap_vmas+0x49/0x90
>>>> [ 6824.453995]  [<ffffffff8115f808>] exit_mmap+0x98/0x170
>>>> [ 6824.454007]  [<ffffffff810559a4>] mmput+0x64/0x100
>>>> [ 6824.454017]  [<ffffffff810560f5>] dup_mm+0x445/0x660
>>>> [ 6824.454027]  [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510
>>>> [ 6824.454038]  [<ffffffff81057931>] do_fork+0x91/0x350
>>>> [ 6824.454048]  [<ffffffff81057c76>] sys_clone+0x16/0x20
>>>> [ 6824.454060]  [<ffffffff816ccbf9>] stub_clone+0x69/0x90
>>>> [ 6824.454069]  [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f
>>>> [ 6824.454076] ---[ end trace 4918cdd0a4c9fea4 ]---
>>>>
>>>> I found that this is related to your bandaid patch
>>>>
>>>> commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1
>>>> Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>>>> Date:   Fri Feb 10 09:16:27 2012 -0500
>>>>
>>>>     xen/pat: Disable PAT support for now.
>>>>
>>>> I just do not understand how this happens. From the trace it seems
>>>> the fork fails when duplicating the VMAs (dup_mm calls mmput on
>>>> failure). So maybe the warning is just related to this. So primarily
>>>> the question is how on fork the _PAGE_PCD bit can become set? That
>>>> and _PAGE_PWT are cleared from the supported mask by the patch, so
>>>> somehow I would think nothing should be able to set it...
>>>> But apparently not so.
>>>> Not sure it is a big deal since I never saw this in normal operation
>>>> and it seems to be ok when unmapping before doing the fork. It is
>>>> just plain odd.
>>>
>>> Jinsong mentioned that there is some oddity with the MTRR. Somehow the
>>> ranges are swapped or not correct. Jinsong, could you shed some light
>>> on what you have found so far?
>>>
>>
>> Yes, Sander once also reported a similar weird warning when starting the
>> mcelog daemon, as attached.
>>
>> Basically, it occurs when the mcelog user daemon starts:
>> do_fork
>>   --> copy_process
>>     --> dup_mm
>>       --> dup_mmap
>>         --> copy_page_range
>>           --> track_pfn_copy
>>             --> reserve_pfn_range

So that makes it clearer as this will do

reserve_memtype(...)
--> pat_x_mtrr_type
  --> mtrr_type_lookup
    --> __mtrr_type_lookup

And that can return -1/0xff when the MTRRs are not enabled/initialized, which
appears to be the case here (there are no MTRR messages in dmesg). That value is
not equal to MTRR_TYPE_WRBACK, and thus the type becomes _PAGE_CACHE_UC_MINUS.
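
To make that fallback concrete, here is a minimal user-space sketch of the
decision. It is a simplified model, not the kernel source:
model_mtrr_type_lookup(), model_pat_x_mtrr_type(), the mtrr_state_ok flag and
the enum cache_type values are names made up for illustration. Only the 0xff
return value, the comparison against write-back and the resulting UC- type are
taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's cache type flags. */
enum cache_type { CACHE_WB, CACHE_WC, CACHE_UC_MINUS, CACHE_UC };

#define MODEL_MTRR_WRBACK  6     /* x86 MTRR encoding for write-back */
#define MODEL_MTRR_INVALID 0xFF  /* "no answer": MTRR state not set up */

/* Assumption: one flag stands in for "MTRRs enabled/initialized". */
static int mtrr_state_ok;

/* Model of the lookup: without usable MTRR state it cannot name a type. */
static uint8_t model_mtrr_type_lookup(uint64_t start, uint64_t end)
{
    assert(start < end);
    if (!mtrr_state_ok)
        return MODEL_MTRR_INVALID;
    /* Simplification: treat all memory as write-back when MTRRs work. */
    return MODEL_MTRR_WRBACK;
}

/* Model of the merge step: a WB request only stays WB if MTRR agrees. */
static enum cache_type model_pat_x_mtrr_type(uint64_t start, uint64_t end,
                                             enum cache_type req)
{
    if (req != CACHE_WB)
        return req;
    if (model_mtrr_type_lookup(start, end) != MODEL_MTRR_WRBACK)
        return CACHE_UC_MINUS;
    return CACHE_WB;
}
```

With mtrr_state_ok left at 0, a write-back request for 0x10000-0x11000 comes
back as CACHE_UC_MINUS in this model, which is consistent with the "expected
mapping type write-back ..., got uncached-minus" message.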

It looks like the problem starts early in reserve_memtype:

        if (!pat_enabled) {
                /* This is identical to page table setting without PAT */
                if (new_type) {
                        if (req_type == _PAGE_CACHE_WC)
                                *new_type = _PAGE_CACHE_UC_MINUS;
                        else
                                *new_type = req_type & _PAGE_CACHE_MASK;
                }
                return 0;
        }

This would be what we want, but only clearing the PWT and PCD flags from the
supported flags does not change pat_enabled (which is 1 when PAT support is
compiled into the kernel). Unfortunately the variable is local, and since there
are no messages about PAT in dmesg I would say pat_init() is not called
either. That function could otherwise be used to disable PAT support by
clearing the CPU feature flag.
Right now it seems the only way to avoid that message is to use "nopat" on the
kernel command line.
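
As a rough illustration of why "nopat" helps, the following user-space sketch
models the gate. All names (model_pat_enabled, model_parse_cmdline(),
model_reserve_memtype(), the mtrr_says_wb parameter) are hypothetical, and the
substring match on the command line is a deliberate simplification rather than
how the kernel parses its options.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's cache type flags. */
enum cache_type { CACHE_WB, CACHE_WC, CACHE_UC_MINUS, CACHE_UC };

/* Assumption: starts at 1 when PAT support is compiled in, per the text. */
static int model_pat_enabled = 1;

/* Simplified model of "nopat" handling: plain substring search. */
static void model_parse_cmdline(const char *cmdline)
{
    if (strstr(cmdline, "nopat"))
        model_pat_enabled = 0;
}

/*
 * Model of the type selection. mtrr_says_wb stands in for the result of
 * the MTRR lookup (0 means the lookup did not report write-back).
 */
static enum cache_type model_reserve_memtype(enum cache_type req,
                                             int mtrr_says_wb)
{
    if (!model_pat_enabled) {
        /* Mirrors the !pat_enabled branch quoted above: no MTRR check. */
        return req == CACHE_WC ? CACHE_UC_MINUS : req;
    }
    if (req == CACHE_WB && !mtrr_says_wb)
        return CACHE_UC_MINUS;
    return req;
}
```

In this model a write-back request with a failing MTRR lookup yields
CACHE_UC_MINUS while model_pat_enabled is 1, and stays CACHE_WB once
model_parse_cmdline() has seen "nopat". Clearing bits from a supported-flags
mask never touches model_pat_enabled, which is the gap described above.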

-Stefan


>>               --> line 624: flags != want_flags
>> It comes from different memory types in the page table (_PAGE_CACHE_WB) and
>> the MTRR (_PAGE_CACHE_UC_MINUS).
>>
>> However, why it gets different memory types from the page table and the MTRR
>> is still unclear; reproducing the bug is difficult and unstable.
>>
>> Thanks,
> 
> Ok, so this seems to take the same code paths. As for the test program, it
> fails on duplicating some mmap on a fork. The test program does this all the
> time (except the backtrace warning which is warn_once).
> So you say, the UC- comes from the MTRR side... Hm, have to look at that.
> 



