|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] PATastic fun
On Tue, Feb 26, 2013 at 04:32:37PM +0100, Stefan Bader wrote:
> On 25.02.2013 10:10, Stefan Bader wrote:
> > On 25.02.2013 04:15, Liu, Jinsong wrote:
> >> Konrad Rzeszutek Wilk wrote:
> >>> On Fri, Feb 22, 2013 at 02:54:16PM +0100, Stefan Bader wrote:
> >>>> Hi Konrad,
> >>>
> >>> Hey Stefan,
> >>>>
> >>>> here is another one from the hm-what? department:
> >>>
> >>> Heh - the really good-bug-hunting one. Lets also include Jinsong as
> >>> he has been tracking a similar one with mcelog.
> >>>>
> >>>> Colin discovered that running the attached program with the fork
> >>>> active (e.g. "./mmap-example -f 0x10000", the address can be that or
> >>>> iomem) this triggers the following weird messages:
> >>>>
> >>>> [ 6824.453724] mmap-example:3481 map pfn expected mapping type
> >>>> write-back for [mem 0x00010000-0x00010fff], got uncached-minus
> >>>> [ 6824.453776] ------------[ cut here ]------------
> >>>> [ 6824.453796] WARNING: at
> >>>> /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774
> >>>> untrack_pfn+0xb8/0xd0() ... [ 6824.453920] Pid: 3481, comm:
> >>>> mmap-example Tainted: GF
> >>>> 3.8.0-6-generic #13-Ubuntu
> >>>> [ 6824.453926] Call Trace:
> >>>> [ 6824.453944] [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0
> >>>> [ 6824.453954] [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20
> >>>> [ 6824.453963] [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0
> >>>> [ 6824.453975] [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100
> >>>> [ 6824.453985] [<ffffffff81157459>] unmap_vmas+0x49/0x90
> >>>> [ 6824.453995] [<ffffffff8115f808>] exit_mmap+0x98/0x170
> >>>> [ 6824.454007] [<ffffffff810559a4>] mmput+0x64/0x100
> >>>> [ 6824.454017] [<ffffffff810560f5>] dup_mm+0x445/0x660
> >>>> [ 6824.454027] [<ffffffff81056d9f>]
> >>>> copy_process.part.22+0xa5f/0x1510 [ 6824.454038]
> >>>> [<ffffffff81057931>] do_fork+0x91/0x350 [ 6824.454048]
> >>>> [<ffffffff81057c76>] sys_clone+0x16/0x20 [ 6824.454060]
> >>>> [<ffffffff816ccbf9>] stub_clone+0x69/0x90 [ 6824.454069]
> >>>> [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f [ 6824.454076]
> >>>> ---[ end trace 4918cdd0a4c9fea4 ]---
> >>>>
> >>>> I found that this is related to your bandaid patch
> >>>>
> >>>> commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> >>>> Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> >>>> Date: Fri Feb 10 09:16:27 2012 -0500
> >>>>
> >>>> xen/pat: Disable PAT support for now.
> >>>>
> >>>> I just do not understand how this happens. From the trace it seems
> >>>> the fork
> >>>> fails when duplicating the VMAs (dup_mm calls mmput on failure). So
> >>>> maybe the
> >>>> warning is just related to this. So primarily the question is how on
> >>>> fork the _PAGE_PCD bit can become set? That and _PAGE_PWT are
> >>>> cleared from the supported
> >>>> mask by the patch, so somehow I would think nothing should be able
> >>>> to set it...
> >>>> But apparently not so.
> >>>> Not sure it is a big deal since I never saw this in normal operation
> >>>> and it
> >>>> seems to be ok when unapping before doing the fork. It is just plain
> >>>> odd.
> >>>
> >>> Jinsong mentioned that there is some oddity with the MTRR. Somehow the
> >>> ranges are swapped or not correct. Jinsong, could you shed some light
> >>> on what you have found so far?
> >>>
> >>
> >> Yes, Sander once also reported a similar weird warning when start mcelog
> >> daemon, as attached.
> >>
> >> Basically, it occurs when mcelog user daemon start,
> >> do_fork
> >> --> copy_process
> >> --> dup_mm
> >> --> dup_mmap
> >> --> copy_page_range
> >> --> track_pfn_copy
> >> --> reserve_pfn_range
>
> So that makes it clearer as this will do
>
> reserve_memtype(...)
> --> pat_x_mtrr_type
> --> mtrr_type_lookup
> --> __mtrr_type_lookup
>
> And that can return -1/0xff in case of mtrr not being enabled/initialized.
> Which
> is not the case (given there are no messages for it in dmesg). This is not
> equal
> to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS.
>
> It looks like the problem starts early in reserve_memtype:
>
> if (!pat_enabled) {
> /* This is identical to page table setting without PAT */
> if (new_type) {
> if (req_type == _PAGE_CACHE_WC)
> *new_type = _PAGE_CACHE_UC_MINUS;
> else
> *new_type = req_type & _PAGE_CACHE_MASK;
> }
> return 0;
> }
>
> This would be what we want, but only clearing the PWT and PCD flags from the
> supported flags is not changing pat_enabled (which is 1 when PAT support is
> compiled into the kernel). Unfortunately the variable is local and since there
> are not any messages about PAT in dmesg I would say pat_init() is not called
> either. Which might be used to disable PAT support by clearing the CPU feature
> flag.
Hm, so:
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 39928d1..9ac70c5 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -379,6 +379,7 @@ static void __init xen_init_cpuid_mask(void)
cpuid_leaf1_edx_mask =
~((1 << X86_FEATURE_MTRR) | /* disable MTRR */
+ (1 << X86_FEATURE_PAT) | /* disable PAT */
(1 << X86_FEATURE_ACC)); /* thermal monitoring */
if (!xen_initial_domain())
should do it right? Let me check on the troublesome machine I saw
it.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |