Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1
Since it's already rc7:

"Give me a subtle ping, Vasili. One subtle ping only, please."

On 04/01/18 21:02, Sander Eikelenboom wrote:
> On 04/01/18 12:44, Juergen Gross wrote:
>> On 04/01/18 11:17, Sander Eikelenboom wrote:
>>> Hi Boris / Juergen,
>>>
>>> First of all best wishes for a quite turbulent starting new year.
>>>
>>> Now the holidays are over I finally got to test a Linux 4.15-rc6 kernel
>>> and experienced a crash in early dom0 boot on my system (AMD phenom x6).
>>>
>>> I tested some earlier linux 4.15 rc's but experienced crashes then as well,
>>> but didn't have time to set up a serial console to send them in
>>> (and waited to see if the issue Boris fixed with AMD PCI 64bit bar's could
>>> be it).
>>>
>>> But since that patch went in before 4.15 rc6, that doesn't seem to be the
>>> issue.
>>> So it could be that the culprit went in pretty early in the 4.15 cycle.
>>>
>>> The 4.15-rc6 kernel boots fine on bare metal, as does a 4.14.6 kernel on
>>> xen-unstable.
>>>
>>> Hopefully you have a pointer to what is wrong, if not I can try to do a
>>> bisect.
>>
>> A bisect would be very welcome.
>
> Hi Juergen / Boris / Pavel,
>
> Bisection result is:
>
> a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b is the first bad commit
> commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
> Author: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> Date: Wed Nov 15 17:36:31 2017 -0800
>
> mm: zero reserved and unavailable struct pages
>
> Some memory is reserved but unavailable: not present in memblock.memory
> (because not backed by physical pages), but present in memblock.reserved.
> Such memory has backing struct pages, but they are not initialized by
> going through __init_single_page().
>
> In some cases these struct pages are accessed even if they do not
> contain any data. One example is page_to_pfn() might access page->flags
> if this is where section information is stored (CONFIG_SPARSEMEM,
> SECTION_IN_PAGE_FLAGS).
>
> One example of such memory: trim_low_memory_range() unconditionally
> reserves from pfn 0, but e820__memblock_setup() might provide the
> existing memory from pfn 1 (i.e. KVM).
>
> Since struct pages are zeroed in __init_single_page(), and not during
> allocation time, we must zero such struct pages explicitly.
>
> The patch involves adding a new memblock iterator:
>   for_each_resv_unavail_range(i, p_start, p_end)
>
> Which iterates through reserved && !memory lists, and we zero struct pages
> explicitly by calling mm_zero_struct_page().
>
> ===
>
> Here is a more detailed example of the problem that this patch is addressing:
>
> Run tested on qemu with the following arguments:
>
>   -enable-kvm -cpu kvm64 -m 512 -smp 2
>
> This patch reports that there are 98 unavailable pages.
>
> They are: pfn 0 and pfns in range [159, 255].
>
> Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does
> not reserve [159, 255] ones.
>
> e820__memblock_setup() reports to Linux that the following physical ranges
> are available:
>   [1, 158]
>   [256, 130783]
>
> Notice that exactly the unavailable pfns are missing!
>
> Now, let's check what we have in zone 0: [1, 131039]
>
> pfn 0 is not part of the zone, but pfns [1, 158] are.
>
> However, the bigger problem we have if we do not initialize these struct
> pages is with memory hotplug, because that path operates at 2M
> boundaries (section_nr) and checks if a 2M range of pages is hot
> removable. It starts with the first pfn from the zone, rounds it down to a 2M
> boundary (struct pages are allocated at 2M boundaries when vmemmap is
> created), and checks if that section is hot removable. In this case it
> starts with pfn 1 and rounds it down to pfn 0. Later the pfn is converted
> to a struct page, and some fields are checked. Now, if we do not zero
> struct pages, we get unpredictable results.
>
> In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all
> vmemmap memory to ones, the following panic is observed with kernel test
> without this patch applied:
>
>   BUG: unable to handle kernel NULL pointer dereference at (null)
>   IP: is_pageblock_removable_nolock+0x35/0x90
>   PGD 0 P4D 0
>   Oops: 0000 [#1] PREEMPT
>   ...
>   task: ffff88001f4e2900 task.stack: ffffc90000314000
>   RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
>   Call Trace:
>    ? is_mem_section_removable+0x5a/0xd0
>    show_mem_removable+0x6b/0xa0
>    dev_attr_show+0x1b/0x50
>    sysfs_kf_seq_show+0xa1/0x100
>    kernfs_seq_show+0x22/0x30
>    seq_read+0x1ac/0x3a0
>    kernfs_fop_read+0x36/0x190
>    ? security_file_permission+0x90/0xb0
>    __vfs_read+0x16/0x30
>    vfs_read+0x81/0x130
>    SyS_read+0x44/0xa0
>    entry_SYSCALL_64_fastpath+0x1f/0xbd
>
> Link: http://lkml.kernel.org/r/20171013173214.27300-7-pasha.tatashin@xxxxxxxxxx
> Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> Reviewed-by: Steven Sistare <steven.sistare@xxxxxxxxxx>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
> Reviewed-by: Bob Picco <bob.picco@xxxxxxxxxx>
> Tested-by: Bob Picco <bob.picco@xxxxxxxxxx>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Alexander Potapenko <glider@xxxxxxxxxx>
> Cc: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
> Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> Cc: David S. Miller <davem@xxxxxxxxxxxxx>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Will Deacon <will.deacon@xxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>
> :040000 040000 b0422cb4f5ef60f5bc7f0686d135c869680c603d 51ef20afe641afceaf5530b83b4f1b9a51563939 M include
> :040000 040000 55be7a5dd879578dc3f88bec059bcc392e3f1a1c b4c9f81df05629bb034b6d0bdc0454579f2986fe M mm
>
>
> --
> Sander
>
>>
>> Juergen
>>
>>>
>>> --
>>> Sander
>>>
>>> Attached: .config and full serial log
>>>
>>> [ 0.000000] ACPI: Early table checksum verification disabled
>>> [ 0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>>> [ 0.000000] ACPI: RSDT 0x00000000C7F90000 000048 (v01 MSI OEMSLIC 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: FACP 0x00000000C7F90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: DSDT 0x00000000C7F905E0 009427 (v01 A7640 A7640100 00000100 INTL 20051117)
>>> [ 0.000000] ACPI: FACS 0x00000000C7F9E000 000040
>>> [ 0.000000] ACPI: APIC 0x00000000C7F90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: MCFG 0x00000000C7F90420 00003C (v01 7640MS OEMMCFG 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: SLIC 0x00000000C7F90460 000176 (v01 MSI OEMSLIC 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: OEMB 0x00000000C7F9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: SRAT 0x00000000C7F9A5E0 000108 (v03 AMD FAM_F_10 00000002 AMD 00000001)
>>> [ 0.000000] ACPI: HPET 0x00000000C7F9A6F0 000038 (v01 7640MS OEMHPET 20100913 MSFT 00000097)
>>> [ 0.000000] ACPI: IVRS 0x00000000C7F9A730 000110 (v01 AMD RD890S 00202031 AMD 00000000)
>>> [ 0.000000] ACPI: SSDT 0x00000000C7F9A840 000DA4 (v01 A M I POWERNOW 00000001 AMD 00000001)
>>> [ 0.000000] ACPI: Local APIC address 0xfee00000
>>> [ 0.000000] Setting APIC routing to Xen PV.
>>> [ 0.000000] NUMA turned off
>>> [ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
>>> [ 0.000000] NODE_DATA(0) allocated [mem 0x7fc15000-0x7fc1efff]
>>> [ 0.000000] tsc: Fast TSC calibration using PIT
>>> [ 0.000000] Zone ranges:
>>> [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
>>> [ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007fffffff]
>>> [ 0.000000] Normal empty
>>> [ 0.000000] Movable zone start for each node
>>> [ 0.000000] Early memory node ranges
>>> [ 0.000000] node 0: [mem 0x0000000000001000-0x0000000000095fff]
>>> [ 0.000000] node 0: [mem 0x0000000000100000-0x000000007fffffff]
>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007fffffff]
>>> [ 0.000000] On node 0 totalpages: 524181
>>> [ 0.000000] DMA zone: 64 pages used for memmap
>>> [ 0.000000] DMA zone: 21 pages reserved
>>> [ 0.000000] DMA zone: 3989 pages, LIFO batch:0
>>> [ 0.000000] DMA32 zone: 8128 pages used for memmap
>>> [ 0.000000] DMA32 zone: 520192 pages, LIFO batch:31
>>> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
>>> [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1
>>> [ 0.000000] PGD 0 P4D 0
>>> [ 0.000000] Oops: 0002 [#1] SMP
>>> [ 0.000000] Modules linked in:
>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc6-20180104-linus-doflr+ #1
>>> [ 0.000000] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>>> [ 0.000000] RIP: e030:zero_resv_unavail+0x8e/0xe1
>>> [ 0.000000] RSP: e02b:ffffffff82803d68 EFLAGS: 00010006
>>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000010
>>> [ 0.000000] RDX: 000000000007ffff RSI: 0000000000000100 RDI: ffffea0002000000
>>> [ 0.000000] RBP: ffffffff82803d70 R08: ffffea0002000000 R09: 0000000000000002
>>> [ 0.000000] R10: 0000000000000002 R11: 0000000000000003 R12: ffffea0000000000
>>> [ 0.000000] R13: 0000000000000000 R14: ffffffff82803f20 R15: 0000000000000000
>>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff82e16000(0000) knlGS:0000000000000000
>>> [ 0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 0.000000] CR2: 0000000000000000 CR3: 0000000002823000 CR4: 0000000000000660
>>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
>>> [ 0.000000] Call Trace:
>>> [ 0.000000] ? free_area_init_nodes+0x690/0x69f
>>> [ 0.000000] ? zone_sizes_init+0x4b/0x50
>>> [ 0.000000] ? xen_pagetable_init+0x13/0x43f
>>> [ 0.000000] ? memblock_find_dma_reserve+0x141/0x15b
>>> [ 0.000000] ? memblock_find_dma_reserve+0x150/0x15b
>>> [ 0.000000] ? numa_init+0x43c/0x453
>>> [ 0.000000] ? setup_arch+0x7a0/0x87f
>>> [ 0.000000] ? start_kernel+0x58/0x3a8
>>> [ 0.000000] ? iommu_shutdown_noop+0x10/0x10
>>> [ 0.000000] ? xen_start_kernel+0x528/0x534
>>> [ 0.000000] Code: da 49 c1 e0 06 4d 01 e0 48 8b 44 24 08 48 8d 0c 1a 48 05 ff 0f 00 00 48 c1 e8 0c 48 39 c8 76 16 4c 89 c7 b9 10 00 00 00 44 89 e8 <f3> ab 48 ff c3 49 83 c0 40 eb d2 6a 00 55 31 d2 49 c7 c0 90 78
>>> [ 0.000000] RIP: zero_resv_unavail+0x8e/0xe1 RSP: ffffffff82803d68
>>> [ 0.000000] CR2: 0000000000000000
>>> [ 0.000000] ---[ end trace b788f32e38f6de39 ]---
>>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>> (XEN) [2018-01-04 09:52:49.218] Hardware Dom0 crashed: rebooting machine in 5 seconds.
>>>
>>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel