Re: [Xen-devel] Scheduler regression in 4.7
On 11/08/16 14:24, George Dunlap wrote:
> On 11/08/16 12:35, Andrew Cooper wrote:
>> Hello,
>>
>> XenServer testing has discovered a regression from recent changes in
>> staging-4.7.
>>
>> The actual cause is _csched_cpu_pick() falling over LIST_POISON, which
>> happened to occur at the same time as a domain was shutting down. The
>> instruction in question is `mov 0x10(%rax),%rax` which looks like
>> reverse list traversal.
> I don't see in sched_credit.c:_csched_cpu_pick() where any list
> traversal happens. The instruction above could easily be any pointer
> dereference (although you'd normally expect pointers to be either valid
> or NULL).
>
> Could you use addr2line or objdump -dl to get a better idea where the
> #GP is happening?

addr2line -e xen-syms-4.7.0-xs127493 ffff82d08012944f
/obj/RPM_BUILD_DIRECTORY/xen-4.7.0/xen/common/sched_credit.c:775
(discriminator 1)

It will be IS_RUNQ_IDLE() which is the problem.

For linked lists, the pointers are deliberately poisoned when list
elements are deleted, to catch bugs like this.

*** include/asm-x86/config.h:
<global>[88] #define LIST_POISON2 ((void *)0x0200200200200200UL)

but it is sufficiently recognisable that I could spot it in %rax before
disassembling the faulting instruction.

~Andrew

>
> -George
>
>> The regression is across the changes
>>
>> xen-4.7/xen$ git lg d37c2b9^..f2160ba
>> * f2160ba - x86/mmcfg: Fix initalisation of variables in
>> pci_mmcfg_nvidia_mcp55() (6 days ago) <Andrew Cooper>
>> * 471a151 - xen: Remove buggy initial placement algorithm (6 days ago)
>> <George Dunlap>
>> * c732d3c - xen: Have schedulers revise initial placement (6 days ago)
>> <George Dunlap>
>> * d37c2b9 - x86/EFI + Live Patch: avoid symbol address truncation (6
>> days ago) <Jan Beulich>
>>
>> and is almost certainly c732d3c.
>>
>> The log is below, although, being a non-debug build, it has mostly stack
>> rubble in the stack trace.
>>
>> ~Andrew
>>
>> (XEN) [ 3315.431878] ----[ Xen-4.7.0-xs127546 x86_64 debug=n Not
>> tainted ]----
>> (XEN) [ 3315.431884] CPU: 3
>> (XEN) [ 3315.431888] RIP: e008:[<ffff82d08012944f>]
>> sched_credit.c#_csched_cpu_pick+0x1af/0x549
>> (XEN) [ 3315.431900] RFLAGS: 0000000000010206 CONTEXT: hypervisor (d0v6)
>> (XEN) [ 3315.431907] rax: 0200200200200200 rbx: 0000000000000006
>> rcx: 0000000000000006
>> (XEN) [ 3315.431914] rdx: 0000003fbfc42580 rsi: ffff82d0802df3a0
>> rdi: ffff83102dba7c78
>> (XEN) [ 3315.431919] rbp: ffff83102dba7d28 rsp: ffff83102dba7bb8
>> r8: 0000000000000001
>> (XEN) [ 3315.431924] r9: 0000000000000001 r10: ffff82d080317528
>> r11: 0000000000000000
>> (XEN) [ 3315.431930] r12: ffff831108d7a000 r13: 0000000000000040
>> r14: ffff83110889e980
>> (XEN) [ 3315.431934] r15: 0000000000000000 cr0: 0000000080050033
>> cr4: 00000000000426e0
>> (XEN) [ 3315.431939] cr3: 000000202036a000 cr2: ffff88013dc783d8
>> (XEN) [ 3315.431944] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss:
>> e010 cs: e008
>> (XEN) [ 3315.431952] Xen code around <ffff82d08012944f>
>> (sched_credit.c#_csched_cpu_pick+0x1af/0x549):
>> (XEN) [ 3315.431956] 18 48 8b 00 48 8b 40 28 <48> 8b 40 10 66 81 38 ff
>> 7f 75 07 0f ab 9d 50 ff
>> (XEN) [ 3315.431973] Xen stack trace from rsp=ffff83102dba7bb8:
>> (XEN) [ 3315.431976] 000000012dba7c78 ffff83102db9d9c0
>> ffff82d0802da560 0000000100000002
>> (XEN) [ 3315.431984] ffff8300bdb7e000 ffff82d080121afd
>> ffff82d08035ebb0 ffff82d08035eba8
>> (XEN) [ 3315.431992] ff00000000000000 0000000100000028
>> 00000011088c4001 0000000000000001
>> (XEN) [ 3315.431999] ffff83102dba7c38 0000000000000000
>> 0000000000000000 0000000000000000
>> (XEN) [ 3315.432005] 0000000000000000 0000000000000003
>> ffff83102dba7c98 0000000000000206
>> (XEN) [ 3315.432011] 0000000000000292 000000fb2dba7c78
>> 0000000000000206 ffff82d08032ab78
>> (XEN) [ 3315.432018] 00000000fffddfb7 ffff82d080121a1c
>> ffff83102dba7ca8 000000002dba7ca8
>> (XEN) [ 3315.432025] ff00000000000000 ffff830000000028
>> ffff83102dba7ce8 ffff82d08013dc34
>> (XEN) [ 3315.432032] 00000000ffffffff 0000000000000010
>> 0000000000000048 0000000000000048
>> (XEN) [ 3315.432038] 0000000000000001 ffff83110889e8c0
>> ffff83102dba7d38 ffff82d08013dff0
>> (XEN) [ 3315.432045] ffff83102dba7d28 ffff8300bdb7e000
>> ffff831108d7a000 0000000000000040
>> (XEN) [ 3315.432053] ffff83110889e980 ffff83110889e8c0
>> ffff83102dba7d38 ffff82d080129804
>> (XEN) [ 3315.432060] ffff83102dba7d78 ffff82d080129833
>> ffff83102dba7d98 ffff8300bdb7e000
>> (XEN) [ 3315.432068] ffff831108d7a000 0000000000000040
>> 0000000000000001 ffff83110889e8c0
>> (XEN) [ 3315.432074] ffff83102dba7db8 ffff82d08012f930
>> 0000000000000006 ffff8300bdb7e000
>> (XEN) [ 3315.432081] ffff831108d7a000 000000000000001b
>> 0000000000000006 0000000000000020
>> (XEN) [ 3315.432087] ffff83102dba7de8 ffff82d080107847
>> ffff831108d7a000 0000000000000006
>> (XEN) [ 3315.432095] 00007f9f7007b004 ffff83102db9d9c0
>> ffff83102dba7f08 ffff82d08010537c
>> (XEN) [ 3315.432102] ffff8300bd8fd000 07ff830000000000
>> 000000000000001b 000000000000001b
>> (XEN) [ 3315.432109] ffff8310031540c0 0000000000000003
>> ffff83102dba7e48 ffff83103ffe37c0
>> (XEN) [ 3315.432116] Xen call trace:
>> (XEN) [ 3315.432122] [<ffff82d08012944f>]
>> sched_credit.c#_csched_cpu_pick+0x1af/0x549
>> (XEN) [ 3315.432129] [<ffff82d080121afd>]
>> page_alloc.c#alloc_heap_pages+0x604/0x6d7
>> (XEN) [ 3315.432135] [<ffff82d080121a1c>]
>> page_alloc.c#alloc_heap_pages+0x523/0x6d7
>> (XEN) [ 3315.432141] [<ffff82d08013dc34>] xmem_pool_alloc+0x43f/0x46d
>> (XEN) [ 3315.432147] [<ffff82d08013dff0>] _xmalloc+0xcb/0x1fc
>> (XEN) [ 3315.432153] [<ffff82d080129804>]
>> sched_credit.c#csched_cpu_pick+0x1b/0x1d
>> (XEN) [ 3315.432160] [<ffff82d080129833>]
>> sched_credit.c#csched_vcpu_insert+0x2d/0x14f
>> (XEN) [ 3315.432166] [<ffff82d08012f930>] sched_init_vcpu+0x24e/0x2ec
>> (XEN) [ 3315.432173] [<ffff82d080107847>] alloc_vcpu+0x1d1/0x2ca
>> (XEN) [ 3315.432178] [<ffff82d08010537c>] do_domctl+0x98f/0x1de3
>> (XEN) [ 3315.432189] [<ffff82d08022ac5b>] lstar_enter+0x9b/0xa0
>> (XEN) [ 3315.432192]
>> (XEN) [ 3317.105524]
>> (XEN) [ 3317.114726] ****************************************
>> (XEN) [ 3317.139954] Panic on CPU 3:
>> (XEN) [ 3317.155197] GENERAL PROTECTION FAULT
>> (XEN) [ 3317.174247] [error_code=0000]
>> (XEN) [ 3317.190248] ****************************************
>> (XEN) [ 3317.215469]
>> (XEN) [ 3317.224674] Reboot in five seconds...
>> (XEN) [ 3317.243913] Executing kexec image on cpu3
>> (XEN) [ 3317.265338] Shot down all CPUs
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
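
For anyone following along, the list poisoning described above can be
sketched in standalone C. This is only an illustration, not the Xen source:
the list helpers are simplified stand-ins, the LIST_POISON1 value is an
assumption (only LIST_POISON2 is quoted in this thread), is_canonical() and
poison_demo() are hypothetical names, and it assumes a 64-bit build with
48-bit virtual addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* LIST_POISON2 is the value quoted from include/asm-x86/config.h in this
 * thread. LIST_POISON1 is an assumed value for the sake of the sketch. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x0100100100100100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x0200200200200200ULL)

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

/* Insert n directly after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink e, then poison its pointers so that any later use of the stale
 * element faults instead of silently walking freed or reused memory. */
static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = LIST_POISON1;
    e->prev = LIST_POISON2;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffff;
}

/* Walk through the failure mode; returns 0 when every check holds. */
int poison_demo(void)
{
    struct list_head head, elem;

    list_init(&head);
    list_add(&elem, &head);
    list_del(&elem);

    /* The list itself is empty and consistent again. */
    assert(head.next == &head && head.prev == &head);

    /* The deleted element now carries the recognisable poison value. */
    assert(elem.prev == LIST_POISON2);

    /* Reading 0x10 bytes past the poison, as `mov 0x10(%rax),%rax` would,
     * targets a non-canonical address. */
    assert(!is_canonical((uint64_t)(uintptr_t)elem.prev + 0x10));

    return 0;
}
```

The last check ties in with the log: %rax holds 0200200200200200, and a
memory access through a non-canonical address raises #GP with an error code
of 0 rather than a page fault, which would be consistent with the
"GENERAL PROTECTION FAULT [error_code=0000]" panic.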