
Re: [xen-unstable test] 164996: regressions - FAIL


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 16 Sep 2021 18:21:32 +0200
  • Cc: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>
  • Delivery-date: Thu, 16 Sep 2021 16:21:42 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.09.2021 06:06, osstest service owner wrote:
> flight 164996 xen-unstable real [real]
> flight 165002 xen-unstable real-retest [real]
> http://logs.test-lab.xenproject.org/osstest/logs/164996/
> http://logs.test-lab.xenproject.org/osstest/logs/165002/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeat fail REGR. vs. 164945

Since no-one has so far given any sign of looking into this failure, I
took a look, despite having little hope of actually figuring anything
out. I'm pretty sure the randomness of the "when" of this failure
correlates with

Sep 15 14:44:48.518439 [ 1613.227909] rpc-worker: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Sep 15 14:44:55.418534 [ 1613.240888] CPU: 48 PID: 2029 Comm: rpc-worker Not tainted 5.4.17+ #1
Sep 15 14:44:55.430511 [ 1613.247370] Hardware name: Cavium ThunderX CN88XX board (DT)
Sep 15 14:44:55.430576 [ 1613.253099] Call trace:
Sep 15 14:44:55.442497 [ 1613.255620]  dump_backtrace+0x0/0x140
Sep 15 14:44:55.442558 [ 1613.259348]  show_stack+0x14/0x20
Sep 15 14:44:55.442606 [ 1613.262734]  dump_stack+0xbc/0x100
Sep 15 14:44:55.442651 [ 1613.266206]  warn_alloc+0xf8/0x160
Sep 15 14:44:55.454512 [ 1613.269677]  __alloc_pages_slowpath+0x9c4/0x9f0
Sep 15 14:44:55.454574 [ 1613.274277]  __alloc_pages_nodemask+0x1cc/0x248
Sep 15 14:44:55.466498 [ 1613.278878]  kmalloc_order+0x24/0xa8
Sep 15 14:44:55.466559 [ 1613.282523]  __kmalloc+0x244/0x270
Sep 15 14:44:55.466607 [ 1613.285995]  alloc_empty_pages.isra.17+0x34/0xb0
Sep 15 14:44:55.478495 [ 1613.290681]  privcmd_ioctl_mmap_batch.isra.20+0x414/0x428
Sep 15 14:44:55.478560 [ 1613.296149]  privcmd_ioctl+0xbc/0xb7c
Sep 15 14:44:55.478608 [ 1613.299883]  do_vfs_ioctl+0xb8/0xae0
Sep 15 14:44:55.490475 [ 1613.303527]  ksys_ioctl+0x78/0xa8
Sep 15 14:44:55.490536 [ 1613.306911]  __arm64_sys_ioctl+0x1c/0x28
Sep 15 14:44:55.490584 [ 1613.310906]  el0_svc_common.constprop.2+0x88/0x150
Sep 15 14:44:55.502489 [ 1613.315765]  el0_svc_handler+0x20/0x80
Sep 15 14:44:55.502551 [ 1613.319583]  el0_svc+0x8/0xc
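
Judging from the trace, the failing allocation looks to be the
page-pointer array in drivers/xen/privcmd.c's alloc_empty_pages(), on
the auto-translated-physmap (i.e. Arm) branch of the mmap-batch ioctl;
the __GFP_COMP|__GFP_ZERO in the failure's mode word also fits a large
kcalloc(). A sketch of the 5.4-era code, from memory and hence not
necessarily matching the exact kernel osstest runs:

    /* drivers/xen/privcmd.c - 5.4-era sketch, from memory */
    static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
    {
            int rc;
            struct page **pages;

            /*
             * One pointer per guest page, in a single physically
             * contiguous chunk: at 8 bytes per pointer, a batch of
             * more than 4096 pages (16MiB of guest space) already
             * pushes this to an order-4 (64KiB) kmalloc() - matching
             * the failure above.
             */
            pages = kcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
            if (pages == NULL)
                    return -ENOMEM;

            /* Back the VMA with ballooned-out (empty) pages. */
            rc = alloc_xenballooned_pages(numpgs, pages);
            if (rc != 0) {
                    kfree(pages);
                    return -ENOMEM;
            }

            vma->vm_private_data = pages;
            return 0;
    }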

As per

Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
Sep 15 14:44:55.526477 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
Sep 15 14:44:55.526540 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
Sep 15 14:44:55.538474 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650

the system doesn't really look to be out of memory (free:8364 pages of
4kB comes to about 33MiB); as per

Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB

there even appear to be a number of higher-order blocks available. Note,
though, that the letters are the allocator's migrate types (U=unmovable,
M=movable, E=reclaimable, C=CMA), and every block of order 4 and up is
marked "(C)", i.e. sits in a CMA area, which a plain GFP_KERNEL
allocation can't dip into - so at least in this zone there was nothing
actually usable at order 4. Such allocations aren't really nice in any
event.
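
If it is indeed that kcalloc(), one way to take the high-order
requirement out of the picture would be to let the array fall back to
vmalloc() space, along the lines of the below (untested; the matching
kfree()s on the release paths would need to become kvfree()):

    --- a/drivers/xen/privcmd.c
    +++ b/drivers/xen/privcmd.c
    @@ static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
    -	pages = kcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
    +	/* kvcalloc() falls back to a virtually contiguous,
    +	 * order-0-backed allocation when the physically
    +	 * contiguous one can't be satisfied. */
    +	pages = kvcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);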

What I can't see is why this may have started triggering only recently.
Was the kernel updated in osstest? Is 512MiB of memory perhaps a bit too
small for a Dom0 on this system (with 96 CPUs)? Going through the logs
I haven't been able to find crucial pieces of information, like how much
memory the host has or what the hypervisor command line was.
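
(For reference, on a box like this Dom0's memory would typically be
capped explicitly on the Xen command line, e.g. - values made up:

    dom0_mem=512M,max:512M dom0_max_vcpus=8

which the hypervisor normally echoes early in the serial log as
"Command line: ...".)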

Jan
