
Re: [xen-unstable test] 164996: regressions - FAIL


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 22 Sep 2021 09:34:46 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, dpsmith@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 22 Sep 2021 07:35:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.09.2021 01:38, Stefano Stabellini wrote:
> On Mon, 20 Sep 2021, Ian Jackson wrote:
>> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>>> As per
>>>
>>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
>>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
>>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
>>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
>>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
>>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
>>>
>>> the system doesn't look to really be out of memory; as per
>>>
>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>
>>> there even look to be a number of higher order pages available (albeit
>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>> allocations aren't really nice.
>>
>> The host history suggests this may be related to a qemu update.
>>
>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html

Stefano - given some of the investigation you detail further down, I
wonder whether you had seen this part of Ian's reply. (The question
then, of course, is how that qemu update managed to get pushed.)

>> The grub cfg has this:
>>
>>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
>>
>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
> 
> I definitely recommend increasing dom0 memory, especially as I guess
> the box has a significant amount of it, far more than 4GB. I would
> set it to 2GB. Also, the syntax on ARM is simpler, so it should be
> just: dom0_mem=2G

Ian - I guess that's a relatively easy adjustment to make? I wonder,
though, whether we wouldn't want to address the underlying issue first.
Presumably not, because such a fix would likely take quite some time to
propagate suitably. But in that case we will want some way of verifying
that an eventual fix there would indeed have helped here.
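
Concretely - and purely as a sketch, assuming the remaining options
(including ${xen_rm_opts}) stay untouched and that the plain ARM syntax
Stefano quotes is all that's needed - the grub line would become:

  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=2G ucode=scan ${xen_rm_opts}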

> In addition, I did some investigation, just in case there is actually
> a bug in the code and it is not a simple OOM problem.

I think the actual issue is quite clear; what I'm struggling with is
why we weren't hit by it earlier.

Imo, as always, non-order-0 allocations (perhaps excluding those made
while bringing up the kernel, or whichever other entity) are to be
avoided if at all possible. The offender in this case looks to be
privcmd's alloc_empty_pages(): for it to request what ends up being an
order-4 allocation through kcalloc(), the original
IOCTL_PRIVCMD_MMAPBATCH must specify a pretty large chunk of guest
memory to get mapped. That may in turn be questionable, but I'm afraid
I don't have the time to drill down into where that request is coming
from and whether it, too, wouldn't better be split up.
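
To put a rough number on "pretty large", assuming 4KiB pages and
8-byte pointers: an order-3 (32KiB) allocation already holds 4096
struct page pointers, covering 16MiB of guest memory. So for kcalloc()
to end up with an order-4 (64KiB) request, the ioctl must have asked
for more than 4096 pages, i.e. something over 16MiB (and up to 32MiB)
mapped in one go.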

The solution looks simple enough: convert from kcalloc() to kvcalloc().
I can certainly spin up a patch to Linux to this effect. Yet that still
won't answer the question of why this issue has popped up all of a
sudden (and hence whether there are things wanting changing elsewhere
as well).
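
For illustration only - a minimal sketch of the conversion I have in
mind, modelled loosely on what alloc_empty_pages() in
drivers/xen/privcmd.c looks like today (the actual patch may well
differ, and the kfree() of this array elsewhere in the file would need
to become kvfree() to match):

  static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
  {
          struct page **pages;
          int rc;

          /*
           * kvcalloc() tries a physically contiguous kmalloc() first
           * and transparently falls back to vmalloc() for larger
           * sizes, so the order-4 page allocation seen in the log
           * can't occur any more.
           */
          pages = kvcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
          if (pages == NULL)
                  return -ENOMEM;

          rc = xen_alloc_unpopulated_pages(numpgs, pages);
          if (rc != 0) {
                  kvfree(pages);  /* handles both allocation paths */
                  return -ENOMEM;
          }
          vma->vm_private_data = pages;

          return 0;
  }

One nice property of the kvcalloc()/kvfree() pair is that callers don't
need to know which of the two underlying allocators ended up satisfying
the request.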

Jan
