
Re: [xen-4.12-testing test] 169199: regressions - FAIL


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 8 Apr 2022 09:01:11 +0200
  • Cc: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>
  • Delivery-date: Fri, 08 Apr 2022 07:01:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.04.2022 10:45, osstest service owner wrote:
> flight 169199 xen-4.12-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/169199/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail 
> REGR. vs. 168480

While the subsequent flight passed, I thought I'd still look into
the logs here since the earlier flight had failed too. The state of
the machine when the debug keys were issued is somewhat odd (and
similar to the earlier failure's): 11 of the 56 CPUs try to
acquire (apparently) Dom0's event lock, from evtchn_move_pirqs().
All other CPUs are idle. The test failed because the sole guest
didn't reboot in time. Whether the failure is actually connected to
this apparent lock contention is unclear, though.

One can further see that all of the roughly 70 ECS_PIRQ ports are
bound to vCPU 0 (which makes me wonder about lack of balancing
inside Dom0 itself, but that's unrelated). This means that all
other vCPU-s have nothing at all to do in evtchn_move_pirqs().
Since this moving of pIRQ-s is an optimization (the value of which
has been questioned in the past, iirc), I wonder whether we
shouldn't add a check to the function for the list being empty
prior to actually acquiring the lock. I guess I'll make a patch and
post it as RFC.
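
To illustrate the idea only: below is a minimal, self-contained sketch of
such an early exit. It is not Xen code. All names (model_domain,
model_vcpu, model_move_pirqs, pirq_head, next_port) are hypothetical
stand-ins, the "lock" is just a counter so the effect can be observed, and
it assumes that reading the list head without holding the lock is
acceptable because the operation is merely an optimization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Xen's real structures. */
struct model_domain {
    unsigned int lock_acquisitions; /* how often the "event lock" was taken */
    unsigned int affinity_updates;  /* how many pIRQs were "re-targeted" */
    unsigned int next_port[8];      /* next_port[p]: port after p, 0 ends list */
};

struct model_vcpu {
    struct model_domain *domain;
    unsigned int pirq_head;         /* first pIRQ port bound here, 0 if none */
};

/* The lock is modelled as a counter; a real lock would serialize CPUs. */
static void model_lock(struct model_domain *d)   { d->lock_acquisitions++; }
static void model_unlock(struct model_domain *d) { (void)d; }

void model_move_pirqs(struct model_vcpu *v)
{
    struct model_domain *d;
    unsigned int port;

    assert(v != NULL && v->domain != NULL);
    d = v->domain;

    /*
     * Assumed early exit: with no pIRQ bound to this vCPU there is
     * nothing to move, so avoid contending on the domain-wide lock.
     * The head is read without the lock held (an assumption here).
     */
    if (v->pirq_head == 0)
        return;

    model_lock(d);
    for (port = v->pirq_head; port != 0; port = d->next_port[port])
        d->affinity_updates++;      /* stands in for the affinity change */
    model_unlock(d);
}
```

In this model a vCPU with an empty list returns without touching the
lock counter, while a vCPU with bound ports takes the lock once and walks
its list. With everything bound to vCPU 0, as observed in the logs, all
other vCPU-s would take the early exit.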

And of course in a mostly idle system the other aspect here (again)
is: Why are vCPU-s moved across pCPU-s in the first place? I've
observed (and reported) such seemingly over-aggressive vCPU
migration before, most recently in the context of putting together
'x86: make "dom0_nodes=" work with credit2'. Is there anything that
can be done about this in credit2?

A final, osstest-related question is: Does it make sense to run Dom0
on 56 vCPU-s, one per pCPU? The bigger a system, the less useful it
looks to me to also have an equally big Dom0, when the purpose of
the system is to run guests rather than other meaningful workloads
in Dom0. While this is Xen's default (i.e. in the absence of command
line options restricting Dom0), I don't think it represents typical
use of Xen in the field.

Jan




 

