
Re: [xen-4.12-testing test] 169199: regressions - FAIL


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 8 Apr 2022 13:01:26 +0200
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, osstest service owner <osstest-admin@xxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "Dario Faggioli" <dfaggioli@xxxxxxxx>
  • Delivery-date: Fri, 08 Apr 2022 11:01:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Apr 08, 2022 at 11:25:28AM +0200, Jan Beulich wrote:
> On 08.04.2022 10:09, Roger Pau Monné wrote:
> > On Fri, Apr 08, 2022 at 09:01:11AM +0200, Jan Beulich wrote:
> >> On 07.04.2022 10:45, osstest service owner wrote:
> >>> flight 169199 xen-4.12-testing real [real]
> >>> http://logs.test-lab.xenproject.org/osstest/logs/169199/
> >>>
> >>> Regressions :-(
> >>>
> >>> Tests which did not succeed and are blocking,
> >>> including tests which could not be run:
> >>>  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail 
> >>> REGR. vs. 168480
> >>
> >> While the subsequent flight passed, I thought I'd still look into
> >> the logs here since the earlier flight had failed too. The state of
> >> the machine when the debug keys were issued is somewhat odd (and
> >> similar to the earlier failure's): 11 of the 56 CPUs try to
> >> acquire (apparently) Dom0's event lock, from evtchn_move_pirqs().
> >> All other CPUs are idle. The test failed because the sole guest
> >> didn't reboot in time. Whether the failure is actually connected to
> >> this apparent lock contention is unclear, though.
> >>
> >> One can further see that really all of the roughly 70 ECS_PIRQ ports are
> >> bound to vCPU 0 (which makes me wonder about lack of balancing
> >> inside Dom0 itself, but that's unrelated). This means that all
> >> other vCPU-s have nothing at all to do in evtchn_move_pirqs().
> >> Since this moving of pIRQ-s is an optimization (the value of which
> >> has been put under question in the past, iirc), I wonder whether we
> >> shouldn't add a check to the function for the list being empty
> >> prior to actually acquiring the lock. I guess I'll make a patch and
> >> post it as RFC.
> > 
> > Seems good to me.
> > 
> > I think a better model would be to migrate the PIRQs when fired, or
> > even better when EOI is performed?  So that Xen doesn't pointlessly
> > migrate PIRQs for vCPUs that aren't running.
> 
> Well, what the function does is mark the IRQ for migration only
> (IRQ_MOVE_PENDING on x86). IRQs will only ever be migrated in the
> process of finishing the handling of an actual instance of the
> IRQ, as otherwise it's not safe / race-free.

Oh, OK, so then it doesn't seem to be that different from what I had
in mind.
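
FWIW, the check I understood you to be proposing would be roughly the
following (an untested sketch written from memory of what
evtchn_move_pirqs() looks like, so take the field and helper names as
illustrative rather than exact):

    void evtchn_move_pirqs(struct vcpu *v)
    {
        struct domain *d = v->domain;
        const cpumask_t *mask = cpumask_of(v->processor);
        unsigned int port;
        struct evtchn *chn;

        /*
         * Proposed early exit: a vCPU with no ECS_PIRQ ports bound to
         * it has nothing to re-target, so don't contend on the
         * per-domain event lock at all.  Reading the list head without
         * holding the lock should be fine, since the whole function is
         * only an optimization anyway.
         */
        if ( !v->pirq_evtchn_head )
            return;

        spin_lock(&d->event_lock);
        for ( port = v->pirq_evtchn_head; port; port = chn->u.pirq.next_port )
        {
            chn = evtchn_from_port(d, port);
            pirq_set_affinity(d, chn->u.pirq.irq, mask);
        }
        spin_unlock(&d->event_lock);
    }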

> >> And of course in a mostly idle system the other aspect here (again)
> >> is: Why are vCPU-s moved across pCPU-s in the first place? I've
> >> observed (and reported) such seemingly over-aggressive vCPU
> >> migration before, most recently in the context of putting together
> >> 'x86: make "dom0_nodes=" work with credit2'. Is there anything that
> >> can be done about this in credit2?
> >>
> >> A final, osstest-related question is: Does it make sense to run Dom0
> >> on 56 vCPU-s, one each per pCPU? The bigger a system, the less
> >> useful it looks to me to actually also have a Dom0 as big, when the
> >> purpose of the system is to run guests, not meaningful other
> >> workloads in Dom0. While this is Xen's default (i.e. in the absence
> >> of command line options restricting Dom0), I don't think it's
> >> representing typical use of Xen in the field.
> > 
> > I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
> > uses 16 for example.
> 
> I'm afraid a fixed number won't do, the more that iirc there are
> systems with just a few cores in the pool (and you don't want to
> over-commit by default).

But this wouldn't overcommit: it would just assign dom0 at most 16
vCPUs. If the system has fewer than 16 pCPUs, dom0 would simply get
that smaller number instead.
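
For illustration, on the Xen boot line that would just be something
like the following (shown in Debian grub style purely as an example;
iirc the option also accepts a <min>-<max> range, which makes the cap
explicit):

    GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=16"
    # or, with an explicit lower bound:
    GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=4-16"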

> While for extreme cases it may not suffice,
> I would suggest considering ceil(sqrt(nr_cpus)). But
> of course this requires that osstest has a priori knowledge of how
> many (usable) CPUs each system (pair) has, to be able to form such
> a system-dependent command line option.

Well, we could obtain this number while installing Xen, because at
that point the system is already up and running plain Linux (so it can
see the real topology). There's no need for osstest to have a priori
knowledge.
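
Just to make the arithmetic concrete, a hypothetical helper along
these lines (names made up, only the heuristic itself is from your
suggestion) would size dom0 as:

    /* ceil(sqrt(nr_cpus)), capped at nr_cpus -- build with -lm */
    #include <math.h>
    #include <stdio.h>

    static unsigned int dom0_vcpus_for(unsigned int nr_cpus)
    {
        unsigned int n = (unsigned int)ceil(sqrt((double)nr_cpus));

        return n > nr_cpus ? nr_cpus : n;
    }

    int main(void)
    {
        /* e.g. the 56-pCPU box in this flight -> 8 dom0 vCPUs,
         * a 4-core box -> 2, a 16-core box -> 4 */
        printf("%u %u %u\n",
               dom0_vcpus_for(56), dom0_vcpus_for(4), dom0_vcpus_for(16));
        return 0;
    }

That stays well clear of over-committing small boxes while still
keeping dom0 modest on large ones.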

> > Albeit not having such a parameter has likely led you into figuring out
> > this issue, so it might not be so bad.  I agree however it's likely
> > better to test scenarios closer to real world usage.
> 
> True. One might conclude that we need both then. But of course that
> would make each flight yet more resource hungry.

Yes, let's focus on real-world uses first.

Thanks, Roger.



 

