
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split


  • To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Tue, 08 Feb 2011 13:23:08 +0100
  • Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
  • Delivery-date: Tue, 08 Feb 2011 04:24:57 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 02/08/11 13:08, George Dunlap wrote:
On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx>  wrote:
On 02/07/11 16:55, George Dunlap wrote:

Juergen,

What is supposed to happen if a domain is in cpupool0, and then all of
the cpus are taken out of cpupool0?  Is that possible?

No. Cpupool0 can't be left without any cpu, as Dom0 is always a member of cpupool0.
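To illustrate the invariant, here is a minimal C sketch, assuming a much simplified pool model. `struct pool_model`, `mask_weight()`, `may_remove_cpu()` and the choice of `-EBUSY` are hypothetical names picked for illustration, not Xen's actual cpupool code: a pool that still holds domains (pool 0 always holds Dom0) refuses to give up its last cpu.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical, simplified pool model; NOT Xen's struct cpupool. */
struct pool_model {
    unsigned long cpus; /* bitmask of physical cpus assigned to the pool */
    int n_domains;      /* domains in the pool (pool 0 always holds Dom0) */
};

/* Count the cpus in a mask (portable popcount). */
static int mask_weight(unsigned long mask)
{
    int n = 0;

    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Return 0 if `cpu` may be taken out of `p`, else a negative errno value.
 * A pool that still has domains must keep at least one cpu; the -EBUSY
 * error code is an assumption of this sketch.
 */
static int may_remove_cpu(const struct pool_model *p, unsigned int cpu)
{
    if (cpu >= sizeof(p->cpus) * CHAR_BIT || !(p->cpus & (1UL << cpu)))
        return -EINVAL; /* cpu is not in this pool */
    if (p->n_domains > 0 && mask_weight(p->cpus) == 1)
        return -EBUSY;  /* would leave the pool's domains without a cpu */
    return 0;
}
```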

If that's the case, then since Andre is running this immediately after
boot, he shouldn't be seeing any vcpus in the new pools; and all of
the dom0 vcpus should be migrated to cpupool0, right?  Is it possible
that the migration process isn't happening properly?

Again: the vcpus are not migrated to cpupool0; rather, the physical cpus are
taken away from it, so any vcpus active on the cpu being moved MUST be
migrated to other cpus of cpupool0.
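A rough C sketch of that direction of movement, assuming a toy model in which a vcpu is reduced to the physical cpu it runs on. `struct vcpu_model` and `migrate_off_cpu()` are hypothetical, and picking the lowest-numbered remaining cpu is a simplification; the real scheduler chooses the target cpu by its own logic.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical vcpu model: only the physical cpu it currently runs on. */
struct vcpu_model {
    int processor;
};

/*
 * The physical cpu `cpu` leaves the pool whose cpus are `pool_cpus`.
 * Every vcpu running on it is moved to another cpu that stays in the
 * pool (here simply the lowest-numbered one). Returns the number of
 * vcpus moved, or -EBUSY if no other cpu remains in the pool.
 */
static int migrate_off_cpu(struct vcpu_model *v, int nr_vcpus,
                           unsigned long pool_cpus, unsigned int cpu)
{
    unsigned long remaining = pool_cpus & ~(1UL << cpu);
    int target = -1;
    int moved = 0;

    for (unsigned int c = 0; c < sizeof(remaining) * CHAR_BIT; c++) {
        if (remaining & (1UL << c)) {
            target = (int)c;
            break;
        }
    }
    if (target < 0)
        return -EBUSY; /* nowhere left to run these vcpus */

    for (int i = 0; i < nr_vcpus; i++) {
        if (v[i].processor == (int)cpu) {
            v[i].processor = target;
            moved++;
        }
    }
    return moved;
}
```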


It looks like schedule.c:cpu_disable_scheduler() will try to migrate
all vcpus, and if it fails to migrate, it returns -EAGAIN so that the
tools will try again.  It's probably worth instrumenting that whole
code-path to make sure it actually happens as we expect.  Are we
certain, for example, that a hypercall continued on another cpu
will actually return the new error value properly?

I have checked that and never saw any problem. And yes, I did see
the EAGAIN case happen.
With my test patch, which always executes cpu_disable_scheduler() on the
cpu being moved, this should not be a problem at all, since the tasklet
always runs in the idle vcpu.
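The caller-side behaviour discussed above (retry while the removal reports -EAGAIN) could look roughly like this C sketch. `unassign_fn`, `unassign_with_retry()` and the retry limit are hypothetical and are not the actual tools or libxc code; `flaky_unassign()` is only a stand-in used to exercise the loop.

```c
#include <errno.h>

/* Hypothetical signature for one attempt at removing a cpu from a pool. */
typedef int (*unassign_fn)(void *ctx, unsigned int cpu);

/*
 * Retry the operation while it reports -EAGAIN (some vcpu could not be
 * migrated yet). Gives up after max_tries attempts and returns the last
 * result; the retry limit is an assumption of this sketch.
 */
static int unassign_with_retry(unassign_fn op, void *ctx, unsigned int cpu,
                               int max_tries)
{
    int rc = -EAGAIN;

    for (int i = 0; i < max_tries && rc == -EAGAIN; i++)
        rc = op(ctx, cpu);
    return rc;
}

/* Stand-in operation: fails with -EAGAIN a given number of times. */
struct flaky_op {
    int eagain_left; /* how many more calls should return -EAGAIN */
    int calls;       /* total number of calls made */
};

static int flaky_unassign(void *ctx, unsigned int cpu)
{
    struct flaky_op *f = ctx;

    (void)cpu;
    f->calls++;
    if (f->eagain_left > 0) {
        f->eagain_left--;
        return -EAGAIN;
    }
    return 0;
}
```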


Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why
is the cpu's bit set in cpupool_free_cpus without checking to see if
the cpu_disable_scheduler() call actually worked?  Shouldn't that also
be inside the if() statement?

No, I don't think so. If removing a cpu fails permanently after having returned
-EAGAIN before, it should be easy to add it back to the original cpupool. This is
only possible if it is flagged as free. Adding it to another cpupool will be
denied, as cpupool_cpu_moving is still set.
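The reasoning above can be modelled with a small C sketch. `struct move_state`, `unassign_helper()` and `may_assign_cpu()` are hypothetical; the fields merely stand in for cpupool_free_cpus and cpupool_cpu_moving and this is not Xen's code. The cpu is flagged free whatever cpu_disable_scheduler() returned, but while its move is still pending only the original pool may take it back.

```c
#include <errno.h>

/*
 * Hypothetical state; the fields only stand in for cpupool_free_cpus and
 * cpupool_cpu_moving and are not Xen's actual data structures.
 */
struct move_state {
    unsigned long free_cpus; /* cpus flagged as free */
    int moving_cpu;          /* cpu whose removal is pending, or -1 */
    int moving_from_pool;    /* pool that moving_cpu is being taken from */
};

/*
 * Unassign step: flag the cpu free unconditionally, but clear the
 * "moving" marker only if disabling the scheduler succeeded.
 */
static void unassign_helper(struct move_state *s, unsigned int cpu,
                            int disable_rc)
{
    s->free_cpus |= 1UL << cpu;
    if (disable_rc == 0)
        s->moving_cpu = -1; /* removal completed */
}

/*
 * Assign step: a free cpu may join any pool, except that a cpu whose
 * removal is still pending may only return to the pool it came from.
 */
static int may_assign_cpu(const struct move_state *s, unsigned int cpu,
                          int pool_id)
{
    if (!(s->free_cpus & (1UL << cpu)))
        return -EBUSY; /* not flagged free */
    if (s->moving_cpu == (int)cpu && pool_id != s->moving_from_pool)
        return -EBUSY; /* still moving: only the original pool may take it */
    return 0;
}
```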


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

