
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split


  • To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Wed, 09 Feb 2011 14:04:08 +0100
  • Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
  • Delivery-date: Wed, 09 Feb 2011 05:04:43 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 02/09/11 13:27, George Dunlap wrote:
Sorry, forgot the patch...
  -G

On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx>  wrote:
On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara<andre.przywara@xxxxxxx>  wrote:
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29

Interesting -- what seems to happen here is that as cpus are disabled,
vcpus are "shovelled" in an accumulative fashion from one cpu to the
next:
* v18,34,42 start on cpu 24.
* When 24 is brought down, they're all migrated to 25; then when 25 is
brought down, to 26, then to 27
* v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix
* v3 is running on cpu 28, so all of them plus v3 are shovelled onto cpu 29.
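
The shovelling pattern can be reproduced with a small standalone model. This is
only a sketch, not Xen code: struct vcpu_model, disable_cpu_model() and
shovel_model() are hypothetical names, and the model assumes that every vcpu on
a disabled cpu moves to the next-higher cpu, which is a simplification of what
the log shows (v34 and v32, for example, do not follow it exactly).

```c
#include <stdio.h>

/* Hypothetical model of a vcpu: just its id and the pcpu it sits on. */
struct vcpu_model {
    int vcpu_id;
    int processor;
};

/*
 * Model of taking one cpu offline: every vcpu currently on `cpu` is
 * moved to `cpu + 1` (assumed migration target). Returns the number
 * of vcpus that were migrated.
 */
static int disable_cpu_model(struct vcpu_model *v, int n, int cpu)
{
    int moved = 0;

    for (int i = 0; i < n; i++) {
        if (v[i].processor == cpu) {
            printf("Migrating d0v%d from cpu %d\n", v[i].vcpu_id, cpu);
            v[i].processor = cpu + 1;
            moved++;
        }
    }
    return moved;
}

/*
 * Disable cpus first..last in ascending order, as when a whole node is
 * removed from a pool. Returns the total number of migrations.
 */
static int shovel_model(struct vcpu_model *v, int n, int first, int last)
{
    int total = 0;

    for (int cpu = first; cpu <= last; cpu++)
        total += disable_cpu_model(v, n, cpu);
    return total;
}
```

Starting with v18, v34 and v42 on cpu 24, v24 on cpu 27 and v3 on cpu 28, a call
to shovel_model(v, 5, 24, 28) leaves all five vcpus on cpu 29, with each vcpu
migrated once per cpu it passed through.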

While that behavior may not be ideal, it should certainly be bug-free.

Another interesting thing to note is that the bug happened on pcpu 32,
but there were no advertised migrations from that cpu.

If I understand the configuration of Andre's machine correctly, pcpu32 will
be the target of the next migrations. This pcpu is a member of the next NUMA
node, correct?

Could there be a problem with the call to domain_update_node_affinity()
from cpu_disable_scheduler()?

Hmm, I think this could really be the problem.
Andre, could you try the following patch?

diff -r f1fac30a531b xen/common/schedule.c
--- a/xen/common/schedule.c     Wed Feb 09 08:58:11 2011 +0000
+++ b/xen/common/schedule.c     Wed Feb 09 14:02:12 2011 +0100
@@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c
                         v->domain->domain_id, v->vcpu_id);
                 cpus_setall(v->cpu_affinity);
                 affinity_broken = 1;
+            }
+            if ( cpus_weight(v->cpu_affinity) < NR_CPUS )
+            {
+                cpu_clear(cpu, v->cpu_affinity);
             }

             if ( v->processor == cpu )
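
For readers without the Xen tree at hand, the intent of the added hunk can be
sketched standalone. This is an illustration under assumptions, not the Xen
implementation: the affinity mask is modelled as a plain 64-bit word,
MODEL_NR_CPUS, mask_weight() and drop_cpu_from_affinity() are hypothetical
stand-ins for NR_CPUS, cpus_weight() and the cpu_clear() call, and cpu is
assumed to be below 64.

```c
#include <stdint.h>

/* Assumed model size; stands in for NR_CPUS. */
#define MODEL_NR_CPUS 64

/* Count the set bits in the mask (stand-in for cpus_weight()). */
static int mask_weight(uint64_t mask)
{
    int n = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * If the affinity is restricted (not "all cpus"), remove the cpu that
 * is going away, so it is no longer a valid target for this vcpu.
 * An unrestricted mask is left untouched. Assumes cpu < MODEL_NR_CPUS.
 */
static uint64_t drop_cpu_from_affinity(uint64_t affinity, unsigned int cpu)
{
    if (mask_weight(affinity) < MODEL_NR_CPUS)
        affinity &= ~(UINT64_C(1) << cpu);
    return affinity;
}
```

With an affinity of cpus 24 and 25, dropping cpu 24 leaves only cpu 25; a mask
with all 64 bits set comes back unchanged, matching the case where
cpus_setall() has just been applied.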


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

