Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL

To: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL
From: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Date: Mon, 14 Mar 2011 10:02:23 +0000
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <19834.24888.630582.491364@xxxxxxxxxxxxxxxxxxxxxxxx>
References: <osstest-6374-mainreport@xxxxxxx> <19834.24888.630582.491364@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-06-14)
At 17:51 +0000 on 11 Mar (1299865912), Ian Jackson wrote:
> Mar 11 13:46:58.154777 (XEN) Xen call trace:
> Mar 11 13:46:58.154798 (XEN)    [<ffff82c480100140>] __bitmap_empty+0x0/0x7f
> Mar 11 13:46:58.163767 (XEN)    [<ffff82c480119582>] csched_cpu_pick+0xe/0x10
> Mar 11 13:46:58.163802 (XEN)    [<ffff82c480122c8d>] vcpu_migrate+0xfb/0x230
> Mar 11 13:46:58.178768 (XEN)    [<ffff82c480122e24>] context_saved+0x62/0x7b
> Mar 11 13:46:58.178799 (XEN)    [<ffff82c480157f17>] 
> context_switch+0xd98/0xdca
> Mar 11 13:46:58.183766 (XEN)    [<ffff82c4801226b4>] schedule+0x5fc/0x624
> Mar 11 13:46:58.183795 (XEN)    [<ffff82c480123837>] __do_softirq+0x88/0x99
> Mar 11 13:46:58.198784 (XEN)    [<ffff82c4801238b2>] do_softirq+0x6a/0x7a

I think this hang happens because, although this code:

            cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
            if ( commit )
               CSCHED_PCPU(nxt)->idle_bias = cpu;
            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));

removes the new cpu and its siblings from cpus, the new cpu isn't
guaranteed to have been in cpus in the first place, and neither are any
of its siblings, since nxt might not be one of them.

Possible fix:

diff -r b9a5d116102d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 10 13:06:52 2011 +0000
+++ b/xen/common/sched_credit.c Mon Mar 14 09:25:07 2011 +0000
@@ -533,7 +533,7 @@ _csched_cpu_pick(const struct scheduler 
             cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
             if ( commit )
                CSCHED_PCPU(nxt)->idle_bias = cpu;
-            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
+            cpus_andnot(cpus, cpus, nxt_idlers);
         }
         else
         {

which guarantees that nxt will be removed from cpus, though I suspect
this means that we might not pick the best HT pair in a particular core.
Scheduler code is twisty and hurts my brain so I'd like George's
opinion before checking anything in.
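
To make the difference between the two masks concrete, here is a minimal
sketch using toy 32-bit masks. It is not the Xen code: mask_t, mask_andnot()
and step_removes_nxt() are made-up stand-ins for cpumask_t and cpus_andnot(),
and it assumes that nxt is itself a member of nxt_idlers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: bit i set means CPU i is in the set. */
typedef uint32_t mask_t;

/* Toy stand-in for cpus_andnot(dst, a, b): everything in a that is not in b. */
static mask_t mask_andnot(mask_t a, mask_t b)
{
    return a & ~b;
}

/*
 * Returns 1 if removing `removed` from `cpus` takes `nxt` out of the set,
 * 0 if nxt is still a candidate afterwards.
 * Precondition: nxt is currently in cpus.
 */
static int step_removes_nxt(mask_t cpus, unsigned nxt, mask_t removed)
{
    mask_t nxt_bit;

    assert(nxt < 32);
    nxt_bit = UINT32_C(1) << nxt;
    assert(cpus & nxt_bit);

    return (mask_andnot(cpus, removed) & nxt_bit) == 0;
}
```

For example, with cpus = 0x10 (only CPU 4 left), nxt = 4 and a sibling map
of 0xC0 (CPUs 6 and 7), step_removes_nxt() returns 0 and cpus is left
untouched. Removing a hypothetical nxt_idlers of 0x50 (CPUs 4 and 6) instead
returns 1 and empties the set.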

Cheers,

Tim.

P.S. The patch above is a one-liner for clarity; a better fix would be:

diff -r b9a5d116102d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 10 13:06:52 2011 +0000
+++ b/xen/common/sched_credit.c Mon Mar 14 09:26:11 2011 +0000
@@ -533,12 +533,8 @@ _csched_cpu_pick(const struct scheduler 
             cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
             if ( commit )
                CSCHED_PCPU(nxt)->idle_bias = cpu;
-            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
         }
-        else
-        {
-            cpus_andnot(cpus, cpus, nxt_idlers);
-        }
+        cpus_andnot(cpus, cpus, nxt_idlers);
     }
 
     return cpu;
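
The shape of the loop after this second patch can be sketched as follows.
This is a toy model under stated assumptions, not the scheduler code: the
hunk doesn't show the loop header, so the sketch assumes the loop runs until
cpus is empty, and group_of() is a hypothetical grouping that plays the role
of nxt_idlers and always contains the CPU it is asked about.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: bit i set means CPU i is in the set. */
typedef uint32_t mask_t;

/*
 * Hypothetical grouping: CPUs 2k and 2k+1 form one group (think of one
 * hyperthread pair). The result always contains `cpu` itself.
 */
static mask_t group_of(unsigned cpu)
{
    return UINT32_C(3) << (cpu & ~1u);
}

/*
 * Pick a candidate, then unconditionally remove its group from the
 * candidate set, as the hoisted cpus_andnot() does.
 * Returns the number of iterations needed to empty the set.
 */
static unsigned count_pick_iterations(mask_t cpus)
{
    unsigned iters = 0;

    while (cpus != 0) {
        unsigned nxt = 0;
        mask_t group;

        /* Lowest-numbered CPU still in cpus; one exists because cpus != 0. */
        while (!(cpus & (UINT32_C(1) << nxt)))
            nxt++;

        group = group_of(nxt);
        assert(group & (UINT32_C(1) << nxt));  /* group must contain nxt */

        cpus &= ~group;  /* removes at least nxt, so cpus always shrinks */
        iters++;
    }
    return iters;
}
```

Because every pass removes at least nxt, the number of iterations is bounded
by the number of candidates: with CPUs 0 to 7 as candidates (0xFF) and the
pairing above, the loop finishes in four passes.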



-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
