Re: [Xen-devel] Race condition with scheduler runqueues
 
- To: Jan Beulich <JBeulich@xxxxxxxx>
 
- From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
 
- Date: Wed, 27 Feb 2013 07:19:56 +0100
 
- Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>,	Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>,	Xen-devel List <xen-devel@xxxxxxxxxxxxx>
 
- Delivery-date: Wed, 27 Feb 2013 06:20:54 +0000
 
- List-id: Xen developer discussion <xen-devel.lists.xen.org>
 
 
 
On 19.02.2013 10:28, Jan Beulich wrote:
 
On 18.02.13 at 19:11, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
 
 
 
Hello,
Our testing has discovered a crash (pagefault at 0x0000000000000008)
which I have tracked down to a bad __runq_remove() in csched_vcpu_sleep()
in sched_credit.c (a static function of the same name also exists in
sched_credit2.c, which confused matters to start with).
The test case was a loop of localhost migrations of a 1-vcpu HVM Win8
domain.  The test case itself has passed many times in the past on the
same Xen codebase (Xen 4.1.3), indicating that the race is very rare.  There
do not appear to be any relevant changes between the version of Xen in
the test and xen-4.1-testing.
The failure itself is because of a XEN_DOMCTL_scheduler_op (trace below)
from dom0, targeting the VCPU of the migrating domain.
(XEN) Xen call trace:
(XEN)    [<ffff82c480116a14>] csched_vcpu_sleep+0x44/0x70
(XEN)    [<ffff82c480120117>] vcpu_sleep_nosync+0xe7/0x3b0
(XEN)    [<ffff82c4801203e9>] vcpu_sleep_sync+0x9/0x50
(XEN)    [<ffff82c48011fd4c>] sched_adjust+0xac/0x230
(XEN)    [<ffff82c480102bc1>] do_domctl+0x731/0x1130
(XEN)    [<ffff82c4802013c4>] compat_hypercall+0x74/0x80
The relevant part of csched_vcpu_sleep() is
     else if ( __vcpu_on_runq(svc) )
         __runq_remove(svc);
which disassembles to
ffff82c480116a01:       49 8b 10                mov    (%r8),%rdx
ffff82c480116a04:       4c 39 c2                cmp    %r8,%rdx
ffff82c480116a07:       75 07                   jne    ffff82c480116a10 <csched_vcpu_sleep+0x40>
ffff82c480116a09:       f3 c3                   repz retq
ffff82c480116a0b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffff82c480116a10:       49 8b 40 08             mov    0x8(%r8),%rax
ffff82c480116a14:       48 89 42 08             mov    %rax,0x8(%rdx)    # <- Pagefault here
ffff82c480116a18:       48 89 10                mov    %rdx,(%rax)
ffff82c480116a1b:       4d 89 40 08             mov    %r8,0x8(%r8)
ffff82c480116a1f:       4d 89 00                mov    %r8,(%r8)
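For reference, the helpers this code inlines look like the following in
sched_credit.c (paraphrased from the Xen 4.1-era sources; a sketch rather
than verbatim):

    /* Paraphrased from xen/common/sched_credit.c: a vcpu counts as "on
     * the runqueue" iff its embedded list head is non-empty; removal is
     * a plain list_del_init().  The BUG_ON() disappears in the
     * disassembly above because the caller has just evaluated the same
     * condition. */
    static inline int
    __vcpu_on_runq(struct csched_vcpu *svc)
    {
        return !list_empty(&svc->runq);
    }

    static inline void
    __runq_remove(struct csched_vcpu *svc)
    {
        BUG_ON( !__vcpu_on_runq(svc) );
        list_del_init(&svc->runq);
    }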
The relevant crash registers from the pagefault are:
rax: 0000000000000000
rdx: 0000000000000000
  r8: ffff83080c89ed90
If I am reading the code correctly, this means that runq->next is NULL,
so list_empty() returns false and __vcpu_on_runq() erroneously reports the
vcpu as queued.  We then fault when list_del() writes next->prev, i.e. to
address 0x8, because next itself is NULL (runq->prev, in rax, is NULL as well).
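Xen's Linux-derived list primitives show why a zeroed list head defeats
the check; a minimal sketch (paraphrased from xen/include/xen/list.h):

    /* An *empty* list head points at itself, never at NULL. */
    static inline int list_empty(const struct list_head *head)
    {
        return head->next == head;
    }

    /* A memset()-zeroed head therefore has next == NULL != head, so
     * list_empty() returns false and __vcpu_on_runq() wrongly reports
     * the vcpu as queued.  list_del() then writes through next->prev,
     * i.e. to offsetof(struct list_head, prev) == 8, which matches the
     * faulting address. */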
The only place I can spot in the code where runq->{next,prev} could
conceivably be NULL is in csched_alloc_vdata(), between the memset() and
INIT_LIST_HEAD().  This is logically consistent with the localhost migrate
loop, and I can't immediately see anything to prevent this race from
happening; a sketch of the window follows.
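That window looks roughly like this (paraphrased from csched_alloc_vdata()
in sched_credit.c; a sketch, not the exact source):

    static void *
    csched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
    {
        struct csched_vcpu *svc;

        svc = xmalloc(struct csched_vcpu);
        if ( svc == NULL )
            return NULL;
        memset(svc, 0, sizeof(*svc));   /* runq.{next,prev} are NULL here... */

        INIT_LIST_HEAD(&svc->runq);     /* ...until the head is made to
                                         * point at itself */
        /* ... remaining field initialisation elided ... */
        return svc;
    }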
 
But that doesn't make sense: csched_alloc_vdata() doesn't store
svc into vc->sched_priv; that is done by the generic
scheduler code once the actor returns.
So I'd rather suspect a stale pointer being used, which is easily
possible when racing with sched_move_domain() (as opposed to
schedule_cpu_switch(), where the new pointer gets stored
_before_ de-allocating the old one).
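Schematically (illustrative pseudo-C, not actual Xen code; the names are
made up):

    /* schedule_cpu_switch()-style ordering: publish first, free second,
     * so a racing reader always sees a live pointer. */
    vc->sched_priv = new_priv;
    free_vdata(old_priv);

    /* sched_move_domain()-style ordering (the suspected race): the old
     * private data is freed while vc->sched_priv may still point at it,
     * so a concurrent vcpu_sleep_nosync() can dereference freed memory. */
    free_vdata(vc->sched_priv);
    vc->sched_priv = new_priv;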
However, sched_move_domain() (as well as schedule_cpu_switch())
gets called only from the CPU pools code, and I would guess CPU pools
aren't involved here, and you don't soft offline/online pCPUs in
parallel (as I'm sure you would otherwise have mentioned it).
But wait - libxl__domain_make() _unconditionally_ calls
xc_cpupool_movedomain(), as does XendDomainInfo's
_constructDomain(). The reason for this escapes me - Jürgen? Yet
I'd expect the pool-ID-matching check to short-circuit the resulting
sysctl, i.e. sched_move_domain() ought not to be reached anyway
(worth verifying, of course).
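Such a check might look like this (a hypothetical sketch; the field and
variable names are assumptions, not the actual sysctl handler):

    /* Hypothetical: bail out early when the domain already lives in the
     * requested pool, so sched_move_domain() is never reached. */
    if ( d->cpupool->cpupool_id == op->cpupool_id )
        return 0;
    ret = sched_move_domain(d, new_pool);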
The race there nevertheless ought to be fixed.
 
 
Something like the attached patch?
Not tested thoroughly yet.
Juergen
--
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
 Attachment:
movedom.patch 
Description: Text Data 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
 
 
    