
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split


  • To: Stephan Diestelhorst <stephan.diestelhorst@xxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Thu, 03 Feb 2011 06:57:11 +0100
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "Przywara, Andre" <Andre.Przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 02 Feb 2011 21:58:38 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 02/02/11 17:01, Stephan Diestelhorst wrote:
> On Wednesday 02 February 2011 16:14:25 Juergen Gross wrote:
>> On 02/02/11 15:39, Stephan Diestelhorst wrote:
>>> We have the following theory of what happens:
>>> * some vcpus of a particular domain are currently in the process of
>>>   being moved to the new pool

>> The only _vcpus_ to be moved between pools are the idle vcpus. And those
>> never contribute to accounting in the credit scheduler.
>>
>> We are moving _pcpus_ only (well, moving a domain between pools actually
>> moves vcpus as well, but then the domain is paused).

> How do you ensure that the domain is paused and stays that way? Pausing
> the domain was what I had in mind, too...

Look at sched_move_domain() in schedule.c: I'm calling domain_pause()
before moving the vcpus and domain_unpause() after that.


> Despite the rant, it is amazing to see the ability to move running
> things around through this remote continuation trick! In my (ancient)
> balancer experiments I added hypervisor threads just for side-stepping
> this issue...

>> I think the easiest way to solve the problem would be to move the cpu
>> to the new pool in a tasklet. This is possible now, because tasklets
>> are always executed in the idle vcpus.

> Yep. That was exactly what I built. At the time stuff like that did
> not exist (2005).

>> OTOH I'd like to understand what is wrong with my current approach...

> Nothing, in fact I like it. In my rant I complained about the fact
> that splitting the critical section across this continuation looks
> scary, basically causing some generic red lights to turn on :-) And
> making reasoning about the correctness a little complicated, but that
> may well be a local issue ;-)

Perhaps you can help solve the mystery:

Could you replace the BUG_ON in sched_credit.c:389 with something like this:

if ( !is_idle_vcpu(per_cpu(schedule_data, cpu).curr) )
{
    extern void dump_runq(unsigned char key);
    struct vcpu *vc = per_cpu(schedule_data, cpu).curr;

    printk("+++ (%d.%d) instead of idle vcpu on cpu %d\n",
           vc->domain->domain_id, vc->vcpu_id, cpu);
    dump_runq('q');
    BUG();
}


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

