Re: [Xen-devel] [PATCH] scrub pages on guest termination
On 23/5/08 18:01, "Ben Guthro" <bguthro@xxxxxxxxxxxxxxx> wrote:
Yes, sorry - should have removed our terminology from the description.
Node=physical machine
VS=HVM guest w/ pv-on-hvm drivers
Looking back at the original bug report, it seems to indicate it was migrating from a system with 2 processors to one with 8.
It’s very surprising that lock contention would cause such a severe lack of progress on an 8-CPU system. If the lock is that hotly contended, then even its use in free_domheap_pages() has to be questionable.
I’m inclined to say that if we want to address this, we should do it in one or more of the following ways (a rough sketch combining the first three follows the list):
1. Count CPUs into the scrub function with an atomic_t; beyond a limit, all other CPUs bail straight out after re-setting their timer.
2. Increase the scrub batch size to reduce the proportion of time that each loop iteration holds the lock.
3. Turn the spin_lock() into a spin_trylock() so that the timeout check can be guaranteed to execute frequently.
4. Eliminate the global lock by building a lock-free linked list, or by maintaining per-CPU hashed work queues with work stealing, or... etc.
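For concreteness, here is a rough, untested sketch of how options 1-3 might combine in common/page_alloc.c. scrub_inflight, MAX_SCRUBBING_CPUS and SCRUB_BATCH are made-up names for illustration only; page_scrub_lock, page_scrub_list, scrub_one_page() and a per-CPU page_scrub_timer are assumed to be the existing ones:

static atomic_t scrub_inflight = ATOMIC_INIT(0);
#define MAX_SCRUBBING_CPUS 2  /* illustrative cap (option 1) */
#define SCRUB_BATCH        64 /* illustrative batch size (option 2) */

static void page_scrub_softirq(void)
{
    struct list_head batch;
    struct page_info *pg;
    unsigned int i;

    INIT_LIST_HEAD(&batch);

    /* Option 1: soft cap on how many CPUs scrub concurrently. */
    atomic_inc(&scrub_inflight);
    if ( atomic_read(&scrub_inflight) > MAX_SCRUBBING_CPUS )
        goto out;

    /* Option 3: never spin on the list lock; retry via the timer. */
    if ( !spin_trylock(&page_scrub_lock) )
        goto out;

    /* Option 2: dequeue a whole batch per lock acquisition. */
    for ( i = 0; (i < SCRUB_BATCH) && !list_empty(&page_scrub_list); i++ )
        list_move_tail(page_scrub_list.next, &batch);
    spin_unlock(&page_scrub_lock);

    /* Scrub with the lock dropped. */
    while ( !list_empty(&batch) )
    {
        pg = list_entry(batch.next, struct page_info, list);
        list_del(&pg->list);
        scrub_one_page(pg);
        /* ...then return pg to the free heap, as the existing code does. */
    }

 out:
    atomic_dec(&scrub_inflight);
    set_timer(&this_cpu(page_scrub_timer), NOW() + MILLISECS(10));
}

Note the soft cap is racy by design: an exact count of scrubbers is not needed, only backpressure.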
The patch as-is at least suffers from the issue that the ‘primary scrubber’ should be regularly checking for softirq work, and is not. But I’m not sure such a sizeable change to the scheduling policy for scrubbing (such as it is!) is necessary or desirable.
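On the softirq point, the missing check is small; a hedged fragment of what the primary scrubber’s loop should contain:

while ( !list_empty(&page_scrub_list) )
{
    /* Yield if other softirq work is pending; the scrub timer will
     * re-enter us shortly, so no work is lost. */
    if ( softirq_pending(smp_processor_id()) )
        break;

    /* ...dequeue and scrub a batch, as in the sketch above... */
}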
Option 4 is on the morally highest ground but is of course the most work. :-)
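To give option 4 some flavour, the push side of a lock-free list is a Treiber-style stack built on cmpxchg(). Everything below is hypothetical, including the scrub_next link field; the pop side is the hard part (ABA hazards, which a single-consumer design would sidestep):

/* Hypothetical global head of a lock-free scrub stack. */
static struct page_info *scrub_stack;

static void scrub_stack_push(struct page_info *pg)
{
    struct page_info *old;

    do {
        old = scrub_stack;
        pg->scrub_next = old;  /* hypothetical link field in page_info */
    } while ( cmpxchg(&scrub_stack, old, pg) != old );
}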
-- Keir