
Re: [Xen-devel] [PATCH] scrub pages on guest termination



On 23/5/08 18:01, "Ben Guthro" <bguthro@xxxxxxxxxxxxxxx> wrote:

Yes, sorry - I should have removed our terminology from the description.
Node=physical machine
VS=HVM guest w/ pv-on-hvm drivers
Looking back at the original bug report, it seems to indicate the guest was migrating from a system with 2 processors to one with 8.

It’s very surprising that lock contention would cause such a severe lack of progress on an 8-CPU system. If the lock is that hotly contended then even the usage of it in free_domheap_pages() has to be questionable.

I’m inclined to say that if we want to address this then we should do it in one or more of the following ways (a rough sketch of options 1 and 3 follows the list):
 1. Count CPUs into the scrub function with an atomic_t; beyond a limit, all other CPUs bail straight out after re-setting their timer.
 2. Increase the scrub batch size, so each lock acquisition covers more pages and the lock is taken less often.
 3. Turn the spin_lock() into a spin_trylock() so that the timeout check can be guaranteed to execute frequently.
 4. Eliminate the global lock by building a lock-free linked list, or by maintaining per-CPU hashed work queues with work stealing, or... etc.
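For concreteness, here is a rough sketch of options 1 and 3 combined. The names page_scrub_lock, page_scrub_list, the per-CPU page_scrub_timer, scrub_one_page() and the list-field layout of struct page_info are assumed from common/page_alloc.c of this vintage, and MAX_SCRUB_CPUS plus the time budget are made-up example values; treat this as an illustration, not a drop-in patch:

    #define MAX_SCRUB_CPUS 2                  /* made-up example limit */

    static atomic_t scrub_cpus = ATOMIC_INIT(0);

    static void page_scrub_softirq(void)
    {
        struct list_head *ent;
        s_time_t deadline = NOW() + MILLISECS(1);    /* example budget */

        /* Option 1: if enough CPUs are already scrubbing, bail out. */
        atomic_inc(&scrub_cpus);
        if ( atomic_read(&scrub_cpus) > MAX_SCRUB_CPUS )
            goto out;

        while ( NOW() < deadline )
        {
            /* Option 3: never spin on the global lock, so the
             * deadline check above is always reached promptly. */
            if ( !spin_trylock(&page_scrub_lock) )
            {
                cpu_relax();
                continue;
            }

            if ( list_empty(&page_scrub_list) )
            {
                spin_unlock(&page_scrub_lock);
                break;
            }

            /* Dequeue one entry and drop the lock before the
             * (comparatively slow) scrub itself. */
            ent = page_scrub_list.next;
            list_del(ent);
            spin_unlock(&page_scrub_lock);

            /* Stands in for the existing map/clear loop; freeing the
             * page back to the heap is omitted here. */
            scrub_one_page(list_entry(ent, struct page_info, list));
        }

     out:
        /* All exits re-set the per-CPU timer, per option 1. */
        atomic_dec(&scrub_cpus);
        set_timer(&this_cpu(page_scrub_timer), NOW() + MILLISECS(10));
    }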

The patch as-is at least suffers from the issue that the ‘primary scrubber’ should be regularly checking for softirq work. But I’m not sure such a sizeable change to the scheduling policy for scrubbing (such as it is!) is necessary or desirable.
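To illustrate just that first point, the fix is a cheap pending-work check inside the scrub loop. softirq_pending() and smp_processor_id() are the existing primitives; the surrounding loop shape is assumed as above:

    /* Inside the main loop of the 'primary scrubber': yield whenever
     * other softirq work is pending. */
    if ( softirq_pending(smp_processor_id()) )
        break;    /* service pending softirqs; the re-set timer will
                   * bring us back into the scrubber afterwards */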

Option 4 holds the moral high ground but is of course the most work. :-)

 -- Keir