
Re: [Xen-devel] [PATCH RESEND RFC 0/8] Memory scrubbing from idle loop



On 27/02/17 17:06, Boris Ostrovsky wrote:
>>> This series adds support for scrubbing released pages from idle loop,
>>> making guest destruction significantly faster. For example, destroying a
>>> 1TB guest can now be completed in slightly over 1 minute as opposed to
>>> about 9 minutes using the existing scrubbing algorithm.
>> What is this 1m now?  I'd have thought that freeing memory would become
>> an O(1) splice from the domain pagelist onto the to-be-scrubbed list,
>> although I see that isn't quite how you have implemented it.
> I believe we are spending lots of time in relinquish_memory() due to
> hypercall restarts. And in general relinquish_memory() is linear in
> terms of number of pages.

Hmm. There is also no trivial way to make it better than O(n).

Oh well.  For now, a 10x improvement is definitely better than nothing.

>
>
>> As a couple of other thoughts, how about getting rid of the boot time
>> explicit scrub by initialising the entire heap as needing scrubbing? 
>> That way, scrubbing will start happening as soon as the APs hit their
>> idle loops, rather than later in one explicit action once dom0 is
>> constructed.
> I can try this.
>
>> How about marking pages as needing scrubbing at the end of alloc(),
>> rather than in free() ?  That way you don't need to walk the pagelists
>> just to set the pending_scrub flag.
> Do we know that all allocations need a scrubbing at the release time? We
> have 'scrub = !!d->is_dying' so it's possible that not all freed pages
> need to be scrubbed.

The !!d->is_dying is an optimisation for ballooning, where we trust the
balloon driver not to hand back sensitive data.  IMO this is poor
behaviour, especially as it doesn't mitigate the damage of a buggy
balloon driver which frees up the wrong RAM out of a guest.

In reality, all freed pages should be assumed dirty.  Every allocation
going guest-wards definitely should be zeroed.

Allocations for Xen could come straight off the dirty list if it is
known ahead of time that the code will fully initialise the page
itself.  Otherwise, it would be better to have the alloc() subsystem
guarantee to hand out zeroed pages (unless a dirty page is explicitly
requested), at which point we could drop some of the code which
currently re-zeroes pages explicitly because it can't know for certain
that the allocated page is actually clean.
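
To sketch the kind of alloc()-side guarantee I have in mind (purely
illustrative; MEMF_no_zero and the other helper names below are made
up, not existing interfaces):

  static struct page_info *alloc_clean_page(unsigned int memflags)
  {
      struct page_info *pg = take_free_page();      /* made-up helper */

      if ( pg == NULL )
          return NULL;

      /* Zero by default; callers which will fully initialise the page
       * themselves can opt out with the (made-up) MEMF_no_zero flag. */
      if ( page_needs_scrub(pg) && !(memflags & MEMF_no_zero) )
      {
          scrub_one_page(pg);            /* zero-fills the page */
          clear_needs_scrub(pg);
      }

      return pg;
  }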

(Thinking about it, it might eventually be worth having a
CONFIG_DEBUG_ZEROED_ALLOC option which checks allocations and panics if
we don't hand a zeroed page back.)
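
Something along these lines, perhaps (entirely hypothetical, option
name included; it assumes the usual map_domain_page()/panic() helpers):

  #ifdef CONFIG_DEBUG_ZEROED_ALLOC
  /* Debug-only sanity check: panic if a page about to be handed out as
   * "clean" is not actually zero-filled. */
  static void check_zeroed_page(struct page_info *pg)
  {
      const uint64_t *ptr = __map_domain_page(pg);
      unsigned int i;

      for ( i = 0; i < PAGE_SIZE / sizeof(*ptr); i++ )
          if ( ptr[i] )
              panic("Allocator handed out a non-zeroed page\n");

      unmap_domain_page(ptr);
  }
  #endif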

>
> I was also thinking that really we need to only mark the head as needing
> a scrub so perhaps we don't need to walk the whole list setting the
> dirty bit.
>
>>> The downside of this series is that we sometimes fail to allocate high-order
>>> sets of pages since dirty pages may not yet be merged into higher-order sets
>>> while they are waiting to be scrubbed.
>> I presume there is no sensible way to try and prioritise the pages which
>> would have greatest effect for recreating higher-order sets?
> I am currently scrubbing as 'for ( order = MAX_ORDER; order >= 0;
> order-- )' but I wonder whether I should do this in ascending order so
> that dirty chunks are merged up sooner.

I am not familiar with the details, so will defer to your judgement.

>>> Briefly, the new algorithm places dirty pages at the end of the heap's page
>>> list for each node/zone/order to avoid having to scan the full list while
>>> searching for dirty pages. One processor from each node checks whether the
>>> node has any dirty pages and, if such pages are found, scrubs them.
>>> Scrubbing itself happens without holding the heap lock so other users may
>>> access the heap in the meantime. If a chunk of pages is requested by the
>>> heap allocator while the idle loop is scrubbing it, scrubbing is
>>> immediately stopped.
>> Why not maintain two lists?  That way it is O(1) to find either a clean
>> or dirty free page.
> Since dirty pages are always at the tail of page lists we are not really
> searching the lists. As soon as a clean page is found (starting from the
> tail) we can stop.

On this point, ok.  However, multiple lists might make the fine-grained
locking easier.

(I guess it will be better not to needlessly optimise this case before
the more obvious changes are made.)
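
For reference, my mental model of the tail scan described above is
roughly the following (a sketch only; the reverse iterator and the
need-scrub helpers are illustrative names, not necessarily what the
series uses):

  /* Dirty pages live at the tail of each free list, so walk backwards
   * from the tail and stop at the first clean page. */
  page_list_for_each_reverse ( pg, &heap(node, zone, order) )
  {
      if ( !page_needs_scrub(pg) )
          break;              /* everything nearer the head is clean */

      scrub_one_page(pg);
      clear_needs_scrub(pg);
  }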

>>> On the allocation side, alloc_heap_pages() first tries to satisfy the allocation
>>> request using only clean pages. If this is not possible, the search is
>>> repeated and dirty pages are scrubbed by the allocator.
>>>
>>> This series needs to be fixed up for ARM and may have a few optimizations
>>> added (and possibly some cleanup). I also need to decide what to do about
>>> nodes without processors.
>>>
>>> I originally intended to add per-node heap locks
>> Is this why scrubbing is limited to a single processor per node?  What
>> kind of performance impact does this give?
> My reasoning was that one processor would already make the memory
> controller busy enough so that adding another core banging on it
> wouldn't make things any better. But I haven't made any comparisons with
> multiple cores doing this.

I doubt that a single core can saturate the memory controller alone.
The current bootscrub logic uses every core (excluding sibling threads),
but doing that in this situation is not practical until the heap lock is
split up.
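
By "split up" I mean something loosely along these lines (hypothetical;
nothing of the sort exists today):

  /* One lock per NUMA node instead of the single global heap_lock, so
   * scrubbing and allocation on different nodes don't contend. */
  static spinlock_t node_heap_lock[MAX_NUMNODES];

  static void scrub_one_chunk_on(unsigned int node)
  {
      spin_lock(&node_heap_lock[node]);
      /* ... take a dirty chunk off this node's free lists ... */
      spin_unlock(&node_heap_lock[node]);

      /* Scrub outside the lock, as the cover letter describes, so
       * allocations on the node can proceed in parallel. */
  }

With multiple cores per node participating, each would only need to
hold the node's lock long enough to claim a chunk.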

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

