Re: [PATCH] mm, page_alloc: fix build_zonerefs_node()



On 07.04.22 14:32, Mel Gorman wrote:
> On Thu, Apr 07, 2022 at 01:17:19PM +0200, Juergen Gross wrote:
>> On 07.04.22 13:07, Michal Hocko wrote:
>>> On Thu 07-04-22 12:45:41, Juergen Gross wrote:
>>>> On 07.04.22 12:34, Michal Hocko wrote:
>>>>> Ccing Mel
>>>>>
>>>>> On Thu 07-04-22 11:32:21, Juergen Gross wrote:
>>>>>> Since commit 9d3be21bf9c0 ("mm, page_alloc: simplify zonelist
>>>>>> initialization") only zones with free memory are included in a built
>>>>>> zonelist. This is problematic when e.g. all memory of a zone has been
>>>>>> ballooned out.
>>>>>
>>>>> What is the actual problem there?
>>>>
>>>> When running as a Xen guest, newly hotplugged memory will not be onlined
>>>> automatically, but only on special request. This is done to support,
>>>> e.g., adding another GB of memory to the guest while initially
>>>> populating only part of it.
>>>>
>>>> If adding that memory populates a new zone, the page allocator won't be
>>>> able to use this memory when it is onlined, as the zone wasn't added to
>>>> the zonelist because managed_zone() returned 0.
>>>
>>> How is that memory onlined? Because "regular" onlining (online_pages())
>>> does rebuild zonelists if their zone hasn't been populated before.
>>
>> The Xen balloon driver has its own callback for onlining pages. The pages
>> are just added to the ballooned-out page list without handing them to the
>> allocator; they are handed over only when the guest is ballooned up.
>>
>
> Is this new behaviour? I ask because keeping !managed_zones out of the

For some time now (since kernel 5.9) Xen has been using the zone device
functionality with memremap_pages() and pgmap->type = MEMORY_DEVICE_GENERIC.

> zonelist and reclaim paths and the behaviour makes sense. Elsewhere you
> state "zone can always happen to have no free memory left" and this is true
> but it's usually a transient event. The difference between a populated

And if this "transient event" happens just when the zonelists are being
rebuilt, the zone may stay off the lists forever.

> vs managed zone is usually a permanent event where no memory will ever be
> placed on the buddy lists because the memory was reserved early in boot
> or a similar reason. The patch is probably harmless but it has the
> potential to waste CPU time allocating or reclaiming from zones that will
> never succeed.

I'd recommend having an explicit per-zone flag for this case if you
really care about that. This would be much cleaner than inferring, from
the absence of free pages at one specific point in time, that the zone
will never be subject to memory allocation.


Juergen
