[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/1] mm: fix folio_pte_batch() on XEN PV



On Sun, 4 May 2025 08:47:45 +0200 David Hildenbrand <david@xxxxxxxxxx> wrote:

> > 
> > Methinks max_nr really wants to be unsigned long. 
> 
> We only batch within a single PTE table, so an integer was sufficient.
> 
> The unsigned value is the result of a discussion with Ryan regarding 
> similar/related
> (rmap) functions:
> 
> "
> Personally I'd go with signed int (since
> that's what all the counters in struct folio that we are manipulating are,
> underneath the atomic_t) then check that nr_pages > 0 in
> __folio_rmap_sanity_checks().
> "
> 
> https://lore.kernel.org/linux-mm/20231204142146.91437-14-david@xxxxxxxxxx/T/#ma0bfff0102f0f2391dfa94aa22a8b7219b92c957
> 
> As soon as we let "max_nr" be an "unsigned long", also the return value
> should be an "unsigned long", and everybody calling that function.
> 
> In this case here, we should likely just use whatever type "max_nr" is.
> 
> Not sure myself if we should change that here to unsigned long or long. Some
> callers also operate with the negative values IIRC (e.g., adjust the RSS by 
> doing -= nr).

"rss -= nr" doesn't require, expect or anticipate that `nr' can be negative!

> 
> > That will permit the
> > cleanup of quite a bit of truncation, extension, signedness conversion
> > and general type chaos in folio_pte_batch()'s various callers.
> > > And...
> > 
> > Why does folio_nr_pages() return a signed quantity?  It's a count.
> 
> A partial answer is in 1ea5212aed068 ("mm: factor out large folio handling
> from folio_nr_pages() into folio_large_nr_pages()"), where I stumbled over the
> reason for a signed value myself and at least made the other
> functions be consistent with folio_nr_pages():
> 
> "
>      While at it, let's consistently return a "long" value from all these
>      similar functions.  Note that we cannot use "unsigned int" (even though
>      _folio_nr_pages is of that type), because it would break some callers 
> that
>      do stuff like "-folio_nr_pages()".  Both "int" or "unsigned long" would
>      work as well.
> 
> "
> 
> Note that folio_nr_pages() returned a "long" since the very beginning. 
> Probably using
> a signed value for consistency because also mapcounts / refcounts are all 
> signed.

Geeze.

Can we step back and look at what we're doing?  Anything which counts
something (eg, has "nr" in the identifier) cannot be negative.

It's that damn "int" thing.  I think it was always a mistake that the C
language's go-to type is a signed one.  It's a system programming
language and system software rarely deals with negative scalars. 
Signed scalars are the rare case.

I do expect that the code in and around here would be cleaner and more
reliable if we were to do a careful expunging of inappropriately signed
variables.

> 
> > 
> > And why the heck is folio_pte_batch() inlined?  It's larger then my
> > first hard disk and it has five callsites!
> 
> :)
> 
> In case of fork/zap we really want it inlined because
> 
> (1) We want to optimize out all of the unnecessary checks we added for other 
> users
> 
> (2) Zap/fork code is very sensitive to function call overhead
> 
> Probably, as that function sees more widespread use, we might want a
> non-inlined variant that can be used in places where performance doesn't
> matter all that much (although I am not sure there will be that many).

a quick test.

before:
   text    data     bss     dec     hex filename
  12380     470       0   12850    3232 mm/madvise.o
  52975    2689      24   55688    d988 mm/memory.o
  25305    1448    2096   28849    70b1 mm/mempolicy.o
   8573     924       4    9501    251d mm/mlock.o
  20950    5864      16   26830    68ce mm/rmap.o

 (120183)

after:

   text    data     bss     dec     hex filename
  11916     470       0   12386    3062 mm/madvise.o
  52990    2697      24   55711    d99f mm/memory.o
  25161    1448    2096   28705    7021 mm/mempolicy.o
   8381     924       4    9309    245d mm/mlock.o
  20806    5864      16   26686    683e mm/rmap.o

 (119254)

so uninlining saves a kilobyte of text - less than I expected but
almost 1%.

Quite a lot of the inlines in internal.h could do with having a
critical eye upon them.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.