Re: [PATCH 3/5] x86/pv: Optimise prefetching in svm_load_segs()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 10 Sep 2020 21:30:10 +0100
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Thu, 10 Sep 2020 20:30:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10/09/2020 15:57, Jan Beulich wrote:
> On 09.09.2020 11:59, Andrew Cooper wrote:
>> Split into two functions.  Passing a load of zeros in results in somewhat
>> poor register scheduling in __context_switch().
> I'm afraid I don't understand why this would be, no matter that
> I trust you having observed this being the case: The registers
> used for passing parameters are all call-clobbered anyway, so
> the compiler can't use them for anything across the call. And
> it would look pretty poor code generation wise if the XORs to
> clear them (which effectively have no latency at all) would be
> scheduled far ahead of the call, especially when there's better
> use for the registers. The observation wasn't possibly from
> before your recent dropping of two of the parameters, when they
> couldn't all be passed in registers (albeit even then it would
> be odd, as the change then should merely have led to a slightly
> smaller stack frame of the function)?

Hmm yes.  I wrote this patch before I did the assertion fix, and the
comment didn't rebase very well.

Back then, one of the zeros was on the stack, which was definitely an
unwanted property.  Even though the XORs are mostly free, they're not
totally free, as they cost decode bandwidth and instruction cache space
(Trivial amounts, but still...).
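
As a rough illustration of the point above, here is a minimal sketch rather
than the actual Xen code.  Every name in it (demo_segs, demo_load_segs_combined(),
demo_segs_prefetch(), demo_load_segs()) is hypothetical, and treating a zero
ldt_base as "prefetch only" is an assumption made for the example.  It
contrasts a combined entry point, where a prefetch request has to be spelled
as a run of zero arguments, with a split pair where the prefetch variant takes
no arguments at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-CPU segment state; not Xen's real layout. */
struct demo_segs {
    uint64_t ldt_base, fs_base, gs_base, gs_shadow;
    unsigned int ldt_ents, fs_sel, gs_sel;
    bool prefetched;
};

static struct demo_segs demo;

/*
 * Combined form: seven integer arguments.  Under the SysV x86-64 ABI only
 * six integer arguments are passed in registers, so the seventh goes on the
 * stack, even when the caller only wants a prefetch and passes all zeros.
 */
bool demo_load_segs_combined(unsigned int ldt_ents, uint64_t ldt_base,
                             unsigned int fs_sel, uint64_t fs_base,
                             unsigned int gs_sel, uint64_t gs_base,
                             uint64_t gs_shadow)
{
    if ( !ldt_base )
    {
        /* Assumed convention for this sketch: zero base means prefetch only. */
        demo.prefetched = true;
        return true;
    }

    demo.ldt_ents = ldt_ents;
    demo.ldt_base = ldt_base;
    demo.fs_sel = fs_sel;
    demo.fs_base = fs_base;
    demo.gs_sel = gs_sel;
    demo.gs_base = gs_base;
    demo.gs_shadow = gs_shadow;
    return true;
}

/* Split form, part 1: no arguments, so the caller sets up nothing. */
void demo_segs_prefetch(void)
{
    demo.prefetched = true; /* Placeholder for a real prefetch instruction. */
}

/* Split form, part 2: five arguments, all of which fit in registers. */
bool demo_load_segs(unsigned int ldt_ents, uint64_t ldt_base,
                    uint64_t fs_base, uint64_t gs_base, uint64_t gs_shadow)
{
    assert(ldt_base != 0); /* Prefetch-only callers use the other function. */

    demo.ldt_ents = ldt_ents;
    demo.ldt_base = ldt_base;
    demo.fs_base = fs_base;
    demo.gs_base = gs_base;
    demo.gs_shadow = gs_shadow;
    return true;
}
```

With the combined form, a prefetch-only call site reads
demo_load_segs_combined(0, 0, 0, 0, 0, 0, 0), which needs six register-clearing
instructions plus a stack store; with the split form it is simply
demo_segs_prefetch().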

In general, LTO's inter-procedural-analysis can figure out that
svm_load_segs_prefetch() doesn't use many registers, and the caller can
be optimised based on the fact that some registers aren't actually
clobbered.  (Then again, in this case with a sole caller, LTO really
ought to be able to inline and delete the function.)

How about "results in unnecessary caller setup code" ?

~Andrew



 

