
Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.



Hi,

On 26/04/2022 16:37, Paran Lee wrote:
Thank you, I agree! It made me think once more about what my patch
could improve.
The patches I sent have been reviewed in various ways. It was a good
opportunity to analyze my patch from various perspectives. :)

I checked the objdump output at -O2 (the default optimization level in
the Xen Makefile) to make sure CSE (common subexpression elimination)
works well with the latest arm64 cross compiler for x86_64 hosts from
the Arm GNU Toolchain.

$ ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -v
...
A-profile Architecture 10.3-2021.07 (arm-10.29)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
Architecture 10.3-2021.07 (arm-10.29)

I compared the generated code before and after my patch. This time,
instead of adding a "pages" variable, I reused the local variable mask,
which is computed from order.

I was able to confirm that it does one less operation.

Well... I don't think the one less operation is because of the introduction of the local variable (see more below).


(1) before clean up

0000000000001bb4 <p2m_set_entry>:
     while ( nr )
     1bb4:       b40005e2        cbz     x2, 1c70 <p2m_set_entry+0xbc>
{
     ...
         if ( rc )
     1c1c:       350002e0        cbnz    w0, 1c78 <p2m_set_entry+0xc4>
         sgfn = gfn_add(sgfn, (1 << order));

1 << order is a 32-bit value but the second parameter is a 64-bit value (assuming arm64). So...

     1c20:       1ad32373        lsl     w19, w27, w19   // <<< CSE works
     1c24:       93407e73        sxtw    x19, w19        // <<< well!

... this instruction is extending the 32-bit value to 64-bit value.

     return _gfn(gfn_x(gfn) + i);
     1c28:       8b1302d6        add     x22, x22, x19
     return _mfn(mfn_x(mfn) + i);
     1c2c:       8b130281        add     x1, x20, x19
     1c30:       b100069f        cmn     x20, #0x1
     1c34:       9a941034        csel    x20, x1, x20, ne  // ne = any
     while ( nr )
     1c38:       eb1302b5        subs    x21, x21, x19
     1c3c:       540001e0        b.eq    1c78 <p2m_set_entry+0xc4>  // b.none

(2) Reusing the mask variable, with mask = 1UL << order:
the generated code no longer contains the   sxtw    x19, w19   instruction.
This code is not only using a local variable but also using "1UL". So, I suspect that if you were using 1 << order, the instruction would re-appear.
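
To make the type difference concrete, here is a minimal sketch. The helper names advance_int() and advance_ulong() are hypothetical and are not Xen functions; plain unsigned long stands in for the gfn_t wrapper. It only shows which C type each shift expression has. The generated instructions depend on the compiler and flags, so the sxtw behaviour should still be confirmed with objdump.

```c
#include <assert.h>   /* static_assert (C11) */

/* The literal 1 is an int, so the shift result is a 32-bit int. */
static_assert(sizeof(1 << 0) == sizeof(int),
              "1 << n has type int");

/* The literal 1UL is an unsigned long, so the shift result is already
 * as wide as unsigned long (64-bit on arm64). */
static_assert(sizeof(1UL << 0) == sizeof(unsigned long),
              "1UL << n has type unsigned long");

/* Hypothetical stand-in for gfn_add(sgfn, (1 << order)).
 * The int result has to be converted to unsigned long before the
 * addition; on arm64 that conversion is what typically shows up as a
 * sign extension. Only meaningful for order < 31. */
unsigned long advance_int(unsigned long gfn, unsigned int order)
{
    return gfn + (1 << order);
}

/* Hypothetical stand-in for the variant using mask = 1UL << order.
 * The shift is performed in unsigned long, so no widening conversion
 * is needed before the addition. */
unsigned long advance_ulong(unsigned long gfn, unsigned int order)
{
    unsigned long mask = 1UL << order;

    return gfn + mask;
}
```

Both helpers return the same value for small orders; only the width in which the shift is evaluated differs.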

Cheers,

--
Julien Grall



 

