[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 2/4] Map: backport find_opt/update from 4.06


  • To: Edwin Torok <edvin.torok@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Christian Lindig <christian.lindig@xxxxxxxxxx>
  • Date: Tue, 1 Sep 2020 12:39:19 +0000
  • Accept-language: en-GB, en-US
  • Authentication-results: esa1.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxx>, "dave@xxxxxxxxxx" <dave@xxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>
  • Delivery-date: Tue, 01 Sep 2020 12:39:36 +0000
  • Ironport-sdr: M5ES8QIa+1iSvmtgqSeHuy61iDsFwPjxCNRO3huw3jspXRJ18HXE26J1fpDzmkSoT+3dTSrCBw ZqgMr709uBzmop3HtEKZ2TtIKqDudJ0asJR82uGZweq2x3lNDwxT03fs28srKK9LsC3HAYAw4v ge8f9fuKCX9qOG3wHe0gqpPbKPpzoAy+2vrGXmTWr1OeZ7Ef5C0XjbBQl/a9EtdlAFHl2ygvh4 dTsg1CZV6tCMvjcBURM6IFmj9YBKUtlhxKn1CCVpoX8S4iXAqVMngEIv6TjIRrGG2XPDNs4xzg v6A=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHWfJh3uKZDw4ZJUUWej1tdtQi326lNL7ixgAB7CoCABhSSWw==
  • Thread-topic: [PATCH v4 2/4] Map: backport find_opt/update from 4.06

Thanks for this interesting experiment. My recollection was correct that 
exceptions are very fast but only if you don't capture stack traces (which we 
we do capture). With backtraces the difference is not massive. It still does 
pay off to use exceptions when they are raised rarely as the it saves the 
lookup for mem(). And moving to OCaml 4.06 improves performance.

-- C
________________________________________
From: Edwin Torok
Sent: 28 August 2020 18:43
To: Christian Lindig; xen-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Ian Jackson; dave@xxxxxxxxxx; wl@xxxxxxx
Subject: Re: [PATCH v4 2/4] Map: backport find_opt/update from 4.06

On Fri, 2020-08-28 at 10:30 +0200, Christian Lindig wrote:
> +let find_opt k t =
>
> +       (* avoid raising exceptions, they can be expensive *)
>
> +       if mem k t then Some (find k t) else None
>
>
>
> I disagree with this argument. Exceptions in OCaml are cheap because
> they don't walk the stack and cut to the exception handler directly.
> Is there a reason why they could be expensive here? In any case, the
> code is correct.
>

Interesting question, it is best to measure.

The answer depends on whether the key is found or not in the map.
I'll change the impl to the one with find+catch, I think we  might be
looking up keys that are present more often than those that are missing
(although the 4.06 series will make this moot, 4.06 version is faster
than both approaches, regardless whether key is present or not).

To sum up the measurements below:
If the key is not found then my approach is faster (takes only 80% of
time), if the key is found then find+catch is faster (again an approx
80% of the time taken).

I based my comment on the documentation of raise_notrace, which says:
"A faster version raise which does not record the backtrace.",
which implies that recording the backtrace has a measurable perf
impact.
One could argue that if performance matters backtraces should be turned
off in production, but I think the value of having backtraces when some
hard-to-reprodue bug occurs outweights any perf penalty.
We should try to use exceptions only for unexpected situations though,
not finding a value in a map doesn't qualify.

See the attachment for a small micro-benchmark:
$ dune exec --profile=release -- ./updatet.exe raise
Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
┌───────────────┬──────────┬────────────┐
│ Name          │ Time/Run │ Percentage │
├───────────────┼──────────┼────────────┤
│ raise         │  33.52ns │    100.00% │
│ raise_notrace │  19.16ns │     57.16% │
└───────────────┴──────────┴────────────┘

So raising with a backtrace is measurably slower, taking the backtrace
spends some CPU cycles.

$ dune exec --profile=release -- ./updatet.exe find-opt
Estimated testing time 1m30s (9 benchmarks x 10s). Change using '-
quota'.
┌──────────────────────────┬──────────┬────────────┐
│ Name                     │ Time/Run │ Percentage │
├──────────────────────────┼──────────┼────────────┤
│ find_opt 4.06:10         │  49.10ns │     24.06% │
│ find_opt 4.06:100        │ 115.38ns │     56.55% │
│ find_opt 4.06:1000       │ 161.27ns │     79.03% │
│ find_opt=mem+find:10     │  50.48ns │     24.74% │
│ find_opt=mem+find:100    │ 110.39ns │     54.10% │
│ find_opt=mem+find:1000   │ 162.48ns │     79.63% │
│ find_opt=find+catch:10   │  89.10ns │     43.67% │
│ find_opt=find+catch:100  │ 160.80ns │     78.80% │
│ find_opt=find+catch:1000 │ 204.04ns │    100.00% │
└──────────────────────────┴──────────┴────────────┘


4.06 and mem+find take 80% of the time of find+catch.

But of course if the key is actually found in the map then we have
this:
edwin@edvin-tower:~/uex % dune exec --profile=release -- ./updatet.exe
find-opt-found
Estimated testing time 1m30s (9 benchmarks x 10s). Change using '-
quota'.
┌──────────────────────────┬──────────┬─────────┬────────────┐
│ Name                     │ Time/Run │ mWd/Run │ Percentage │
├──────────────────────────┼──────────┼─────────┼────────────┤
│ find_opt 4.06:10         │  38.38ns │   2.00w │     52.65% │
│ find_opt 4.06:100        │  20.66ns │   2.00w │     28.35% │
│ find_opt 4.06:1000       │  20.63ns │   2.00w │     28.30% │
│ find_opt=mem+find:10     │  72.89ns │   2.00w │    100.00% │
│ find_opt=mem+find:100    │  39.06ns │   2.00w │     53.59% │
│ find_opt=mem+find:1000   │  39.07ns │   2.00w │     53.60% │
│ find_opt=find+catch:10   │  49.54ns │   2.00w │     67.97% │
│ find_opt=find+catch:100  │  33.01ns │   2.00w │     45.29% │
│ find_opt=find+catch:1000 │  32.97ns │   2.00w │     45.23% │
└──────────────────────────┴──────────┴─────────┴────────────┘

In this case find+catch is faster.

And here is update for a key that is not present:
$ dune exec --profile=release -- ./updatet.exe update
Estimated testing time 1m30s (9 benchmarks x 10s). Change using '-
quota'.
┌───────────────────────────────────┬──────────┬─────────┬────────────┐
│ Name                              │ Time/Run │ mWd/Run │ Percentage │
├───────────────────────────────────┼──────────┼─────────┼────────────┤
│ update 4.06:10                    │  79.96ns │  24.00w │     17.27% │
│ update 4.06:100                   │ 171.96ns │  48.00w │     37.15% │
│ update 4.06:1000                  │ 243.95ns │  66.00w │     52.70% │
│ update=find+catch+add/remove:10   │ 183.46ns │  24.00w │     39.63% │
│ update=find+catch+add/remove:100  │ 340.00ns │  48.00w │     73.45% │
│ update=find+catch+add/remove:1000 │ 462.89ns │  66.00w │    100.00% │
│ update=mem+find+add/remove:10     │ 126.06ns │  24.00w │     27.23% │
│ update=mem+find+add/remove:100    │ 274.79ns │  48.00w │     59.36% │
│ update=mem+find+add/remove:1000   │ 401.62ns │  66.00w │     86.76% │
└───────────────────────────────────┴──────────┴─────────┴────────────┘

Here 4.06 is a clear win, and mem+add is faster than find+catch+add.
Estimated testing time 1m30s (9 benchmarks x 10s). Change using '-
quota'.
┌───────────────────────────────────┬──────────┬─────────┬────────────┐
│ Name                              │ Time/Run │ mWd/Run │ Percentage │
├───────────────────────────────────┼──────────┼─────────┼────────────┤
│ update 4.06:10                    │  72.76ns │  24.00w │     31.25% │
│ update 4.06:100                   │ 164.69ns │  48.00w │     70.74% │
│ update 4.06:1000                  │ 232.79ns │  66.00w │    100.00% │
│ update=find+catch+add/remove:10   │ 133.24ns │  23.00w │     57.24% │
│ update=find+catch+add/remove:100  │ 118.76ns │  35.00w │     51.02% │
│ update=find+catch+add/remove:1000 │ 161.22ns │  59.00w │     69.26% │
│ update=mem+find+add/remove:10     │ 156.29ns │  23.00w │     67.14% │
│ update=mem+find+add/remove:100    │ 122.98ns │  35.00w │     52.83% │
│ update=mem+find+add/remove:1000   │ 161.53ns │  59.00w │     69.39% │
└───────────────────────────────────┴──────────┴─────────┴────────────┘

Interestingly here the 4.06 implementation is actually slower and not
much difference between my other two.

Best regards,
--Edwin



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.