I'm actually curious how my multi-substr variant would perform in your bench... because in my tests (on two different machines), and much to my surprise, mine significantly outperforms the join-map-substr.
anomalous_map: 20 wallclock secs (17.09 usr + 0.16 sys = 17.25 CPU) @
+ 2.03/s (n=35)
anomalous_two: 17 wallclock secs (15.12 usr + 0.08 sys = 15.20 CPU) @
+ 2.04/s (n=31)
buk_substr: 16 wallclock secs (15.59 usr + 0.01 sys = 15.61 CPU) @ 0
+.70/s (n=11)
with_chop_fixed: 16 wallclock secs (15.09 usr + 0.17 sys = 15.27 CPU)
+ @ 2.23/s (n=34)
with_split: 16 wallclock secs (15.19 usr + 0.00 sys = 15.19 CPU) @ 0
+.86/s (n=13)
with_substr: 15 wallclock secs (15.39 usr + 0.00 sys = 15.39 CPU) @
+1.75/s (n=27)
Rate buk_substr with_split with_substr anomalous_ma
+p anomalous_two with_chop_fixed
buk_substr 0.705/s -- -18% -60% -65
+% -65% -68%
with_split 0.856/s 21% -- -51% -58
+% -58% -62%
with_substr 1.75/s 149% 105% -- -14
+% -14% -21%
anomalous_map 2.03/s 188% 137% 16% -
+- -0% -9%
anomalous_two 2.04/s 189% 138% 16% 0
+% -- -8%
with_chop_fixed 2.23/s 216% 160% 27% 10
+% 9% --
anomalous_map: 16 wallclock secs (15.34 usr + 0.06 sys = 15.41 CPU) @
+ 1.49/s (n=23)
anomalous_two: 15 wallclock secs (14.98 usr + 0.02 sys = 15.00 CPU) @
+ 1.60/s (n=24)
buk_substr: 17 wallclock secs (16.56 usr + 0.03 sys = 16.59 CPU) @ 0
+.30/s (n=5)
with_chop_fixed: 15 wallclock secs (15.22 usr + 0.02 sys = 15.23 CPU)
+ @ 1.51/s (n=23)
with_split: 17 wallclock secs (17.41 usr + 0.00 sys = 17.41 CPU) @ 0
+.34/s (n=6)
with_substr: 15 wallclock secs (15.50 usr + 0.00 sys = 15.50 CPU) @
+1.29/s (n=20)
Rate buk_substr with_split with_substr anomalous_ma
+p with_chop_fixed anomalous_two
buk_substr 0.301/s -- -13% -77% -80
+% -80% -81%
with_split 0.345/s 14% -- -73% -77
+% -77% -78%
with_substr 1.29/s 328% 274% -- -14
+% -15% -19%
anomalous_map 1.49/s 395% 333% 16% -
+- -1% -7%
with_chop_fixed 1.51/s 401% 338% 17% 1
+% -- -6%
anomalous_two 1.60/s 431% 364% 24% 7
+% 6% --
And yes, the anomalous_map, anomalous_two and with_chop_fixed seemed to go back and forth a lot, even on the first system. It's not had a lot of runs, so the timing's not very accurate. Trying with 200x200 on the second machine, to give it more runs per function:
Rate buk_substr with_split with_substr with_chop_fix
+ed anomalous_map anomalous_two
buk_substr 29.1/s -- -15% -77% -7
+8% -78% -80%
with_split 34.3/s 18% -- -73% -7
+4% -74% -76%
with_substr 127/s 337% 271% -- -
+2% -5% -12%
with_chop_fixed 130/s 348% 280% 3%
+-- -3% -10%
anomalous_map 134/s 360% 291% 5%
+3% -- -8%
anomalous_two 145/s 399% 323% 14% 1
+1% 8% --
|