Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

by ikegami (Patriarch)
on Sep 14, 2007 at 02:05 UTC ( [id://638941]=note: print w/replies, xml ) Need Help??


in reply to Re^6: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

I fixed the in-place tests so they actually worked on the intended test data. And to make the benchmarks worth anything, I made inline those that weren't inline.

By the way, mrm_3 looped forever if $s1 contained a NUL. I fixed it by adding a + 1 to index's last arg.

#!/usr/bin/perl use 5.6.0; use strict; use warnings FATAL => 'all'; use Benchmark qw( cmpthese ); use Test::More; # ==================== my $s1 = do_rand(0, 100_000); my $s2 = do_rand(1, 100_000); my $subs = { split1 => sub { split1 ( my $s3 = $s1, $s2 ); $s3 }, substr1 => sub { substr1 ( my $s3 = $s1, $s2 ); $s3 }, moritz => sub { moritz ( my $s3 = $s1, $s2 ); $s3 }, corion => sub { corion ( my $s3 = $s1, $s2 ); $s3 }, ikegami1 => sub { ikegami1 ( my $s3 = $s1, $s2 ); $s3 }, ikegami2 => sub { ikegami2 ( my $s3 = $s1, $s2 ); $s3 }, bart => sub { bart ( my $s3 = $s1, $s2 ); $s3 }, ikegami3 => sub { ikegami3 ( my $s3 = $s1, $s2 ); $s3 }, ikegami4 => sub { ikegami4 ( my $s3 = $s1, $s2 ); $s3 }, avar2 => sub { avar2 ( my $s3 = $s1, $s2 ); $s3 }, avar2_pos => sub { avar2_pos ( my $s3 = $s1, $s2 ); $s3 }, mrm_1 => sub { mrm_1 ( my $s3 = $s1, $s2 ); $s3 }, mrm_3 => sub { mrm_3 ( my $s3 = $s1, $s2 ); $s3 }, mrm_4 => sub { mrm_4 ( my $s3 = $s1, $s2 ); $s3 }, mrm_5 => sub { mrm_5 ( my $s3 = $s1, $s2 ); $s3 }, mrm_6 => sub { mrm_6 ( my $s3 = $s1, $s2 ); $s3 }, }; { plan 'tests' => scalar keys %{$subs}; my $expected; foreach my $subname ( keys %{$subs} ) { my $sub = $subs->{$subname}; if ( defined $expected ) { is( $sub->(), $expected, "$subname gets same value" ); } else { $expected = $sub->(); ok( defined $expected, "$subname gets some value" ); } } print("done.\n"); } cmpthese( -3, $subs ); # ==================== sub split1 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; my @s1 = split //, $s1; my @s2 = split //, $s2; foreach my $idx ( 0 .. $#s1 ) { if ( $s1[$idx] eq chr(0) ) { $s1[$idx] = $s2[$idx]; } } $s1 = join '', @s1; } sub substr1 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; for my $idx ( 0 .. length($s1) ) { if ( substr($s1, $idx,1) eq chr(0) ) { substr($s1, $idx, 1) = substr($s2, $idx, 1); } } } sub moritz { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; my $pos = 0; while ( -1 < ( $pos = index $s1, "\000", $pos ) ) { substr( $s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } } sub ikegami1 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; (my $mask = $s1) =~ s/[^\x00]/\xFF/g; $s1 = ($s1 & $mask) | ($s2 & ~$mask); } sub ikegami2 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; (my $mask = $s1) =~ tr/\x00/\xFF/c; $s1 = ($s1 & $mask) | ($s2 & ~$mask); } sub bart { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; (my $mask = $s1) =~ tr/\x00/\xFF/c; $s1 = (($s1 ^ $s2) & $mask) ^ $s2; } sub ikegami3 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; use bytes; (my $mask = $s1) =~ tr/\x00/\xFF/c; $s1 = (($s1 ^ $s2) & $mask) ^ $s2; } sub ikegami4 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; use bytes; (my $mask = $s1) =~ tr/\x00/\xFF/c; $s1 ^= $s2; $s1 &= $mask; $s1 ^= $s2; } sub corion { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; my $ofs = 0; $s1 = join "", map { $ofs += length; $_ => substr $s2, $ofs++, 1 } +split /\0/, $s1, -1; } sub avar2 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; use bytes; $s1 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; } sub avar2_pos { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; use bytes; $s1 =~ s/\0/substr $s2, pos($s1), 1/eg; } sub mrm_1 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; # from [moritz]'s work use bytes; my $pos = 0; while ( -1 < ( $pos = index $s1, "\x00", $pos ) ) { substr( $s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } } sub mrm_3 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; # from moritz's, builds a separate loop of zeros use bytes; my @zeros = (); my $pos = 0; while ( -1 < ( $pos = index $s1, "\x00", $pos+1 ) ) { # ikegam +i: Added necessary +1 push @zeros, $pos; } for ( @zeros ) { substr( $s1, $_, 1 ) = substr( $s2, $_, 1 ); } } sub mrm_4 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; # from [bart]'s vec() use bytes; my $pos = 0; while ( -1 < ( $pos = index $s1, "\x00", $pos ) ) { vec( $s1, $pos, 8 ) ||= vec( $s2, $pos, 8 ); } } sub mrm_5 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; # from moritz's, seeing if four-arg substr() is faster or slower t +han lvalue substr() use bytes; my $pos = 0; while ( -1 < ( $pos = index $s1, "\x00", $pos ) ) { substr( $s1, $pos, 1, substr( $s2, $pos, 1 ) ); } } sub mrm_6 { our $s1; local *s1 = \$_[0]; our $s2; local *s2 = \$_[1]; # from mrn_5, testing bytes::misc explicitly instead of importing use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $s1, "\x00", $pos ) ) ) { bytes::substr( $s1, $pos, 1, bytes::substr( $s2, $pos, 1 ) ); } } sub do_rand { my $min = shift; my $len = shift; { my $n = ""; for (1 .. $len) { $n .= chr( rand(255-$min)+$min ); } redo if $min == 0 && $n !~ /\X00/; return $n; } }
1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - ikegami4 gets same value ok 4 - mrm_1 gets same value ok 5 - mrm_4 gets same value ok 6 - bart gets same value ok 7 - mrm_3 gets same value ok 8 - split1 gets same value ok 9 - moritz gets same value ok 10 - mrm_5 gets same value ok 11 - avar2_pos gets same value ok 12 - ikegami1 gets same value ok 13 - ikegami2 gets same value ok 14 - ikegami3 gets same value ok 15 - mrm_6 gets same value ok 16 - avar2 gets same value done. Rate split1 substr1 ikegami1 corion avar2 mrm_6 avar2_pos + ikegami2 bart ikegami3 mrm_3 mrm_4 moritz ikegami4 mrm_1 mrm_5 split1 1.08/s -- -96% -97% -100% -100% -100% -100% + -100% -100% -100% -100% -100% -100% -100% -100% -100% substr1 28.0/s 2505% -- -22% -94% -94% -95% -96% + -97% -97% -97% -98% -98% -98% -99% -99% -99% ikegami1 36.1/s 3252% 29% -- -92% -92% -93% -95% + -96% -97% -97% -98% -98% -98% -98% -98% -99% corion 465/s 43105% 1558% 1189% -- -3% -12% -37% + -46% -56% -57% -70% -74% -75% -75% -78% -83% avar2 479/s 44448% 1610% 1229% 3% -- -9% -35% + -44% -55% -55% -69% -73% -74% -75% -78% -82% mrm_6 526/s 48782% 1776% 1358% 13% 10% -- -29% + -38% -50% -51% -66% -71% -71% -72% -76% -80% avar2_pos 742/s 68898% 2548% 1959% 60% 55% 41% -- + -13% -30% -31% -52% -59% -59% -61% -65% -72% ikegami2 855/s 79348% 2949% 2270% 84% 78% 63% 15% + -- -19% -20% -44% -53% -53% -55% -60% -68% bart 1054/s 97924% 3662% 2825% 127% 120% 101% 42% + 23% -- -1% -31% -42% -42% -44% -51% -61% ikegami3 1070/s 99384% 3718% 2868% 130% 123% 104% 44% + 25% 1% -- -30% -41% -41% -43% -50% -60% mrm_3 1532/s 142352% 5367% 4150% 230% 220% 191% 106% + 79% 45% 43% -- -15% -16% -19% -29% -43% mrm_4 1804/s 167619% 6337% 4904% 288% 276% 243% 143% + 111% 71% 69% 18% -- -1% -5% -16% -33% moritz 1822/s 169343% 6403% 4955% 292% 280% 247% 146% + 113% 73% 70% 19% 1% -- -4% -15% -32% ikegami4 1891/s 175685% 6647% 5144% 307% 295% 260% 155% + 121% 79% 77% 23% 5% 4% -- -12% -29% mrm_1 2150/s 199814% 7573% 5864% 363% 349% 309% 190% + 152% 104% 101% 40% 19% 18% 14% -- -20% mrm_5 2675/s 248608% 9446% 7320% 476% 458% 409% 260% + 213% 154% 150% 75% 48% 47% 41% 24% --

Replies are listed 'Best First'.
Re^8: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by mr_mischief (Monsignor) on Sep 14, 2007 at 14:39 UTC
    Okay, now that tests and benchmarks are working (thanks, ikegami), I'll do my comparative runs and show the differences between the different perl builds on this.

    The fastest pure-Perl solution in every case seems to be mrm_5, and usually by a decent margin. That's a version of moritz's code slightly tweaked with some of the tips taken from avar's code. moritz's own code (as tweaked for in-place fairness by ikegami) seems to be near the top quite consistently. So if pure Perl speed is the goal, moritz seems to have been on the right track all along. As my code for mrm_5 is just variations on improving his, I think he should get most of the credit for it (especially since he didn't go stupid halfway through and break it like I did).

    Strawberry 5.8.8 on WinXP, AthlonXP 2400+, 1GB

    ActivePerl 5.8.0 on same machine Died during tests
    1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - mrm_1 gets same value ok 4 - mrm_4 gets same value ok 5 - split1 gets same value ok 6 - moritz gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami1 gets same value ok 9 - ikegami2 gets same value ok 10 - ikegami3 gets same value ok 11 - ikegami4 gets same value ok 12 - bart gets same value ok 13 - mrm_3 gets same value ok 14 - mrm_5 gets same value
    cygperl 5.8.7 on same machine perl 5.8.7 on Mandriva 2006, Athlon 1000, 512 MB perl 5.9.5 (gcc 4.0.1, -O4 (4.0.1-5mdk)) on the above Mandriva

    Just for kicks: miniperl 5.9.5 on Mandriva

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://638941]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (7)
As of 2024-03-28 11:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found