Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re^2: using bits to print part of a string

by choroba (Abbot)
on Mar 15, 2013 at 16:53 UTC ( #1023740=note: print w/ replies, xml ) Need Help??


in reply to Re: using bits to print part of a string
in thread using bits to print part of a string

Anyone willing to improve the benchmark?

#!/usr/bin/perl use warnings; use strict; use feature qw/state/; use Test::More tests => 4; use Benchmark qw/cmpthese/; my $length = 100_000; my $string = join q(), map int rand 10, 1 .. $length; my $mask = join q(), map int rand 2, 1 .. $length; my $result = bitwise($string, $mask); is arraywise ($string, $mask), $result, 'array'; is substrwise($string, $mask), $result, 'substr'; is packwise ($string, $mask), $result, 'pack'; is regexwise ($string, $mask), $result, 'regex'; sub bitwise { my $string = shift; state $mask; ($mask = shift) =~ y/01/\x00\xff/ unless defined $mask; my $result = $string & $mask; $result =~ tr/\x00//d; return $result; } sub arraywise { my $string = shift; state $mask = shift; my @chars = split //, $string; return join q(), @chars[grep substr($mask, $_, 1), 0 .. length $ma +sk]; } sub regexwise { my $string = shift; state $regex; unless (defined $regex) { $regex = '^' . join(q(), map $_ ? '(.)' : '.', split //, shift +) . '$'; } my $result = join q(), $string =~ m/$regex/o; return $result; } sub packwise { my $string = shift; state $mask; unless (defined $mask) { $mask = shift; my $template; while ($mask =~ /((.)\2*)/g) { $template .= (qw(x a))[$2] . length $1; } $mask = $template; } return join q(), unpack $mask, $string; } sub substrwise { my $string = shift; state $mask; my @mask; unless (defined $mask) { $mask = shift; while ( $mask =~ /0+/g ) { push @mask, [ $-[0], ( $+[0] - $-[0] ) ]; } } for (1 .. @mask) { my $replace = ''; #'*' x $mask[-$_][1]; #check to see substr $string, $mask[-$_][0], $mask[-$_][1], $replace; } return $string; } cmpthese(-3, { bitwise => sub { bitwise ($string, $mask) }, arraywise => sub { arraywise ($string, $mask) }, substrwise => sub { substrwise($string, $mask) }, packwise => sub { packwise ($string, $mask) }, regexwise => sub { regexwise ($string, $mask) }, });

Update: Added regexwise.

لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Comment on Re^2: using bits to print part of a string
Select or Download Code
Re^3: using bits to print part of a string
by McA (Priest) on Mar 15, 2013 at 17:14 UTC

    On one of my machines:

    1..4 ok 1 - array ok 2 - substr ok 3 - pack ok 4 - regex Rate arraywise regexwise packwise bitwise substr +wise arraywise 38.9/s -- -75% -86% -98% - +100% regexwise 153/s 294% -- -43% -93% - +100% packwise 270/s 593% 76% -- -87% - +100% bitwise 2077/s 5239% 1255% 671% -- +-99% substrwise 187563/s 481985% 122274% 69480% 8930% + --
    I guessed that my solution is as slow as I am... ;-)

    McA

Re^3: using bits to print part of a string
by BrowserUk (Pope) on Mar 15, 2013 at 17:16 UTC
    Anyone willing to improve the benchmark?

    They couldn't make it much worse :)

    Incorporating one-off set-ups -- or even a test for one-off setups -- into the benchmark subs is like incorporating the build-time of a car in its race time.

    If the substr code is meant to reflect my second option, you've completely misunderstood the purpose of the substr refs and assigning through a fixed scalar buffer.

    Its also traditional to post the results of a typical run.

    I may have a go at producing a more realistic benchmark later tonight. Key ingredients are that you must not exclude the IO, buffer and memory handling when benchmarking IO processing of large files. Yours excludes all of these.

    Hint: You cannot do IO filter bechmarks using the Benchmark module. The only realistic test is to time processing actual files that are big enough that they do not fit into the filecache. And you must ensure that the cache is flushed between runs.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: using bits to print part of a string.(bug)
by BrowserUk (Pope) on Mar 16, 2013 at 04:19 UTC

    BTW: There is a problem with your substrwise test.

    The first time through, $mask is undefined, so you set up @mask and set smask.

    But on the second and subsequent times through, $mask is defined, so the non-state variable: @mask is left empty, so you don't do any actual work.

    That probably explains the surprising apparent efficiency of substrwise in the figures McA posted.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Aaaarrggghh, and I just started to change every regex substitution in my code to a combination of index and substr to get cutting edge performance... ;-)

      McA

        I just started to change every regex substitution in my code to a combination of index and substr to get cutting edge performance... ;-)

        Now you'll have to change them all to bitwise ops :)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks for catching the problem.

      BTW, I had to change the condition in substrref to

      while $mask =~ /1+/g
      to get the same results as from the other methods.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: using bits to print part of a string (Bitops win by an order of magnitude)
by BrowserUk (Pope) on Mar 16, 2013 at 20:52 UTC

    I finally got around to benchmarking. (Removing the nulls, left by the bitwise&, in-place using tr is the saving grace!):

    #! perl -slw use strict; use Time::HiRes qw[ time ]; my @benches = ( sub { printf 'unpack: '; my $mask = shift; my $templ; while( $mask =~ /((.)\2*)/g ) { $templ .= (qw(x a))[$2] . length $1; } return sub { my $fh = shift; my $count = 0; my $out; $out = join'', unpack( $templ, $_ ), ++$count while <$fh>; $count; } }, sub { printf 'substr: '; my $mask = shift; my $templ; my @mask; while ( $mask =~ /0+/g ) { push @mask, [ $-[0], ( $+[0] - $-[0] ) ]; } return sub { my $fh = shift; my $count = 0; my $out; while( defined( $out = <$fh> ) ) { substr( $out, $mask[-$_][0], $mask[-$_][1],'' ) for 1 +.. @mask; ++$count; } $count; } }, sub { printf 'substrref: '; my $mask = shift; my $templ; my $buf = chr(0); $buf x= 400_000; my @refs; push @refs, \substr( $buf, $-[0], $+[0] - $-[0] ) wh +ile $mask =~ /0+/g; return sub { my $fh = shift; my $count = 0; my $out; while( <$fh> ) { substr( $buf, 0 ) = $_; $out = join'', map $$_, @refs; ++$count; } $count; } }, sub { printf "bitops: "; my $mask = shift; $mask =~ tr[01][\x00\xff]; return sub { my $fh = shift; my $count = 0; $_ &= $mask, tr[\x00][]d, ++$count while <$fh>; $count; } }, ); $|++; our $OPT //= 0; our $FLUSHFILE //= '10gb.csv'; our $TESTFILE //= '1023727.dat'; our $S //= 1; srand $S; my $mask = join '', map int( rand 2 ), 1 .. 400_000; open I, '<', $FLUSHFILE or die $!; 1 while <I>; close I; my $start = time; my $run = $benches[ $OPT ]->( $mask ); open I, '<', $TESTFILE or die $!; my $records = $run->( \*I ); close I; my $stop = time; printf "Took %f seconds for %u records (%f recs/second)\n", $stop - $start, $records, $records / ($stop - $start); __END__ C:\test>for /l %n in (0,1,3) do @1023727 -OPT=%n unpack: Took 164.702357 seconds for 2606 records (15.822482 recs/secon +d) substr: Took 2971.481218 seconds for 2606 records (0.877004 recs/secon +d) substrref: Took 154.501948 seconds for 2606 records (16.867101 recs/se +cond) bitops: Took 12.534998 seconds for 2606 records (207.897916 recs/secon +d)

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    div class=

      Hi,

      why have you thrown my regex solution out of the race? :) I've been interested to see it in your benchmark.

      McA

        sub { printf 'regex: '; my $mask = shift; my $re = '^' . join('', map $_ ? '(.)' : '.', split '', $mask +) . '$'; return sub { my $fh = shift; my $out; my $count = 0; $out = join( '', m[$re]o ), ++$count while <$fh>; $count; } }, __END__ C:\test>1023727 -OPT=4 regex: Took 274.157906 seconds for 2606 records (9.505471 recs/second)

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1023740]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (9)
As of 2014-12-29 12:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (187 votes), past polls