Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Challenge: fastest count of substring

by LanX (Chancellor)
on May 06, 2013 at 15:17 UTC ( #1032323=perlquestion: print w/replies, xml ) Need Help??
LanX has asked for the wisdom of the Perl Monks concerning the following question:


inspired from the recent discussion I was seeking for the fastest way to count all fix substring in a long string.

But most approaches have the problem to allocate memory for a result list which is ignore anyway ...

DB<100> $l.="$_\t".(int rand 2)."\n" for 1..1e6 # linenumber TAB 0| +|1 RET DB<101> length $l => 8888896 DB<102> use Time::HiRes qw/time/ DB<103> sub timeit (&) { my $s=time; (shift)->(); print "<",time-$s.">\n"; } DB<104> $i=0; timeit { $i++ while ( $l =~ /\t1\n/g) }; print $i <0.454659938812256> 500338 DB<105> $i=0; timeit { $i = () = $l =~ /\t1\n/g }; print $i <1.18111300468445> 500338 DB<106> $i=0; timeit { $i = split /\t1\n/, $l }; print $i <0.507856845855713> 500339 DB<122> $i=0; timeit { local $/="\t1\n";open my $D,'<',\$l; $i++ whil +e (<$D>) }; print $i <0.703027963638306> 500339

Improvements welcome!

Cheers Rolf

( addicted to the Perl Programming Language)

Replies are listed 'Best First'.
Re: Challenge: fastest count of substring
by BrowserUk (Pope) on May 06, 2013 at 16:15 UTC

    index by a gnat's todger:

    #! perl -slw use strict; use Time::HiRes qw[ time ]; sub timeit (&) { my $s=time; (shift)->(); print "<", time - $s . ">"; } our $l .= "$_\t" . (int rand 2) . "\n" for 1..1e6; # linenumber TAB 0 +||1 RET my $i=0; timeit { $i++ while ( $l =~ /\t1\n/g) }; print $i; $i=0; timeit { $i = () = $l =~ /\t1\n/g }; print $i; $i=0; timeit { $i = split /\t1\n/, $l }; print $i; $i=0; timeit { local $/="\t1\n";open my $D,'<',\$l; $i++ while (<$D>) +}; print $i; $i = 0; timeit { my $p = 0; ++$i while $p = 1 + index $l, "\t1\n", $p; }; print $i; __END__ C:\test>1032323 Use of implicit split to @_ is deprecated at C:\test\ line 1 +7. <0.0966269969940186> 500737 <3.00464200973511> 500737 <0.204351902008057> 500738 <0.118543148040771> 500738 <0.0931870937347412> 500737

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Challenge: fastest count of substring
by Limbic~Region (Chancellor) on May 06, 2013 at 15:40 UTC
Re: Challenge: fastest count of substring
by hdb (Monsignor) on May 06, 2013 at 20:46 UTC

    I have been experimenting with various regexes until I observed that the various proposals give different answers plus/minus 1. I assume your challenge would require that the answer is correct, right?

      Sorry I thought it's obvious, any technique based on separation at delimiters results in an incremented result, so

       $x =split /;/, $string means their are $x-1 ";" in string.


      IMHO the only chance to get faster than now is to find an obscure feature in unpack.



      DB<103> $_="aXaXa" => "aXaXa" DB<104> $x=split /X/ => 3 DB<105> split /X/ => ("a", "a", "a")

      ooops! 8-/

      DB<106> $_="XaX" => "XaX" DB<107> $x=split /X/ => 2 DB<108> split /X/ => ("", "a")

      Cheers Rolf

      ( addicted to the Perl Programming Language)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1032323]
Approved by Old_Gray_Bear
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (5)
As of 2017-08-23 23:27 GMT
Find Nodes?
    Voting Booth?
    Who is your favorite scientist and why?

    Results (360 votes). Check out past polls.