Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Challenge: fastest count of substring

by LanX (Chancellor)
on May 06, 2013 at 15:17 UTC ( #1032323=perlquestion: print w/replies, xml ) Need Help??
LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

inspired from the recent discussion I was seeking for the fastest way to count all fix substring in a long string.

But most approaches have the problem to allocate memory for a result list which is ignore anyway ...

DB<100> $l.="$_\t".(int rand 2)."\n" for 1..1e6 # linenumber TAB 0| +|1 RET DB<101> length $l => 8888896 DB<102> use Time::HiRes qw/time/ DB<103> sub timeit (&) { my $s=time; (shift)->(); print "<",time-$s.">\n"; } DB<104> $i=0; timeit { $i++ while ( $l =~ /\t1\n/g) }; print $i <0.454659938812256> 500338 DB<105> $i=0; timeit { $i = () = $l =~ /\t1\n/g }; print $i <1.18111300468445> 500338 DB<106> $i=0; timeit { $i = split /\t1\n/, $l }; print $i <0.507856845855713> 500339 DB<122> $i=0; timeit { local $/="\t1\n";open my $D,'<',\$l; $i++ whil +e (<$D>) }; print $i <0.703027963638306> 500339

Improvements welcome!

Cheers Rolf

( addicted to the Perl Programming Language)

Replies are listed 'Best First'.
Re: Challenge: fastest count of substring
by BrowserUk (Pope) on May 06, 2013 at 16:15 UTC

    index by a gnat's todger:

    #! perl -slw use strict; use Time::HiRes qw[ time ]; sub timeit (&) { my $s=time; (shift)->(); print "<", time - $s . ">"; } our $l .= "$_\t" . (int rand 2) . "\n" for 1..1e6; # linenumber TAB 0 +||1 RET my $i=0; timeit { $i++ while ( $l =~ /\t1\n/g) }; print $i; $i=0; timeit { $i = () = $l =~ /\t1\n/g }; print $i; $i=0; timeit { $i = split /\t1\n/, $l }; print $i; $i=0; timeit { local $/="\t1\n";open my $D,'<',\$l; $i++ while (<$D>) +}; print $i; $i = 0; timeit { my $p = 0; ++$i while $p = 1 + index $l, "\t1\n", $p; }; print $i; __END__ C:\test>1032323 Use of implicit split to @_ is deprecated at C:\test\1032323.pl line 1 +7. <0.0966269969940186> 500737 <3.00464200973511> 500737 <0.204351902008057> 500738 <0.118543148040771> 500738 <0.0931870937347412> 500737

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Challenge: fastest count of substring
by Limbic~Region (Chancellor) on May 06, 2013 at 15:40 UTC
Re: Challenge: fastest count of substring
by hdb (Prior) on May 06, 2013 at 20:46 UTC

    I have been experimenting with various regexes until I observed that the various proposals give different answers plus/minus 1. I assume your challenge would require that the answer is correct, right?

      Sorry I thought it's obvious, any technique based on separation at delimiters results in an incremented result, so

       $x =split /;/, $string means their are $x-1 ";" in string.

      edit

      IMHO the only chance to get faster than now is to find an obscure feature in unpack.

      update

      example:

      DB<103> $_="aXaXa" => "aXaXa" DB<104> $x=split /X/ => 3 DB<105> split /X/ => ("a", "a", "a")

      ooops! 8-/

      DB<106> $_="XaX" => "XaX" DB<107> $x=split /X/ => 2 DB<108> split /X/ => ("", "a")

      Cheers Rolf

      ( addicted to the Perl Programming Language)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1032323]
Approved by Old_Gray_Bear
help
Chatterbox?
[Corion]: Yay. Traditional finance situation averted. Bonds can be quoted in amounts (1_000_000 EUR) or per unit (1 unit). And a traditional error is to trade 2_000_000 piece when you meant to trade 2_000_000 EUR.
[Corion]: (one of my scripts simply catches high amounts and I phone people making that trade, ideally before the payment is due)
[Corion]: The sad thing is that my script sits at the end of the pipeline and can only look at the payments due today or tomorrow basically, while there are many more systems further up in the pipeline
[GotToBTru]: better late than never, I guess
[Corion]: GotToBTru: Sure - there is a long and sad story of many frantic cleanups that led us to implement this notification ;)

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (10)
As of 2017-03-29 11:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Should Pluto Get Its Planethood Back?



    Results (347 votes). Check out past polls.