Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^2: Finding string for patterns

by cazz (Pilgrim)
on Apr 05, 2005 at 21:58 UTC ( [id://445115]=note: print w/replies, xml ) Need Help??


in reply to Re: Finding string for patterns
in thread Finding string for patterns

That may be fast for short strings, but given longer strings, it is much faster to precompute and then compare.
use Benchmark qw/:all/; use strict; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; foreach my $number ($MIN_NUMBER..$MAX_NUMBER){ my $square = $number * $number; $SQUARE->{$square} = $number; } my $string = q(5973821902497150366459738219024971503664); sub m_re { my $total_length; foreach my $square (keys %{$SQUARE}){ if($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_index { my $total_length; foreach my $square (keys %{$SQUARE}){ if(index($string, $square) > 0) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_sqrt { my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } cmpthese(1000, { "Regexp", \&m_re, "Index", \&m_index, "Sqrt", \&m_sqrt, });
Gives:
Benchmark: timing 1000 iterations of Index, Regexp, Sqrt...
     Index:  1 wallclock secs ( 0.98 usr +  0.01 sys =  0.99 CPU) @ 1010.10/s (n=1000)
    Regexp: 13 wallclock secs (12.20 usr +  0.02 sys = 12.22 CPU) @ 81.83/s (n=1000)
      Sqrt:  2 wallclock secs ( 2.27 usr +  0.01 sys =  2.28 CPU) @ 438.60/s (n=1000)
         Rate Regexp   Sqrt  Index
Regexp 81.8/s     --   -81%   -92%
Sqrt    439/s   436%     --   -57%
Index  1010/s  1134%   130%     --
(update: useless use of sub)

Replies are listed 'Best First'.
Re^3: Finding string for patterns
by Roy Johnson (Monsignor) on Apr 05, 2005 at 22:33 UTC
    It's a tradeoff. As the OP noted, if you increase the range of squares you're checking, precomputing gets slower (and it's time you don't count in your benchmark). Stick another 0 on MAX_NUMBER and run your bench again (change the first arg for cmpthese to -10), and you might get something like I did:
    Rate Regexp Index Sqrt Regexp 3.82/s -- -88% -98% Index 32.4/s 748% -- -81% Sqrt 167/s 4281% 417% --
    One relatively easy optimization that I didn't do is to limit the length to what can actually be matched:
    sub m_sqrt { my $total_length; my %seen; my $max_length = length($MAX_NUMBER * $MAX_NUMBER); for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; my $this_max = length($string)-$start; $this_max = $max_length if $max_length < $this_max; for my $length (1..$this_max) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); }
    Now the bench gives me
    Rate Regexp Index Sqrt Regexp 3.64/s -- -88% -99% Index 31.2/s 758% -- -94% Sqrt 497/s 13542% 1489% --

    Caution: Contents may have been coded under pressure.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://445115]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2024-07-24 21:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.