Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^2: Determining uniqueness in a string.

by thundergnat (Deacon)
on Oct 01, 2005 at 01:14 UTC ( [id://496577]=note: print w/replies, xml ) Need Help??


in reply to Re: Determining uniqueness in a string.
in thread Determining uniqueness in a string.

Ah crap, you beat me to it. :)

Update with a fourth sub based on BrowserUK's suggestion. Gets another 2-5% over mine.

Oops, fixed error in u2 that made it look about 10% faster than it was. New timings too.

use Benchmark qw( cmpthese ) ; my $digits = 2491306578; cmpthese( -5, { u1 => sub{ for ( 0 .. 9 ) { return 0 if $digits !~ /$_/;} return 1 +; }, u2 => sub{ my @x = sort split //, $digits; my $x = join "", @x; $x =~ tr/0-9//s; length $x == 10 ? return 1 : return 0; }, u3 => sub { return 0 if $digits =~ /(\d).*\1/; return 1; }, u4 => sub { return $digits !~ /(.).*\1/; }, } ); __END__
All unique: Rate u1 u2 u3 u4 u1 15504/s -- -75% -95% -95% u2 62910/s 306% -- -78% -78% u3 289343/s 1766% 360% -- -1% u4 291910/s 1783% 364% 1% -- Not Unique: Rate u1 u2 u3 u4 u1 17396/s -- -72% -98% -98% u2 62887/s 262% -- -92% -92% u3 748300/s 4202% 1090% -- -5% u4 783739/s 4405% 1146% 5% --

Replies are listed 'Best First'.
Re^3: Determining uniqueness in a string.
by creamygoodness (Curate) on Oct 01, 2005 at 05:40 UTC
    Precompiling regexes can make a dramatic difference. Compare the stats below for u1 vs u5 -- there's an 868% speedup on my box!

    Nevertheless, the backreference is just a better algo, and it still wins.

    use Benchmark qw( cmpthese ) ; my $digits = 2491306578; my @regexes = map { qr/$_/ } 0 .. 9; cmpthese( -5, { u1 => sub{ for ( 0 .. 9 ) { return 0 if $digits !~ /$_/; } return 1; }, u2 => sub{ my @x = sort split //, $digits; my $x = join "", @x; $x =~ tr/0-9//s; length $x == 10 ? return 1 : return 0; }, u3 => sub { return 0 if $digits =~ /(\d).*\1/; return 1; }, u4 => sub { return $digits !~ /(.).*\1/; }, u5 => sub { for (@regexes) { return 0 if $digits !~ $_; } return 1; }

    Rate u1 u2 u5 u3 u4 u1 10555/s -- -71% -90% -93% -93% u2 35955/s 241% -- -65% -76% -76% u5 102215/s 868% 184% -- -32% -32% u3 149240/s 1314% 315% 46% -- -1% u4 150091/s 1322% 317% 47% 1% --
    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com
      You're missing the closing }) (perhaps a copy/paste error?)

      I also don't care for the assumption that it's exactly 10 characters when it gets to uniq. If that requirement changes, uniq becomes broken without being modified. While this is perhaps simplistic, the fact that the code is separated means it's easy to update one without seeing the other, and it quietly fails. uniq would be better named pannumeric.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re^3: Determining uniqueness in a string.
by Anonymous Monk on Oct 01, 2005 at 01:53 UTC
    Thanks for the benchmark numbers... I plugged each into my program and the speed improvement was pretty impressive. Now I will spend a while understanding what those code fragments do. :)

      Actually, the print statement is probably much more expensive than any of those comparisons.

      Anyway, I noodled with the script a bit.

      #!/usr/bin/perl use warnings; use strict; while ( <> ) { chomp; (/^\d{10}$/) ? print uniq($_)+0,"\n" : die "Not 10 digits.\n"; } sub uniq { return $_[0] !~ /(.).*\1/ }
      Oops, I wasn't logged in... :)

      The total running time for my program has dropped from 77m18s to 24m14s. I will of course keep working on it. Thanks for the help!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://496577]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (8)
As of 2024-03-28 09:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found