Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

The fastest way of searching a certain element in an array

by ccn (Vicar)
on Apr 10, 2004 at 17:18 UTC ( #344143=perlmeditation: print w/ replies, xml ) Need Help??

Hi Monks,

There are many ways to solve a question "Is this element in the list?" It doesn't matter how the way you chosen if a list is small. But choosing the right way is very important if a list is huge.

Looking for the fastest method I've made benchmarks for some of them. Now I can say that there are two best ways exist. They are based on foreach & grep.

#... grep_line => sub { grep { $_ eq TEST and return 1 } @Array; return; }, for_line => sub { TEST eq $_ and return 1 foreach @Array; return; } #...

As Abigail-II said, foreach wins when a match occcures in the first one-third of an array but grep better when a match occcures in the last one-third.

P.S. I did not consider map because I don't beleive that it could be better than grep.

UPDATE: 4 Oct 2004, I've compared a List::Util::first function with "grep_line" and "for_line". I found that List::Util::first is slower.

#!/usr/bin/perl -w use strict; use Benchmark; use constant TEST => 'match'; use constant SIZE => 100000; my @pos = (0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9); my @Array; my %subs = ( grep_block => sub { GREP: { grep { $_ eq TEST and next GREP } @Array; return; } continue { return 1; } }, grep_line => sub { grep { $_ eq TEST and return 1 } @Array; return; }, grep_simple => sub { return grep { $_ eq TEST } @Array; }, for_block => sub { foreach (@Array) { return 1 if TEST eq $_; } return; }, for_line => sub { TEST eq $_ and return 1 foreach @Array; return; } ); foreach (@pos) { my $pos = SIZE * $_; printf "Matching position %d from %d i.e. (%0.2f)\n" , $pos, SIZE + , $pos/SIZE; @Array = ( ('x') x ($pos - 1), TEST, ('x') x (SIZE - $pos) ); Benchmark::cmpthese( 1000, \%subs ); }
Matching position 5000 from 100000  i.e. (0.05)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate grep_simple  grep_block   grep_line   for_block    for_line
grep_simple 11.1/s          --        -79%        -91%        -95%        -95%
grep_block  52.3/s        372%          --        -60%        -75%        -77%
grep_line    130/s       1073%        148%          --        -39%        -44%
for_block    212/s       1814%        305%         63%          --         -8%
for_line     231/s       1983%        341%         78%          9%          --


Matching position 10000 from 100000  i.e. (0.10)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate grep_simple  grep_block   grep_line   for_block    for_line
grep_simple 11.1/s          --        -60%        -86%        -88%        -89%
grep_block  28.0/s        153%          --        -65%        -70%        -73%
grep_line   80.6/s        628%        187%          --        -15%        -21%
for_block   94.8/s        756%        238%         18%          --         -8%
for_line     103/s        827%        266%         27%          8%          --


Matching position 20000 from 100000  i.e. (0.20)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate grep_simple  grep_block   grep_line   for_block    for_line
grep_simple 11.1/s          --        -24%        -76%        -76%        -78%
grep_block  14.5/s         31%          --        -68%        -69%        -71%
grep_line   45.6/s        312%        214%          --         -1%         -9%
for_block   46.3/s        318%        219%          1%          --         -8%
for_line    50.1/s        352%        245%         10%          8%          --


Matching position 40000 from 100000  i.e. (0.40)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate  grep_block grep_simple   for_block   grep_line    for_line
grep_block  7.38/s          --        -33%        -68%        -70%        -71%
grep_simple 11.1/s         50%          --        -52%        -55%        -56%
for_block   23.2/s        214%        109%          --         -5%         -7%
grep_line   24.5/s        231%        121%          6%          --         -2%
for_line    25.1/s        239%        126%          8%          2%          --


Matching position 60000 from 100000  i.e. (0.60)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate  grep_block grep_simple   for_block   grep_line    for_line
grep_block  4.96/s          --        -55%        -68%        -70%        -70%
grep_simple 11.1/s        123%          --        -28%        -34%        -34%
for_block   15.5/s        212%         40%          --         -7%         -7%
grep_line   16.7/s        237%         51%          8%          --         -0%
for_line    16.7/s        237%         51%          8%          0%          --


Matching position 80000 from 100000  i.e. (0.80)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate  grep_block grep_simple   for_block    for_line   grep_line
grep_block  3.73/s          --        -66%        -68%        -70%        -70%
grep_simple 11.1/s        197%          --         -4%        -12%        -12%
for_block   11.6/s        211%          5%          --         -7%         -8%
for_line    12.5/s        236%         13%          8%          --         -0%
grep_line   12.6/s        237%         14%          9%          0%          --


Matching position 90000 from 100000  i.e. (0.90)
Benchmark: timing 1000 iterations of for_block, for_line, grep_block, grep_line, grep_simple...
              Rate  grep_block   for_block    for_line grep_simple   grep_line
grep_block  3.31/s          --        -67%        -70%        -70%        -71%
for_block   10.2/s        208%          --         -8%         -8%        -10%
for_line    11.0/s        233%          8%          --         -0%         -2%
grep_simple 11.1/s        234%          9%          0%          --         -2%
grep_line   11.3/s        242%         11%          2%          2%          --

Comment on The fastest way of searching a certain element in an array
Select or Download Code
Re: The fastest way of searching a certain element in an array
by kvale (Monsignor) on Apr 10, 2004 at 18:04 UTC
    Nice comparison. Your problem is basically testing for set membership, and that is often done using a hash if there are lots of searches:
    use Benchmark qw(:all) ; my $test_element = 42_000; my @a = (1..100_000); my $count = 20; cmpthese($count, { 'hash' => sub { my %seen; undef( @seen{ @a } ); # a trick I learned from t +illy my $found = 0; for my $iter (10_000..10_100) { # 101 searches $found = 1 unless exists $seen{ $iter }; } return $found; }, 'for' => sub { my $found = 0; for my $iter (10_000..10_100) { # 101 searches foreach (@a) { $found = 1 and last if $iter == $_; } } return $found; }, });
    This gives the results:
    Benchmark: timing 20 iterations of for, hash... for: 16 wallclock secs (15.67 usr + 0.00 sys = 15.67 CPU) @ 1 +.28/s (n=20) hash: 8 wallclock secs ( 8.57 usr + 0.04 sys = 8.61 CPU) @ 2 +.32/s (n=20) Rate for hash for 1.28/s -- -45% hash 2.32/s 82% --
    So hashes are faster for 101 searches of the same 100_000 element array; crossover is at about 70 searches on my machine.

    -Mark

      Yes, of course. The hash is fastest if a search is often. Your numbers are interesting.
      Update: these numbers help to understand what does the often mean
Re: The fastest way of searching a certain element in an array
by hardburn (Abbot) on Apr 10, 2004 at 18:41 UTC

    I'd still check a map solution, just to be sure.

    Also, your data is sorted, which means that a linear search on the data is not the most efficient algorithm. You should either implement a binary search, use an existing implementation, or keep your data unsorted.

    ----
    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      No, @Array is not sorted.

        Oops, you're right. I was reading off @pos. Must pay more attention to code.

        ----
        : () { :|:& };:

        Note: All code is untested, unless otherwise stated

      The point is still made that when doing repeated searches on a datastore, one must consider what kind of store it is. Sortedness, what data it's storing, what kind of searching facilities already exist for it ... there is never a "One Right Answer"tm for any computing question (though there are many "Less Than Perfect"tm answers).

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

Re: The fastest way of searching a certain element in an array
by demerphq (Chancellor) on Apr 13, 2004 at 15:04 UTC

    Whats more pertinent in perl I think is the code to find the _index_ of the first matching element in the array. A straight forward existance test is too easily outperformed by a hash. and its unusual in my experience that if you need to this more than once that pre-constructing the hash isnt possible. BTW, id be interested to see if you come up with the fastest technique. Suprisingly it is actually one of the longer solutions.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


Re: The fastest way of searching a certain element in an array
by blakem (Monsignor) on Apr 14, 2004 at 23:49 UTC
    Try the first() function List::Util. Its like grep but returns as soon as it finds the first match.
    % perldoc List::Util first BLOCK LIST Similar to "grep" in that it evaluates BLOCK setting $_ to +each element of LIST in turn. "first" returns the first element +where the result from BLOCK is a true value. If BLOCK never retur +ns true or LIST was empty then "undef" is returned. $foo = first { defined($_) } @list # first defined v +alue in @list $foo = first { $_ > $value } @list # first value in +@list which # is greater than + $value

    -Blake

      Hmm..., surely it is not the fastest. Here is it's code:
      sub first (&@) { my $code = shift; foreach (@_) { return $_ if &{$code}(); } undef; }
      You can see, it is similar to my for_block which was benchmarked.
        ccn,
        Here is it's code:

        Well that depends. As I indicated in Getting Matching Items From An Array, List::Util falls back to pure perl (which has obvious inefficiencies) only if the XS version is unavailable. Which version of 'first' did you use in your benchmarks?

        Cheers - L~R

      Or even better List::MoreUtil::any, which is EXACTLY what you mean. Boolean yes/no, instead of something that might suffer autoboolification and end up false.
Re: The fastest way of searching a certain element in an array
by mlh2003 (Scribe) on Jan 18, 2005 at 11:28 UTC
    Out of curiosity, I also tried the benchmark script with a sub using the join and index functions as follows:
    my %subs = ( join_index => sub { my $joined = '#' . join('#', @Array) . '#'; return (index($joined, '#TEST#') >= 0) ? 1 : 0; }, grep_block => sub { GREP: { grep { $_ eq TEST and next GREP } @Array; return; } continue { return 1; } }, ... etc.
    It seemed to fare better when the match was towards the end of the 'joined' string (Fastest at matching positions 0.8 and 0.9 out of all the other subs on my machine). The interesting thing was that it yielded similar rates for all matching positions. It could be that the join function is expensive compared to the index function (likely), so the timing is swamped by joining the elements in the array. Which also indicates that the index function is FAST...

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://344143]
Approved by kvale
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2014-10-24 11:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (131 votes), past polls