Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Getting Matching Items From An Array

by Limbic~Region (Chancellor)
on Jul 05, 2004 at 16:51 UTC ( #371938=perltutorial: print w/ replies, xml ) Need Help??

Purpose
On a regular basis, people want to know if a particular thing, we will call it foo, is present in an array. The problem with just directing them to perldoc -q contain is that each question usually varies enough to make a single answer unsatisfactory. The goal of this tutorial is to give enough information so that someone can choose the right solution to the problem.

Is This Tutorial For You?
A typical mistake is to assume that there is a built-in way of determining presence in an array like there is with a hash. A red flag should go up if you try using a string for an index as in print "gotcha\n" if $array['foo']. Perhaps you should be using a hash. An array gives no "meaning" between the index and the value (though the programmer might as in the case of matrices). If you are still unsure which to be using, see perldoc perldsc and perldoc perldata for more information.

What Will Be Covered
There are typically 5 answers to the problem of identifying if foo is present in an array.

  • Convert the array to a hash
  • Use the built-in grep
  • Use List::Util's first
  • Use the smart match operator ~~ introduced in 5.10.0
  • IYAW (Invent Yet Another Wheel)
There are a handful of things to consider and each solution has pros and cons accordingly. You will need to decide for yourself which is best.

Convert The Array To A Hash
Depending on your requirements, this can be a very attractive solution (print "It's in there\n" if exists $sauce{spaghetti}). There is a list of things to consider:

  • How many items do you need to check?
  • If you are only checking a single item, this is probably not the best solution.
  • Will the array change between checks?
  • If you need to rebuild the hash each time you check because you are not sure if the array has changed, this is probably not the best solution.
  • Do you need something other than an exact match?
  • Using a hash is a bad idea for non exact matches. The one noteable exception is perhaps case insensitivity. You can force the keys to a certain case during creation and foo to the same case when searching. Even then, it is questionable with regards to this being the best solution.
  • How big is the array?
  • Duplicating data structures for the convenience of searching can get quite expensive with regards to memory consumption. You will need to determine how large a footprint is acceptable to you.
  • Do you need to know if there is more than one match?
  • Remember that hash keys must be unique so using a hash when there is a possibility of duplicates is a bad idea if you need to know how many times foo is present in the array.
  • Do you need to know foo's index?
  • Typically for memory reasons, you would build your hash like:
    my %lookup = map { $_ => undef } @array; print "There it is\n" if exists $lookup{foo};
    But if you have given some meaning to the relationship to the index and the value (or perhaps you want to use splice to remove it), then you should consider using the following instead:
    my %lookup = map { $array[$_] => $_ } 0 .. $#array; my $index = $lookup{foo};

Use The Built-In grep
It is fast and powerful and deserves careful consideration:

  • How many items do you need to check?
  • If you are checking many items, this is probably not the best solution because it goes through the entire list even if the match is found the first time. Keep in mind that many is subjective and since it has been optimized for efficiency, it still may be acceptable.
  • Will the array change between checks?
  • You might want to use grep if the answer is yes or you are unsure since it is not checking against a copy that might be stale.
  • Do you need something other than an exact match?
  • You might want to use grep if the answer is yes or you are unsure. Since grep can accept a block, you can easily do a case insensitive search (grep { $_ =~ /^foo$/i } @array)
  • How big is the array?
  • If the array is very large, you may want to consider something other than grep. Of course very large is subjective. If you are not going to be checking numerous items and you can afford to wait a few extra seconds, grep might be the perfect choice.
  • Do you need to know if there is more than one match?
  • Since grep goes through the entire list, it is an ideal candidate if the answer to this question is yes. In list context, grep will return the list of matching items. In scalar context - it will return the number of matching items.
  • Do you need to know foo's index?
  • Since grep can take a block, it is very versatile and can easily be adapted to this situation:
    my @indices = grep { $array[$_] eq 'foo' } 0 .. $#array;

Use List::Util's first
It works very much like grep except it returns on the first match, which depending on your situation can make it a perfect solution. It is worth noting that if it does not find a match, it returns undef which may be problematic if that's the value you are trying to match.

  • How many items do you need to check?
  • At first glance, you might say it does not matter. One issue is that the first function is not a compiled built-in like grep. A code block is dereferenced and executed for each element in the list until a match is found. A second issue is that first does not currently accept a reference as an argument so a list of aliases has to be generated. Be sure not to worry about premature optimization when the net difference in runtime is a few seconds unless it really matters.
  • Will the array change between checks?
  • Like grep, if the answer is yes or you are unsure, you should be considering first since it is not checking against a copy that might be stale.
  • Do you need something other than an exact match?
  • Like grep, if the answer is yes or you are unsure then first is a possibility since it takes a block.
  • How big is the array?
  • Again, you might say it doesn't matter at first glance since it returns on first match. The first function was built to be flexible accepting a code block that has to be dereferenced against each item in the list until a match is found. If you really need lightening speed then you might want to consider a roll-your-own version that is hard-coded.
  • Do you need to know if there is more than one match?
  • If the answer is yes, do not choose first since it is designed to return on first match.
  • Do you need to know foo's index?
  • The first function can be adapted to handle this situation just like grep.

Use the smart match operator ~~ introduced in perl 5.10.0
If all you need to know is if an item is in the array or not, this may be the fastest solution. The smart match operator can do much more complicated matching, but since 5.10.0 is only a few days old this tutorial does not cover them.

  • Do you need to maintain backwards compatibility?
  • If the answer is yes, the smart match operator is not the best option.
  • Do you need anything more complicated than an exact match?
  • The smart match operator can probably handle it but you will need to Benchmark to determine if it is the best solution. See the latest perlsyn for details.

IYAW (Invent Yet Another Wheel)
There is rarely a good reason to do this. Sanity checks such as profiling and benchmarking should be done before you spending valuable time trying to squeeze a couple of seconds off the runtime. I can only think of a couple of reasons you might want to do this:

  • You can't afford the performance hit of alias construction and repeatedly dereferencing the code block passed to first
  • You need to do some transformation before checking but need to keep the original array intact
  • I freely admit that this falls out of the scope of checking the presence of an item in an array. By copying the array and then transforming it you have changed the problem. This does not stop people from asking and so I have included it here for an attempt at completeness.

Update: I modified the code snippet in the "Is This Tutorial For You" section to remove exists per rob_au's suggestion. It was not necessary to illustrate the point.

Update 2: 2004-10-02 - wolv pointed out to me that the inefficiencies in List::Util's first are in the pure perl version. The XS version is obviously much faster. Benchmark to be sure but it still loses to grep in many situations.

Update 3: 2007-12-22 - Added smart match operator section as a result of the release of perl 5.10.0. See this for a speed comparison.

Comment on Getting Matching Items From An Array
Select or Download Code
Re: Getting Matching Items From An Array
by rob_au (Abbot) on Jul 05, 2004 at 23:58 UTC
    This is an excellent post Limbic~Region++. One note which I would make however is that the exists function can be of use with arrays where testing that an index is within the bounds of the defined array (without autovivifying a non-existent element) - This may be something useful which you can add to this text.

     

    perl -le "print unpack'N', pack'B32', '00000000000000000000001011100111'"

      rob_au,
      Instead of adding it, I have updated the node to remove exists in that section as it was not necessary. It did not occur to me that novices might see exists as improper with an array and total miss using a string as an index.

      While your point is well taken, I believe "Autovivication Pitfalls and How To Avoid Them" deserves to be a tutorial all to itself.

      Cheers - L~R

      exists still autovivifies in some situations:

      use Data::Dumper; my %h = ( foo => 1 ); print "Exists" if exists $h{bar}[2]; print Data::Dumper::Dumper \%h; __END__ $VAR1 = { 'bar' => [], 'foo' => 1 };

      Which probably isn't the right thing to do.

      ----
      send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.

        It should only autoviv that array element if the hash key is already defined IMHO...
      I'm not so sure there's a huge benefit here.
      my @x; $x[9] = 1; print "$#x ", scalar(@x), $/; print exists $x[2] ? 1 : 0, $/; print defined $x[2] ? 1 : 0, $/; + $x[2] = undef; print exists $x[2] ? 1 : 0, $/; print defined $x[2] ? 1 : 0, $/; ---- 9 10 0 0 1 0

      Basically, all you're saying is whether that element was assigned to. That's different from "Is this index within the bounds of the array?", which is tested by if ($i <= $#array).

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perltutorial [id://371938]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (5)
As of 2014-07-26 20:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (179 votes), past polls