better way to find list members?

jptxs has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: better way to find list members? by KM (Priest) on Oct 17, 2000 at 00:28 UTC
This is from Perlfaq.com. `@animals = qw(cat dog bat rat gerbil snake pig chicken); foreach (@animals) { $animals{$_}++; } # initialize %animals hash # use a hash if (exists $animals{'pig'}) { print "We have a pig!\n"; } # use grep if (grep { $_ eq 'snake'} @animals) { print "We have a snake!\n"; } # use a for loop for (@animals) { last if $_ eq 'dog'; } print "We have a dog!\n";` [download] Cheers, KM	[reply] [d/l]
RE: Re: better way to find list members? by DrManhattan (Chaplain) on Oct 18, 2000 at 03:35 UTC
`foreach (@animals) { $animals{$_}++; } # initialize %animals hash` [download] Here's a faster way to do that (looks neat too): `@animals{@animals} = (1) x @animals;` [download] It's not exactly the same since it's setting each hash member to 1 instead of ++'ing it, but that's no big deal if you're just checking for an element's existance. -Matt	[reply] [d/l] [select]
RE: RE: Re: better way to find list members? by runrig (Abbot) on Oct 18, 2000 at 03:58 UTC
If all you want is existence, then this: `@animals{@animals}=undef; print "Got a dog!\n" if exists $animals{dog};` [download] is faster and uses less memory.	[reply] [d/l]
RE: RE: Re: better way to find list members? by myocom (Deacon) on Oct 18, 2000 at 03:38 UTC
Are you sure you meant to do `@animals{@animals}`? That code seems fraught with peril... Update: Actually, the more I look at it, the more it's starting to possibly make some sense to me...	[reply] [d/l]
RE: Re: better way to find list members? by runrig (Abbot) on Oct 18, 2000 at 03:56 UTC
`foreach (@animals) { $animals{$_}++; } # initialize %animals hash # could be $animals{$_}++ for @animals;` [download] Regarding: `# use a for loop for (@animals) { last if $_ eq 'dog'; } print "We have a dog!\n";` [download] That'll always print 'We have a dog!', even when we don't.	[reply] [d/l] [select]
Doesn't anyone do their homework anymore? by indigo (Scribe) on Oct 17, 2000 at 05:54 UTC
As easy it is to code in perl, you'd think someone might actually look and see what the answer really is: #!/usr/bin/perl -w use Benchmark; use strict; # Times array searches. Assumes search element is in the # dead center of the array my $x; my @array; for (10, 50, 100, 500, 1000, 5000) { @array = 1 .. $_; $x = $_ / 2; print "Array size is $_\n"; timethese(5000, { 'grep' => \&grep_test, hash => \&hash_test, manual => \&manual_test }); } sub grep_test { return 1 if grep { $x == $_ } @array; } sub hash_test { my %hash; $hash{$_}++ for @array; return 1 if $hash{$x}; } sub manual_test { for (@array) { return 1 if $x == $_ } } Array size is 10 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 0 wallclock secs ( 0.19 usr + 0.00 sys = 0.19 CPU) (warning: too few iterations for a reliable count) hash: 0 wallclock secs ( 0.50 usr + 0.00 sys = 0.50 CPU) manual: 1 wallclock secs ( 0.17 usr + 0.00 sys = 0.17 CPU) (warning: too few iterations for a reliable count) Array size is 50 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 0 wallclock secs ( 0.68 usr + 0.00 sys = 0.68 CPU) hash: 0 wallclock secs ( 1.95 usr + 0.00 sys = 1.95 CPU) manual: 1 wallclock secs ( 0.53 usr + 0.00 sys = 0.53 CPU) Array size is 100 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 2 wallclock secs ( 1.36 usr + 0.00 sys = 1.36 CPU) hash: 4 wallclock secs ( 3.83 usr + 0.00 sys = 3.83 CPU) manual: 1 wallclock secs ( 1.00 usr + 0.00 sys = 1.00 CPU) Array size is 500 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 7 wallclock secs ( 6.41 usr + 0.00 sys = 6.41 CPU) hash: 20 wallclock secs (19.23 usr + 0.00 sys = 19.23 CPU) manual: 5 wallclock secs ( 4.48 usr + 0.00 sys = 4.48 CPU) Array size is 1000 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 13 wallclock secs (12.82 usr + 0.00 sys = 12.82 CPU) hash: 41 wallclock secs (38.71 usr + 0.00 sys = 38.71 CPU) manual: 9 wallclock secs ( 9.10 usr + 0.01 sys = 9.11 CPU) Array size is 5000 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 68 wallclock secs (64.70 usr + 0.00 sys = 64.70 CPU) hash: 243 wallclock secs (220.51 usr + 0.03 sys = 220.54 CPU) manual: 48 wallclock secs (45.48 usr + 0.00 sys = 45.48 CPU) [download] Based on these results: grep is a little faster for short lists a manual list traversal is faster for larger lists don't use a hash unless you need to search for several items	[reply] [d/l]
RE: Doesn't anyone do their homework anymore? by Caillte (Friar) on Oct 17, 2000 at 14:29 UTC
The hash test seems a little unfair. It is the only one where you construct the data structure before testing it. If you construct the hash before testing the results are somewhat different I used the following, a slight change from your program Assuming that you can use a hash table elsewhere in the program (and why not?) hash tables appear to make more sense when looked at this way. I cannot see any use for dynamically creating a hash table just for one search and, if you are going to search a hash you may as well structure your data into a hash. Of course there are exceptions to this, there are exceptions to everything in the perl world (TIMTOWTDI) anyway, the data: use Benchmark; use strict; # Times array searches. Assumes search element is in the # dead center of the array my $x; my @array; my %hash; for (10, 50, 100, 500, 1000, 5000) { @array = 1 .. $_; $x = $_ / 2; $hash{$_}++ for @array; print "Array size is $_\n"; timethese(5000, { 'grep' => \&grep_test, hash => \&hash_test, manual => \&manual_test }); } sub grep_test { return 1 if grep { $x == $_ } @array; } sub hash_test { return 1 if $hash{$x}; } sub manual_test { for (@array) { return 1 if $x == $_ } } Array size is 10 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 0 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) (warning: too few iterations for a reliable count) hash: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) manual: 0 wallclock secs ( 0.04 usr + 0.00 sys = 0.04 CPU) (warning: too few iterations for a reliable count) Array size is 50 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 0 wallclock secs ( 0.15 usr + 0.00 sys = 0.15 CPU) (warning: too few iterations for a reliable count) hash: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) (warning: too few iterations for a reliable count) manual: 0 wallclock secs ( 0.11 usr + 0.00 sys = 0.11 CPU) (warning: too few iterations for a reliable count) Array size is 100 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 0 wallclock secs ( 0.27 usr + 0.00 sys = 0.27 CPU) (warning: too few iterations for a reliable count) hash: 0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU) (warning: too few iterations for a reliable count) manual: 1 wallclock secs ( 0.20 usr + 0.00 sys = 0.20 CPU) (warning: too few iterations for a reliable count) Array size is 500 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 1 wallclock secs ( 1.39 usr + 0.00 sys = 1.39 CPU) hash: 0 wallclock secs (-0.00 usr + 0.00 sys = -0.00 CPU) (warning: too few iterations for a reliable count) manual: 1 wallclock secs ( 0.90 usr + 0.00 sys = 0.90 CPU) Array size is 1000 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 3 wallclock secs ( 2.79 usr + 0.00 sys = 2.79 CPU) hash: 0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU) (warning: too few iterations for a reliable count) manual: 2 wallclock secs ( 1.77 usr + 0.00 sys = 1.77 CPU) Array size is 5000 Benchmark: timing 5000 iterations of grep, hash, manual... grep: 15 wallclock secs (15.39 usr + 0.00 sys = 15.39 CPU) hash: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) (warning: too few iterations for a reliable count) manual: 10 wallclock secs ( 9.03 usr + 0.05 sys = 9.08 CPU) [download]	[reply] [d/l]
RE (tilly) 2: Doesn't anyone do their homework anymore? by tilly (Archbishop) on Oct 17, 2000 at 17:42 UTC
A note on hash performance. Hashing is a bucket algorithm. The idea is that you look at the key, produce a number, then drop that name/value into a bucket based on that number. To check if a particular key is in the hash all you need to do is look at the key, find that number, and look in the right bucket. Assuming that there are enough buckets (Perl adjusts this dynamically) and the hashing function does a good job of distributing things (Perl has a very carefully designed hashing function) that bucket will have very few things in it and therefore will be very fast to search. In fact Perl does a pretty good job of keeping that down to 0-3 elements per bucket. So no matter how large your dataset (ok, within the bounds of physical memory limits) accessing a hash is basically constant time. It is slower than accessing an array, but not all that much slower. So comparing building a hash to a linear search is actually pretty fair. If you expect to need to find things more than a (pretty small) fixed number of times, the hash should be faster. For just one it is slower. If the number is dynamic then the hash definitely makes sense. (Yeah, I know. I am ignoring all sorts of things in that broad claim. But Perl is largely designed around the assumption that a hash lookup is constant-time and fast. In the real world that works out pretty well.)	[reply]
RE: RE: Doesn't anyone do their homework anymore? by indigo (Scribe) on Oct 17, 2000 at 22:44 UTC
Ugh! Now the hash test is useless! A hash is only a win if you are doing multiple searches. My benchmark provides enough information to suggest you need to do 4 or 5 search before you break even. Your benchmark, in completely ignoring hash creation overhead, erroneously indicates a hash is best in all situations!	[reply]
RE: better way to find list members? by Adam (Vicar) on Oct 17, 2000 at 02:08 UTC
You are basically doing it the only decent way to do it as things currently stand. I believe there is an RFC for Perl6 to include an array equivalent to exists. Unfornutaly there is no way to find out if an item is in a list without doing a O(n) search. (For the non CS/Math folks here, O(n) means that your question will require you to look in the worst case at every member of the array, short-circuiting buys you a little bit of time over grep but you could still end up searching for the last element in the array) As someone pointed out, you are better off using a hash to store your data, but I should point out that this is not always true. First off, don't shove your data into a hash just to check if one item is in it. Why? Because this first requires you to do the O(n) operation of converting an array to a hash PLUS the time of the hash look-up.^* So either start with a hash, or don't bother with a hash. Hashes have their own costs: They don't retain order (Okay, you can impose an order on a hash, but that's another post) and they consume more memory (but memory is cheap) So the long and the short of it is that if you need to know if something is in an array either use grep or write your own short-circuit version. (If your array isn't too big you will be better off using the built in grep or map rather then trying to re-invent the wheel.) And yes, your routine should do the job nicely. Yup, This is my 500th post here at PerlMonks.org! ^*Update: I should point out that the cost of the actual hash lookup is almost negligable. My point was that you were insuring a worst-case scenario if you built the hash for only one lookup. Things get exponentially better from there.	[reply]
RE: RE: better way to find list members? by jptxs (Curate) on Oct 17, 2000 at 03:13 UTC
My instincts (the same ones that told me not to post this at all -- and it ends up on the front page no less! ), told me it would be a waste to shove this in a hash just to sort it. The whole prolem here is I will never know what the length of the array will be, which thing I will be trying to find out is in the list or which list I need to compare it to until run time. This just makes it easier. I guess I'm not the only one who think's there could be a better way to do this though -- as there has been much debate over it. I love when the monks get going on something...even when it is my lame code : ) -- I'm a solipsist, and so is everyone else. (think about it)	[reply]
RE: better way to find list members? by turnstep (Parson) on Oct 17, 2000 at 00:32 UTC
`print "I am here!" if grep {/$bar/} @foo;` [download] Update and disclaimer: Yes, all the points below are correct, this was just a quick one-liner. Don't use it for things that might actually Take Time or Need To Be Efficient. Use one of the methods above. On second thought, why should I be that lazy? Here's a rewrite to specs: `sub IsItThere($$) { my $testarray = shift; my $string = shift; for (@$testarray) { return 1 if $_ eq $string; ## Or... return 1 if /$string/i; etc. etc. } return 0; }` [download]	[reply] [d/l] [select]
RE: RE: better way to find list members? by japhy (Canon) on Oct 17, 2000 at 00:36 UTC
No no no no no no no. The FAQ strongly suggests against this, and for many reasons: grep() does not short circuit -- once it finds a match, it will keep going through the rest of the list the regex you use is compiled EVERY SINGLE TIME -- this can be slow and irritating the regex you use will say ('foo','bar','blat') contains the element 'ar', even though it doesn't you shouldn't even correct the regex (that is, add the `\A` and `\z` anchors, and `\Q` and `\E` escapes), you should use `$_ eq $bar` DON'T USE GREP `$_="goto+F.print+chop;\n=yhpaj";F1:eval`	[reply]
Re: better way to find list members? by swiftone (Curate) on Oct 17, 2000 at 00:30 UTC
Update: This node completely redone because several people posted before I did. See perlfaq4 `sub member { my $possible = $_[0]; my $array_ref = $_[1]; foreach my $member ( @{$array_ref} ) { if ( $member eq $possible ) { return "1"; } } return "0"; }` [download] This is basically a glorified grep statement. But as the FAQ notes, that's inefficient. The trick is that you are looping over every element (until you hit it), EACH time you test. What you should do is contruct a hash of the values ONCE, and then you're good (as long as the list doesn't change).	[reply] [d/l]
RE: better way to find list members? by Russ (Deacon) on Oct 17, 2000 at 03:20 UTC
Weigh the benefits between `grep` and `reinvented-wheel`. Yes, `grep` is not always optimally efficient. However, how much time, effort (and money) will you spend writing a more efficient algorithm to save a few microseconds? `my $Found = 0; for (@Requirements){ if ($_ eq 'KernelDevlopment'){ $Found = 1; last; } } if ($Found != 0){ spend($Time{Extra}); squeeze($CPU{qw/every last cycle/}); } ### vs. ### get_the_job_done() && move_on() unless grep {/KernelDevelopment/} @Req +uirements;` [download] :-) Russ Brainbench 'Most Valuable Professional' for Perl	[reply] [d/l]
Re: better way to find list members? by Fastolfe (Vicar) on Oct 17, 2000 at 00:49 UTC
The various Set modules such as Set::Scalar deal with set operations, including memberships, union, etc. The other proposed solutions are probably perfectly adequate, but if you ever need a bit more power, this module should help.	[reply]
Re: better way to find list members? by japhy (Canon) on Oct 17, 2000 at 00:29 UTC
Perl FAQ 4, "How can I tell if a list or array contains a certain element?" Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't. ... `$_="goto+F.print+chop;\n=yhpaj";F1:eval`	[reply]
RE: Re: better way to find list members? by jptxs (Curate) on Oct 17, 2000 at 00:43 UTC
if not `grep` and a hash is not a good option for other reasons what would you suggest? -- I'm a solipsist, and so is everyone else. (think about it)	[reply] [d/l]
Re: better way to find list members? by jynx (Priest) on Oct 17, 2000 at 04:58 UTC
This may be a little off the wall: If all you're after is the instances of a particular number and you don't horribly care about the RE, you could try this: `sub count_instances { my $list = join ' ', shift(); my $val = shift; return $list =~ /\b$val\b/; }` [download] It's quick and to the point but it does the very same RE slow-down that japhy warned about, but it doesn't use grep. This is still slower probably, but it will return the number of matches found and 0 if the number isn't there. If i got this wrong could someone please point out errors? Update Sidenote: If you just wanted to find out if the value is there, you could just use `index($val, $list)` which will return -1 if not found and the first index if it is found; assuming you use the string representation. jynx PS If you plan on passing an array to a function you should probably pass a reference, which makes the code more like: `sub count_instances { my $list = shift; my $val = shift; $list = join ' ', @{ $list }; return $list =~ /\b$val\b/; }` [download]	[reply] [d/l] [select]
RE: better way to find list members? by DrManhattan (Chaplain) on Oct 17, 2000 at 23:57 UTC
How much do you know about your data set in advance? If your array is sorted, a binary search is the fastest way to find something. The idea behind a binary search is to check the middle element of the array first. If what you're looking for is smaller than the middle element, you can throw away the upper half of the array. If it's larger than the middle element, you can throw away the lower half. Keep checking the middle element and discarding half the array until you find what you're looking for or run out of array. You divide the size of the problem in half at each stage, so if you have N elements in the array, you only have to make Log₂(N) comparisons in the worst case to get what you want. Here's a binary search routine I wrote just for the fun of it. I didn't test it much so I won't swear it's correct. :) # &binsearch($key, $arrayref) # # Searches the array referenced by $arrayref for # the string $key using binary search. The array # must be sorted. sub binsearch { my $key = shift; my $arrayref = shift; # $lower contains the index of the lowest element # of the range of the array we're searching. $upper # is the index of the highest element. my $lower = 0; my $upper = $#$arrayref; my ($index, $result); # Loop until the upper and lower indexes meet in the middle while ($upper >= $lower) { # Divide the range of the array we're searching in half. # Round fractions to the nearest integer. $index = int($lower + (($upper - $lower) / 2) + .5); # Compare the element in the middle of the range to the # key we're searching for. $result = $key cmp $$arrayref[$index]; if ($result < 0) { # If our search key is lower than the element # in the middle of the range, set the index of # the upper bound on the range to 1 less than # that of the middle element and loop again $upper = $index - 1; } elsif ($result == 0) { # If our search key is equal to the middle # element, we found what we're looking for. return 1; } elsif ($result > 0) { # If our search key is higher than the middle # element of the range, raise the lower end of # the range to 1 more than the middle's index. $lower = $index + 1; } } } [download] -Matt	[reply] [d/l]
RE: RE: better way to find list members? by merlyn (Sage) on Oct 18, 2000 at 00:02 UTC
Yeah, you knew it was coming. Check in the CPAN under Search::Binary. {grin{ -- Randal L. Schwartz, Perl hacker	[reply]
(more specific) Re: better way to find list members? by jptxs (Curate) on Oct 17, 2000 at 00:39 UTC
Update: Aside from changing `foreach` to `for` and using `$_` in the loop, it seems i did use the for loop. having made those changes and being resistent to using a hash for a simple list of numbers, is there anything else I'm missing? -- aside from massive amount of skills and the sheer power of having memorized the docs ; ) OK, should always go with my instincts when i feel I shouldn't post. : ) Two things I should have mentioned are that 1. I really wanted this to be tucked away in a sub which returns true or false and could be passed any number of dif. args, as in the example and 2. The arrays are coming from simple lists of numbers, just like the example array - so a hash seemed to be overkill. many good suggestions in these replies - be benchmarking for now. -- I'm a solipsist, and so is everyone else. (think about it)	[reply] [d/l] [select]
RE: (more specific) Re: better way to find list members? by cwest (Friar) on Oct 17, 2000 at 00:49 UTC
If you insist on just wrapping grep in a subroutine... `my $array = [ 1 .. 50 ]; sub mygrep { my( $guess, $array ) = @_; grep { $_ == $guess } @{$array}; } print mygrep( 165, $array ) ? "True\n" : "False\n";` [download] Enjoy! -- Casey I am a superhero.	[reply] [d/l]
Re: better way to find list members? by Anonymous Monk on Oct 17, 2000 at 10:25 UTC
You have mentioned that the array consists of numerical items only, if these are sorted beforehand then you can use a recursive function which checks the middle element of the list and decides if the target is above or below there and repeats the search on that section recursivly until the item might be found. I think I saw this on PM before but I dont feel like searching for the reference. This technique can be potentially faster for large, presorted arrays but otherwise another tecnique will be more efficient.	[reply]
Re: better way to find list members? by princepawn (Parson) on Oct 20, 2000 at 17:39 UTC
Here is an answer to your question, which makes use of that high-utility CPAN module, Quantum::Superpositions: `use Quantum::Superpositions; use strict; my @ary = (1..7); my $testnum = 4; if ($testnum == any(@ary)) { warn "$testnum is a member"; }` [download]	[reply] [d/l]
RE: better way to find list members? by RobertWerner (Initiate) on Oct 17, 2000 at 03:30 UTC
Late in the day, I know, But how about this; Your progy as above with this sub: sub member { my $input = shift; my $list = shift; my @array = (); for (@$list) { $array$_ = 1; } if ($array$input) {return 1} else {return 0}; } Would only work for numerical data so you'd need to test for that but it sounds like that was what you were doing anyway.	[reply]
Re: better way to find list members? (Iam nuts disregard) by lindex (Friar) on Oct 18, 2000 at 06:13 UTC
`use strict; use vars qw(@array $exist); @array = qw(dog cat flea bird ferret fish bear); $exist = sub{my($ptr);foreach $ptr(@{$_[1]}){return(1) if ($ptr eq $_[ +0]);}}; print ($exist->("cat",\@array),"\n"); print ($exist->("pig",\@array),"\n");` [download] lindex `/**************************/ jason@gost.net, wh@ckz.org http://jason.gost.net /***************************/` [download]	[reply] [d/l] [select]
RE: better way to find list members? by RobertWerner (Initiate) on Oct 17, 2000 at 03:28 UTC
Late in the day, I know, But how about this; Your progy as above with this sub: sub member { my $input = shift; my $list = shift; my @array = (); for (@$list) { $array$_ = 1; } if ($array$input) {return 1} else {return 0}; } Would only work for numerical data so you'd need to test for that but it sounds like that was what you were doing anyway.	[reply]
Re: better way to find list members? by ChOas (Curate) on Oct 17, 2000 at 13:48 UTC
Not tested, probably wrong, but still... `if ( (join " ",@test_array)=~/ $input /o) { print "Hey, there's one of those in there!\n\n"; } else { print "Better luck next time: it's not in there.\n\n"; }` [download] Not even 2 cents...	[reply] [d/l]
RE: better way to find list members? by Anonymous Monk on Oct 18, 2000 at 02:14 UTC
#!/usr/bin/perl #blow off this aray #@ary = qw(1 2 3 4 5 6 7); #set up the aray this way; %keray = qw(1 2 3 4 5 6); for(%keray){ if (/2/){ print "$_\n"; } } # sub disappers	[reply]


"be consistent"
	PerlMonks