'Simple' comparing a hash with an array

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: 'Simple' comparing a hash with an array by moritz (Cardinal) on Apr 16, 2008 at 21:31 UTC
The regex match looks a bit weird to me. Since 'a' is one of the "words" you're looking for, each word that contains an 'a' is printed out in the end. A word like "candidate" is printed for 'a', 'an' and 'can'. You should be a bit more specific about what kind of output you expect. Provide a few lines of sample input and what the output should look like, then we can help you a bit more.	[reply]
Re^2: 'Simple' comparing a hash with an array by mr_mischief (Monsignor) on Apr 16, 2008 at 22:11 UTC
I'm replying here to you, moritz, not because I'm disagreeing with you but because I want to add to your advice and I think it flows better here than as another response to the OP. Anon could fix the specific regex issue using anchors, of course. There are other issues, though, like why a regex is used in the first place when `$key eq $i` would have worked just as well, or why one would loop over the keys of a hash and an array containing the important keys of the hash as well in the first place. I think your advice to be more specific about input and output is apt. I think, too, though, that the OP could use some time with each, perlre, keys, and maybe perldata, map, perlrequick, perlretut, and perlcheat.	[reply] [d/l]
Re: 'Simple' comparing a hash with an array by FunkyMonk (Chancellor) on Apr 16, 2008 at 21:36 UTC
Your match is finding portions of a word: `$key =~ /$i/` will match when `$key` is "elephant" and `$i` is "a" because "elephant" contains an "a". To match on whole words only, use `/b` (word boundary - see perlre) before and after your search term: `my %histogram; my @wordstofind = qw( a am an and are as at be been but by can co de do due each ); while (<DATA>) { my @currentline = split; $histogram{$_}++ for @currentline; } foreach my $i (@wordstofind) { while ( my ($key, $value) = each (%histogram) ) { if ($key =~ /\b$i\b/) { print "Found $key, $value times\n"; } } } __DATA__ hello, I am an elephant. I have a long trunk.` [download] Output: `Found a, 1 times Found am, 1 times Found an, 1 times` [download]	[reply] [d/l] [select]
Re: 'Simple' comparing a hash with an array by mr_mischief (Monsignor) on Apr 16, 2008 at 21:45 UTC
Let's start by eliminating unnecessary steps and by not storing more values into memory than necessary. By getting rid of the fluff, you might be able to see the problem and solution more clearly. #!/usr/bin/perl use strict; use warnings; my %histogram = map { $_ => 0 } qw( a am an and are as at be been but by can co de do due each ); # hash of the words to find so we can do an O(1) lookup for them while ( <> ) { chomp; for ( split ) { # split returns a list we can use directly $histogram{ $_ }++ if exists $histogram{ $_ }; # only store counts for words that matter } } foreach my $word ( keys %histogram ) { # keys() will list the keys, and we've already taken care # of making sure we don't have extra words stored. # Now there's no need to do two loops and check an array # against a hash. print "Found $word, $histogram{$word} times.\n"; } [download]	[reply] [d/l]
Re^2: 'Simple' comparing a hash with an array by Anonymous Monk on Apr 17, 2008 at 14:16 UTC
Thank you monks. I've learnt a lot from this thread :-) I had originally used a hash after a perlfaq4 suggestion about comparing arrays, which would avoid some iteration. However, I ended up iterating over the hash anyway as I didn't know any other ways :-) As some have asked, sample input to this problem is: `WE regret that a press of matter prevents our noticing` [download] I want to count frequencies of certain words, as listed in mr_mischief's %histogram. As for output something along the line of the following is what I'm trying to obtain: `Found for, 1 times. Found such, 1 times. Found up, 1 times. Found at, 2 times. Found had, 1 times. Found was, 1 times. Cumulative total of all words found: 50` [download] To obtain this, and because I don't need to be concerned about the case of the input, I have added to mr_mischief's elegant code very slightly: #!/usr/bin/perl use strict; use warnings; my %histogram = map { $_ => 0 } qw( a am an and are as at be been but by can co de do due each ); # hash of the words to find so we can do an O(1) lookup for them while ( <> ) { chomp; for ( split ) { # split returns a list we can use directly tr/A-Z/a-z/; # lowercase all input print "$_\n"; $histogram{ $_ }++ if exists $histogram{ $_ }; # only store counts for words that matter } } my $count=0; foreach my $word ( keys %histogram ) { # keys() will list the keys, and we've already taken care # of making sure we don't have extra words stored. # Now there's no need to do two loops and check an array # against a hash. if ($histogram{$word} >0) { print "Found $word, $histogram{$word} times.\n"; $count = $count + $histogram{$word}; } } print "Cumulative total of all words found: $count\n"; [download] Thanks again	[reply] [d/l] [select]
Re^3: 'Simple' comparing a hash with an array by mr_mischief (Monsignor) on Apr 17, 2008 at 15:28 UTC
If you're at all worried about locale and language issues, or if you're just concerned about doing things the canonical way, you can use `$_ = lc $_;` instead of `tr/A-Z/a-z/;` to get a lowercase version. lc and uc are built in, and they honor the current language and localization settings.	[reply] [d/l] [select]
Re^2: 'Simple' comparing a hash with an array by wade (Pilgrim) on Apr 17, 2008 at 16:23 UTC
++mr_mischief, nice design! -- Wade	[reply]
Re: 'Simple' comparing a hash with an array by TGI (Parson) on Apr 16, 2008 at 21:41 UTC
~~One problem leaps out at me immediately, you aren't chomping your input lines. Sometimes you'll end up with words like "at\n" in your histogram.~~ `while (defined (my $line = <STDIN> )) { chomp $line; my @currentline = split /\s+/, $line; $histogram{$_}++ for @currentline; }` [download] Another basic issue, why do you iterate over your histogram hash? This sort of thing is exactly what hashes are designed to allow you to avoid. Just test for existence, and retrieve the value as needed. I `if( exists $histogram{$word} ) { print "Found $word, $histogram{$word} times\n"; }` [download] Update: ysth is right. The chomp is not needed. `split` seems to be especially hungry. I was expecting it to match like the code below. `use Data::Dumper; __END__ #Result: while ( <STDIN> ) { my @foo = /(\s+)/; print Dumper \@foo; } C:\> test.pl cat food $VAR1 = [ ' ' ];` [download] TGI says moo	[reply] [d/l] [select]
Re^2: 'Simple' comparing a hash with an array by ysth (Canon) on Apr 17, 2008 at 00:27 UTC
One problem leaps out at me immediately, you aren't chomping your input lines. Sometimes you'll end up with words like "at\n" in your histogram. Nope. Try it. \n is whitespace, which is consumed by the split. -- Online Fortune Cookie Search	[reply]


Keep It Simple, Stupid
	PerlMonks