BigGer:
You *can* do it with arrays, but the problem fits a hash much better. Let's take a look at a straightforward array implementation:
WORD: while (my $word = pop @words) {
# lower-case it
$word = lc $word;
# search for the entry
for my $index (0 .. $#words) {
if ($counta[$index][0] eq $word) {
# when found, update the count and
# go to the next word
$counta[$index][1]++;
next WORD;
}
}
# If there's no entry, add a new entry
push @counta, [ $word, 1 ];
}
As you can see, we go through the list of words. For each word, we convert it to lower case, then search for the entry containing the word. If we find the entry, we increment it, otherwise we create a new entry.
So why does the problem fit a hash much better?
- Locating the entry is much easier and faster in a hash, because we can locate the entry by name instead of digging through the array.
- The data structure is simpler. In an array-based implementation, you have to use arrays in each slot to hold both the word and the count. In the hash implementation, the word is the key so the value only has to hold the count.
- Finally, for the array version, you need to explicitly create a new entry when the one you're looking for doesn't exist. With a hash, the act of looking up the value creates a new entry for you automatically if it doesn't exist. This is known as autovivification.
The equivalent hash version looks like this:
WORD: while (my $word = pop @words) {
# lower-case it
$word = lc $word;
# search for the entry. When found, update the
# count, if not found create new entry.
$counth{$word}++;
}
The speed advantages of the hash version are significant. Here's a comparison of an array version, hash version and a greparray version. (I was wondering if grep might be a faster way to search the array than a linear search.)
$ perl t.pl
***** Comparing a list of 100 words 10000 times *****
Rate greparray array hash
greparray 997/s -- -55% -94%
array 2210/s 122% -- -86%
hash 16026/s 1507% 625% --
***** Comparing a list of 1000 words 1000 times *****
Rate greparray array hash
greparray 8.80/s -- -79% -100%
array 41.5/s 372% -- -98%
hash 2070/s 23419% 4887% --
As you can see, as the number of words increases, so does the speed advantage of the hash version. (The code for the test is in the readmore tag....)
...roboticus
When your only tool is a hammer, all problems look like your thumb. |