Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Identical list members are sometimes counted as different when compared to each other.

by di (Acolyte)
on Jan 02, 2010 at 16:38 UTC ( #815319=perlquestion: print w/replies, xml ) Need Help??

di has asked for the wisdom of the Perl Monks concerning the following question:

Deep in the throes of mystical experience, I am come to the monastery to share my wonder. The following script counts the occurrences of distinct words in a list. (The list of thirty words here found has been pared down from a much larger text normally input from a file, but the effect to be described occurs the same on either form of input.) The mystery resides first in the word "adjuster", which is counted from the list as two separate words with a frequency of 2 and 1. Delete only the first word from the list and lo "adjuster" appears as one with a frequency of 3. Delete only the second word from the list and the same occurs. But delete the third or fourth word and again "adjuster" appears as two! And yet again, delete together words three through nine and "adjuster" metapmorphoses back to one being three in number! How much deeper this may go I do not know but present this mystery in confidence of forthcoming enlightenment.

use strict; use warnings; #open IN, "<list.txt"; #open OUT, ">count.txt"; my @words = qw(this and led cain had been indwelt by an adjuster had a +lways been disdainful he and when he had sought divine assistance an +adjuster indwelt and this adjuster gave cain); my @counted =(); while (@words) { my $key = shift @words; my $count = 1; foreach my $y (0..$#words) { if (not $words[$y]) {next} # Avoids absent members resultant +from splice. if ($words[$y] eq $key) { $count++; splice (@words, $y, 1); } } push @counted, "$key\t$count\n"; } print sort @counted;

Pax vobiscum - and thanks!

Update: ikegami's alternative approach provided amazing benefit. A run that was taking over twenty minutes was reduced to under twenty seconds! And, not unimportantly, it produced accurate output! Many thanks.

  • Comment on Identical list members are sometimes counted as different when compared to each other.
  • Download Code

Replies are listed 'Best First'.
Re: Identical list members are sometimes counted as different when compared to each other.
by ikegami (Patriarch) on Jan 02, 2010 at 17:12 UTC

    The problem with your approach is that you shift all the indexes by one with your splice, but your for loop continually increments $y. You tried to address that problem with if (not $words[$y]) {next}, but that was the wrong fix. The inner loop should visit the words in the reverse order (from the end to the front) to avoid that problem.

    Also, your algorithm is very inefficient. It requires the visit of N/2 words (on avergage) for each of the N words in the input list ( O(N2) ). The algorithm I posted earlier is only O(N), so it scales *much* better.

      Thank you, thank you. What a wonderful reply! The alternative approach is instructive but the explanation of its preferability (i.e. the error of my ways :-) is even moreso. I am studying the implications, reading about hash-related functions and so on, but did not want to delay in expressing my gratitude.

Re: Identical list members are sometimes counted as different when compared to each other.
by ikegami (Patriarch) on Jan 02, 2010 at 17:01 UTC
    my %counts; ++$counts{$_} for @words; for my $word (sort keys %counts) { print("$word: $counts{$word}\n"); }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://815319]
Approved by ikegami
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2022-09-28 13:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I prefer my indexes to start at:




    Results (124 votes). Check out past polls.

    Notices?