Rewriting everything mentioned above in a way that I think might help a person totally new to Perl:
use strict ;
use warnings ;
my %count ;
# Read lines from DATA; you can replace this with a file handle ( FH ).
while( my $line = <DATA> ) {
    # First break down a single line into words.
    # We assume that words are separated by runs of white space.
    # To also split on others such as '-' you would replace
    # /\s+/ with /[\s-]+/
    my @words_in_this_line = split( /\s+/, $line ) ;
    # Now we loop through the words within a single line.
    foreach my $word ( @words_in_this_line ) {
        # Lowercase it to ensure that repeats in different
        # cases are counted as the same word.
        $word = lc( $word ) ;
        # Check if there is a digit contained in this word;
        # we move to the 'next' iteration if there is.
        # Notice that the condition comes after the statement
        # that is executed if the condition is true.
        next if( $word =~ /\d+/ ) ;
        if( defined( $count{ $word } ) ) {
            # If I have seen the word before then increment my count.
            $count{ $word } ++ ;
        } else {
            # What if I have never seen this word?
            # Then I need to set the count to 1.
            $count{ $word } = 1 ;
        }
    } # End of looping through words.
} # End of looping through lines in file.
# Your code: sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count
# Let's break it up:
# We stored the data so that the key is the word and the value is the
# count. This ordering was intentional, so that we can 'quickly'
# figure out if we have seen a word before.
my @uniq_words_in_file = keys %count ;
# We use the brilliant sort function, which allows you to tell it what
# the comparison should be.
@uniq_words_in_file =
    sort(
        { $count{$b} <=> $count{$a} || $a cmp $b }
        @uniq_words_in_file ) ;
# This one bit brings out the beauty of Perl:
# We are passing a comparison block (an anonymous subroutine) to
# 'sort', and 'sort' will use it to compare elements during the sort.
# Notice that '<=>' returns -1, 0 or 1, and when
# $count{ $b } is equal to $count{ $a }, '<=>' returns 0.
#
# Every expression evaluates to a value, and Perl short-circuits '||'.
# What this means is that as it evaluates a boolean 'OR' it will
# stop evaluating expressions as soon as it finds a true value
# ( because True OR anything is always True ).
# -1 and 1 are true, so when the counts differ the string comparison
# is never evaluated; 0 is false, so when the counts are equal Perl
# goes on to evaluate '$a cmp $b'.
#
# We use this to additionally compare $a and $b, as strings this time,
# when the counts are equal.
# And now the printing.
foreach my $word ( @uniq_words_in_file ) {
    print "'$word'\tOccurred\t$count{ $word }\ttimes\n";
}
__DATA__
This these that the and how who writ this code
1 how now brown cow 1asdf 23
the fox jumped into 123 the hencoop
the lazy brown 2134 dog was azleep.
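The tie-breaking comparison explained in the comments above can also be tried on its own. Here is a minimal sketch with made-up counts; the sub name sort_by_count_then_name and the contents of %demo are my own inventions for illustration, they are not part of the program above.

```perl
use strict ;
use warnings ;

# Hypothetical helper: returns the keys of a word => count hash,
# highest count first, with ties broken alphabetically.
sub sort_by_count_then_name {
    my ( $count ) = @_ ;    # a hash reference
    return sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
           keys %$count ;
}

# Made-up counts, not taken from the DATA section.
my %demo = ( the => 4, brown => 2, how => 2, cow => 1 ) ;

# 'brown' and 'how' tie on count, so '<=>' returns 0 (false)
# and '||' falls through to the string comparison.
print join( ' ', sort_by_count_then_name( \%demo ) ), "\n" ;
# prints: the brown how cow
```

Wrapping the sort in a sub is only there to make it easy to play with different hashes; the comparison block itself is the same one used in the program.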
And now the code again with no comments:
use strict ;
use warnings ;
my %count ;
while( my $line = <DATA> ) {
    my @words_in_this_line = split( /\s+/, $line ) ;
    foreach my $word ( @words_in_this_line ) {
        $word = lc( $word ) ;
        next if( $word =~ /\d+/ ) ;
        if( defined( $count{ $word } ) ) {
            $count{ $word } ++ ;
        } else {
            $count{ $word } = 1 ;
        }
    }
}
my @uniq_words_in_file = keys %count ;
@uniq_words_in_file =
    sort(
        { $count{$b} <=> $count{$a} || $a cmp $b }
        @uniq_words_in_file ) ;
foreach my $word ( @uniq_words_in_file ) {
    print "'$word'\tOccurred\t$count{ $word }\ttimes\n";
}
__DATA__
This these that the and how who writ this code
1 how now brown cow 1asdf 23
the fox jumped into 123 the hencoop
the lazy brown 2134 dog was azleep.
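Once the long version makes sense, the if/else can be collapsed, because '++' on a hash value that does not exist yet treats it as 0 and does not warn. What follows is only a sketch of how I might shorten it, not a replacement for the code above: the sub name count_words is my own, and I have assumed split ' ' (which splits on runs of white space and skips leading white space) is acceptable in place of the regex.

```perl
use strict ;
use warnings ;

# Hypothetical helper: takes lines of text and returns a hash
# reference mapping each lowercased, digit-free word to its count.
sub count_words {
    my @lines = @_ ;
    my %count ;
    foreach my $line ( @lines ) {
        # split ' ' splits on runs of white space and drops
        # leading white space, so no empty words are produced.
        foreach my $word ( split ' ', lc $line ) {
            next if $word =~ /\d/ ;    # skip words containing a digit
            $count{ $word } ++ ;       # a missing key starts out as 0
        }
    }
    return \%count ;
}

my $count = count_words( "the fox jumped into 123 the hencoop\n" ) ;
foreach my $word ( sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
                   keys %$count ) {
    print "'$word'\tOccurred\t$count->{$word}\ttimes\n" ;
}
```

To run it against a file instead of a literal string, you could pass the lines read from a file handle to count_words.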