Help with accepting inputs, wordcount, and print to a file

by underoathed (Initiate)
on Sep 10, 2012 at 17:03 UTC
underoathed has asked for the wisdom of the Perl Monks concerning the following question:

In Perl, I am supposed to read a file as input, count how many times certain words appear in the text (regardless of punctuation), and output the counts to a file. I'll list what I have so far, as I am lost when it comes to this. What would you recommend I do here? Keep in mind I'm a beginner and don't want too much code. (I'm not asking for a handout, just for pointers in the right direction.) Thanks! My code so far:
foreach $word (@ARGV) {
    open (IN, $word) or die "Cannot open file '$word': $!\n";
    print "$word: ", $line = <IN>;
    close (IN);
}

Replies are listed 'Best First'.
Re: Help with accepting inputs, wordcount, and print to a file
by toolic (Bishop) on Sep 10, 2012 at 17:14 UTC
Re: Help with accepting inputs, wordcount, and print to a file
by Riales (Hermit) on Sep 10, 2012 at 17:12 UTC

    What do you think that third line is doing right now? It does offer a hint if you run the code though; you should find that it prints the name of the file, then the first line from that file.

    So, knowing that, you might want to use a while loop to loop through each line of the file. For each line, you'll want to deal with each word separately. For that, look into split. There are a few things you could do to keep count of how many times each word appears in the file, but I would recommend using a hash. After you've constructed your hash, iterate through each hash key to find the word with the most occurrences and output that value. Do this for each file.
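
For reference, the loop, split, and hash steps described above might look roughly like this. It is only a sketch: the sub name count_words and the sample text are made up for illustration, and the input is an in-memory string rather than a file named in @ARGV so the example runs on its own.

```perl
use strict;
use warnings;

# Count every whitespace-separated word read from a filehandle.
# Returns a reference to a hash of word => count.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        for my $word (split ' ', $line) {
            $word =~ s/^\W+|\W+$//g;   # strip leading/trailing punctuation
            next if $word eq '';
            $count{$word}++;
        }
    }
    return \%count;
}

# Demo input held in a string; a real script would open a file from @ARGV.
my $text = "the cat sat.\nthe cat, the dog!\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";
my $count = count_words($fh);
close $fh;

# Walk the keys to find the most frequent word (ties broken alphabetically).
my ($top) = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
print "$top: $count->{$top}\n";
```

Note that this counts "The" and "the" as different words, since hash keys are case sensitive.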

    Hashes can be pretty tricky at first, so feel free to check back in here when you reach that part and I'm sure either I or another monk will be glad to help you out with that as well. Good luck!

Re: Help with accepting inputs, wordcount, and print to a file
by CountZero (Bishop) on Sep 10, 2012 at 18:18 UTC
    Alternative solution:
    • Count all words in the whole of the file and store the counts in a hash
    • Extract from the hash only those words for which you need to return the count
    No need to use multiple regexes to check for the words. This solution is very fast: checking for words in the whole of "Hamlet" took less than 1 second.

    I have hidden my suggested solution in the "spoiler". First try it yourself and then check my solution.
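
The hidden solution is not reproduced here. As a rough illustration of the two steps listed above (count everything, then extract only what you need), something like the following could work. The word list and the sample text are invented for the example, and folding case with lc is an assumption about what "the same word" should mean.

```perl
use strict;
use warnings;

my @wanted = qw(king ghost sword);   # hypothetical words of interest

# Stand-in for the file's contents; a real script would read them from a file.
my $text = <<'END';
The king is dead. Long live the King!
"No ghost here," said the king's guard.
END

# Step 1: count every word in the text, ignoring case.
# \w+ treats an apostrophe as a break, so "king's" counts as "king" and "s".
my %count;
$count{ lc $1 }++ while $text =~ /(\w+)/g;

# Step 2: pull out only the words we were asked about (0 if never seen).
my %result = map { ( $_ => $count{$_} || 0 ) } @wanted;

print "$_: $result{$_}\n" for @wanted;
```

Because every word is counted in a single pass, adding more words to @wanted costs almost nothing extra.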


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Help with accepting inputs, wordcount, and print to a file
by polymorpheus (Novice) on Sep 10, 2012 at 17:21 UTC
    Here are some pointers in the right direction with no code ;-)

    1) It is good to see you checking the open() call for errors and properly converting said errors to an exception (calling die()).

    2) You will want to use a lexical scalar variable for file handles and not bare words like IN.

    3) Please use more meaningful variable names ($word -> $file_name).

    4) You are going to have to look at each line in turn and use regular expressions to check whether the line contains the word you are looking for. For this you need to read the perldoc for "perlre". You need to know whether your words can span multiple lines, and which characters can delimit your words (see the \w, \W and \b regexp special characters).
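
Not part of the original reply, but for anyone who wants to see points 2 to 4 together, here is a small sketch. The sub name count_in_file and the sample text are hypothetical, and a temporary file (via the core File::Temp module) stands in for the real input so the example runs as is. It assumes a word never spans two lines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count how often $target appears as a whole word in the named file.
sub count_in_file {
    my ($file_name, $target) = @_;
    # Lexical filehandle ($in) and three-argument open instead of bareword IN.
    open my $in, '<', $file_name
        or die "Cannot open file '$file_name': $!\n";
    my $total = 0;
    while (my $line = <$in>) {
        # \b anchors at word boundaries, so "cat" does not match "catalog";
        # \Q...\E quotes any regex metacharacters in the target.
        my @hits = $line =~ /\b\Q$target\E\b/gi;
        $total += @hits;
    }
    close $in;
    return $total;
}

# Write a small sample file to search in.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp} "A cat, a catalog.\nCat! cat?\n";
close $tmp;

print count_in_file($tmp_name, 'cat'), "\n";
```

The /i flag makes the match case-insensitive; drop it if "Cat" and "cat" should be counted separately.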

    Hope this helps - this seems like a real beginner question, so you will probably have a lot more questions once you dig in further to these topics. Please read the docs that come with perl (perldoc tool) as there is a wealth of information there.

Re: Help with accepting inputs, wordcount, and print to a file
by aaron_baugher (Curate) on Sep 10, 2012 at 18:01 UTC

    This may seem like a simple task, but it's fairly complex for a beginner. It involves reading lines from a file, splitting them into words, using a hash to count appearances of certain words, and printing the contents of a hash. That gives you some things to look for in whatever learning materials you're using.

    If I were writing this, the first question I would ask is, "Where do the 'certain words' come from?" If they are in a file, then the first thing you need to do is open that file, read the words from it, and put them into a hash as its keys. That might mean chomping the words, or splitting lines, depending on how the words are saved in that file. Alternatively, if the certain words are being passed on the command line, you can get them from @ARGV and put them in a hash from there.
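
A sketch of both ways of filling that hash, assuming the word file holds whitespace-separated words. The sub names and sample words are invented for the example, and an in-memory string stands in for the word file; with command-line words you would pass @ARGV to the first sub instead.

```perl
use strict;
use warnings;

# Build a hash with each target word as a key and a starting count of 0.
sub targets_from_list {
    my %count = map { ( lc($_) => 0 ) } @_;
    return \%count;
}

# Read whitespace-separated words from a filehandle and do the same.
sub targets_from_handle {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        chomp $line;
        push @words, split ' ', $line;
    }
    return targets_from_list(@words);
}

# Words given directly (with a real script: targets_from_list(@ARGV)).
my $from_args = targets_from_list(qw(Apple pear));

# Words from a file; an in-memory string stands in for the word-list file.
my $word_file = "apple\npear plum\n";
open my $fh, '<', \$word_file or die "Cannot open in-memory file: $!\n";
my $from_file = targets_from_handle($fh);
close $fh;

print join(', ', sort keys %$from_file), "\n";
```

Starting every key at 0 means words that never appear still show up in the final report.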

    Then you can open the file you want to search in, and start reading lines. Now you have a choice: you can split each line into words, and check your hash for each word, incrementing it if it is found. Or you can loop through your hash keys, using a regex to count the number of times each hash key appears in the line, and incrementing the key's value accordingly. Which method is faster will depend on the ratio of 'certain words' to 'all words', but I would try the regex method first.
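
To make the choice concrete, here are both methods side by side on the same made-up lines. The target words (cat, dog) and the sample text are assumptions for illustration, and both versions fold case so that they agree with each other.

```perl
use strict;
use warnings;

my @lines = ("The cat saw the dog.\n", "Dog days; the cat naps.\n");

# Method 1: split each line into words and look each one up in the hash.
my %by_split = (cat => 0, dog => 0);
for my $line (@lines) {
    for my $word (split /\W+/, lc $line) {
        $by_split{$word}++ if exists $by_split{$word};
    }
}

# Method 2: loop over the hash keys and count regex matches in the line.
my %by_regex = (cat => 0, dog => 0);
for my $line (@lines) {
    for my $target (keys %by_regex) {
        # Assigning through an empty list yields the number of matches.
        my $hits = () = $line =~ /\b\Q$target\E\b/gi;
        $by_regex{$target} += $hits;
    }
}

print "$_: split=$by_split{$_} regex=$by_regex{$_}\n" for sort keys %by_split;
```

Method 1 does one hash lookup per word in the text, while method 2 runs one regex per target per line, which is why the ratio of target words to all words matters.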

    After you've read through the whole file, you need to print out the results. Loop through your hash's keys, printing the key and its value (the number of times found). You may also wish to sort on the keys or the values as part of this loop, so check out the documentation for sort as well if that's the case.
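
A possible shape for that final step, writing to an output file rather than the screen since the original question asks for one. The sub name write_counts and the counts are made up, and a temporary file name (via the core File::Temp module) stands in for the real output path.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write "word: count" lines to the named file, highest count first.
sub write_counts {
    my ($out_name, $count) = @_;
    open my $out, '>', $out_name
        or die "Cannot write file '$out_name': $!\n";
    # Sort by value descending; break ties alphabetically by key.
    for my $word (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
        print {$out} "$word: $count->{$word}\n";
    }
    close $out or die "Cannot close file '$out_name': $!\n";
}

my %count = (cat => 2, dog => 5, emu => 2);    # made-up counts
my (undef, $out_name) = tempfile(UNLINK => 1); # stand-in for the output path
write_counts($out_name, \%count);
```

To sort alphabetically by word instead, replace the sort block with a plain `sort keys %$count`.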

    Aaron B.
    Available for small or large Perl jobs; see my home node.

Re: Help with accepting inputs, wordcount, and print to a file
by Kenosis (Priest) on Sep 10, 2012 at 18:32 UTC

    In general, consider placing the following at the top of your scripts:

    use strict; use warnings;

    strict and warnings will prevent many headaches by preemptively catching problem areas in your code, if any.

    Also, if you decide to go the hash route to count words, keep in mind that "word" will be a different key from "Word" in that hash, since keys are case sensitive. To remedy this, you can use lc to first convert all the string's letters to lower-case or uc to convert all the string's letters to upper-case, before using that string as a hash key.
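
A tiny illustration of the case issue, using a made-up word list:

```perl
use strict;
use warnings;

my %count;
for my $word (qw(Word word WORD other)) {
    # Without lc, "Word", "word" and "WORD" would be three separate keys.
    $count{ lc $word }++;
}

print "$_: $count{$_}\n" for sort keys %count;
```

With strict enabled, a misspelled variable name such as $wrod in that loop would be a compile-time error instead of silently creating an empty key.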

Node Type: perlquestion [id://992807]
Approved by toolic
Front-paged by Corion