PerlMonks  

separate word count for each line of a file

by melb100 (Initiate)
on Nov 21, 2012 at 18:11 UTC

melb100 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl monks,

I am trying to write a script that will allow me to count words for each individual line of a file (the file contains 64 sentences, so I would like 64 different word counts, next to the relevant sentence).

I am brand new to Perl and so far have managed to cobble together the following, which gives me total words in the file, lines in the file, characters in the file, average words per line, average characters per word, and even characters per line (including spaces). Now I have come to you because I cannot work out how to get a word count for each line.

    #!/usr/local/bin/perl
    #MB Nov 2012
    #script to check experiment stimuli
    #counts the number of lines in a file
    #counts number of words in a file and averages across lines
    #count the number of characters in a file and averages across words
    #BUT how do I count the number of words ON EACH LINE?

    while (<>) {
        chop;
        foreach $w (split) {
            $words++;
            $char = $char + length($w); #cumulative characters for all words
        }
        #$wordline = length($_); # this gives the character count per line instead of the word count;
    }
    #print "\n$. the sentence [$_] has $words words"; # this prints the sentence(good!) but counts the cumulative number of words for all sentences until that point instead of that sentence only
    print "\n$words words in this file\n";
    $avwds = $words/$.; #average words per line
    $avch = $char/$words; #average characters per word
    print "there are $char characters in total";
    print "\nthere are $. lines with an average of $avwds words per line";
    print "\nthere are $words words with an average of $avch characters per word";

I feel as though I should be able to make a list of words in each line, and then I could count the length of that list for each line, but I am not sure how to do this and how to integrate it with the foreach word loop I already have.

Any advice or pointers appreciated,

madeleine

Replies are listed 'Best First'.
Re: separate word count for each line of a file
by Riales (Hermit) on Nov 21, 2012 at 18:23 UTC

    Hint: the following line is where you count the words in every line. The only difference is that you add the count to a total word count (every word in the document) as opposed to just that line. This sounds like homework so I hesitate to just give you the answer...but maybe the hint is enough to go off of.

        foreach $w (split) {
            $words++; # Increment the word count by one for each word.
            $char = $char + length($w); #cumulative characters for all words
        }
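
    For anyone who wants the hint spelled out: a minimal sketch, assuming the only change needed is a second counter that is reset to zero at the start of every line. The sub name words_per_line and the variable $line_words are made up for illustration and do not appear in the original script; the two sample sentences are borrowed from elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one word count per line.
sub words_per_line {
    my @lines = @_;
    my @counts;
    foreach my $line (@lines) {
        my $line_words = 0;    # reset for every line, unlike the running $words total
        foreach my $w (split ' ', $line) {
            $line_words++;     # same increment as in the original loop
        }
        push @counts, $line_words;
    }
    return @counts;
}

# Sample input; in the real script these lines would come from the file.
my @sentences = (
    "Mary had a little lamb,",
    "whose fleece was white as snow.",
);
my @counts = words_per_line(@sentences);
for my $i (0 .. $#sentences) {
    print "$counts[$i]\t$sentences[$i]\n";    # count next to its sentence
}
```

    The same reset-then-count pattern should fit inside the existing while (<>) loop: set the per-line counter to zero after reading a line and print it before moving on to the next one.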
Re: separate word count for each line of a file
by 2teez (Vicar) on Nov 22, 2012 at 03:54 UTC

    Hi melb100,

    I think you are really not far away from the solution you "dearly" seek.
    I would rather use a Perl hash to collect the answers while going over each line of a given file.
    The example below is only a foot in the door, but I believe it should lead you to the whole solution.
    Like this:

        use warnings;
        use strict;
        use Data::Dumper;

        my %line;
        while (<DATA>) {
            chomp;    # not chop

            # initialize total number of character to get for each line
            my $line_total_characters_without_space = 0;
            foreach my $w (split) {
                $line{'Total_number_of_words'}++;
                print " In line: ", $., " word: ", $w, " has length: ", length($w), $/;
                $line_total_characters_without_space += length($w);
            }

            # get the number of characters for each line, space inclusive
            push @{ $line{$.}{'line_total_characters_with_space'} }, length($_);

            # get the number of characters for each line, without including space
            push @{ $line{"line $."} }, $line_total_characters_without_space;
        }
        $Data::Dumper::Sortkeys = 1;
        print Dumper \%line;

        __DATA__
        Mary had a little lamb,
        whose fleece was white as snow.

        And everywhere that Mary went,
        the lamb was sure to go.

        It followed her to school one day
        which was against the rules.
    Output

        In line: 1 word: Mary has length: 4
        In line: 1 word: had has length: 3
        In line: 1 word: a has length: 1
        In line: 1 word: little has length: 6
        In line: 1 word: lamb, has length: 5
        In line: 2 word: whose has length: 5
        In line: 2 word: fleece has length: 6
        In line: 2 word: was has length: 3
        In line: 2 word: white has length: 5
        In line: 2 word: as has length: 2
        In line: 2 word: snow. has length: 5
        In line: 4 word: And has length: 3
        In line: 4 word: everywhere has length: 10
        In line: 4 word: that has length: 4
        In line: 4 word: Mary has length: 4
        In line: 4 word: went, has length: 5
        In line: 5 word: the has length: 3
        In line: 5 word: lamb has length: 4
        In line: 5 word: was has length: 3
        In line: 5 word: sure has length: 4
        In line: 5 word: to has length: 2
        In line: 5 word: go. has length: 3
        In line: 7 word: It has length: 2
        In line: 7 word: followed has length: 8
        In line: 7 word: her has length: 3
        In line: 7 word: to has length: 2
        In line: 7 word: school has length: 6
        In line: 7 word: one has length: 3
        In line: 7 word: day has length: 3
        In line: 8 word: which has length: 5
        In line: 8 word: was has length: 3
        In line: 8 word: against has length: 7
        In line: 8 word: the has length: 3
        In line: 8 word: rules. has length: 6
        $VAR1 = {
            '1' => { 'line_total_characters_with_space' => [ 23 ] },
            '2' => { 'line_total_characters_with_space' => [ 31 ] },
            '3' => { 'line_total_characters_with_space' => [ 0 ] },
            '4' => { 'line_total_characters_with_space' => [ 30 ] },
            '5' => { 'line_total_characters_with_space' => [ 24 ] },
            '6' => { 'line_total_characters_with_space' => [ 0 ] },
            '7' => { 'line_total_characters_with_space' => [ 33 ] },
            '8' => { 'line_total_characters_with_space' => [ 28 ] },
            'Total_number_of_words' => 34,
            'line 1' => [ 19 ],
            'line 2' => [ 26 ],
            'line 3' => [ 0 ],
            'line 4' => [ 26 ],
            'line 5' => [ 19 ],
            'line 6' => [ 0 ],
            'line 7' => [ 27 ],
            'line 8' => [ 24 ]
        };
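
    The code above records characters per line and a total word count, but not words per line. As a sketch only, and assuming a hash keyed by line number is the structure you want, the per-line word count could be stored alongside the character counts. The sub line_stats and its key names (words, chars_without_space, chars_with_space) are hypothetical and chosen just for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a hash keyed by line number (starting at 1),
# holding the word count and both character counts for each line.
sub line_stats {
    my @lines = @_;
    my %stats;
    my $line_no = 0;
    foreach my $text (@lines) {
        $line_no++;
        my @words = split ' ', $text;    # list of words on this line
        my $chars = 0;
        $chars += length for @words;     # characters excluding whitespace
        $stats{$line_no} = {
            words               => scalar @words,
            chars_without_space => $chars,
            chars_with_space    => length($text),
        };
    }
    return \%stats;
}

my $stats = line_stats(
    "Mary had a little lamb,",
    "whose fleece was white as snow.",
);
foreach my $n (sort { $a <=> $b } keys %$stats) {
    printf "line %d: %d words, %d characters without spaces\n",
        $n, $stats->{$n}{words}, $stats->{$n}{chars_without_space};
}
```

    Keeping everything for one line under a single key avoids mixing keys such as '1' and 'line 1' in the same hash.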

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      Thank you for your tips, monks,

      I am a researcher and I needed to check lists of experimental sentences very quickly (definitely not homework! ;-) ), so in the end I went for the most basic of solutions and am just running

      perl -ne "{print tr/ //, $/}" FILE
      on all my files and scanning down the resulting word counts for outliers. Not very elegant!!

      I am afraid that as a complete novice (this being the first thing I have ever tried to do in Perl) I wasn't clever enough to work out the number of words on each line from your hints. If there is a more elegant solution I'd be grateful, but in the meantime this will do! :)

      Thanks again for trying to help, and I promise to persevere with the "proper" solution when time is less pressing!

      m
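
      One caveat about the one-liner: tr/ // returns the number of spaces on the line, which is one less than the number of words when each sentence uses single spaces and has no leading or trailing blanks. A small sketch comparing the two counts follows; count_spaces and count_words are illustrative names, not anything from the thread, and the second sample sentence has a doubled and a trailing space added on purpose.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison.
sub count_spaces {
    my ($s) = @_;
    return ($s =~ tr/ //);       # tr with an empty replacement only counts matches
}

sub count_words {
    my ($s) = @_;
    my @w = split ' ', $s;       # split on runs of whitespace, ignoring leading blanks
    return scalar @w;            # length of the word list
}

for my $sentence ("Mary had a little lamb,", "the lamb  was sure to go. ") {
    printf "%d spaces, %d words: [%s]\n",
        count_spaces($sentence), count_words($sentence), $sentence;
}
```

      On the command line, Perl's -a (autosplit) switch should give the word count directly, along the lines of perl -lane "print scalar(@F)" FILE.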
Re: separate word count for each line of a file
by aitap (Curate) on Nov 21, 2012 at 19:46 UTC
