Sentence Measurer

by Anonymous Monk
on Apr 11, 2001 at 22:03 UTC ( [id://71778]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have the following Perl script for manipulating a text file and measuring sentence length.

    foreach $sentence(@sentences) {
        #print FILE "$sentence\n";
        @words = split(/[^\w'a-zA-Z0-9_'-?]/,$sentence);
        $Counter =0;
        foreach $word(@words){
            $Counter = $Counter+1;
            print ("$word\n");
        }
        $sentence_count{($Counter)} = $sentence_count{($Counter)}+1;
    }
    while (($sentence_count,$word_count) = each(%sentence_count)) {
        print ("There are $word_count sentences of $sentence_count words\n");
    }

And for some reason it counts a slightly different number of words! But not like one less or one more, it's not at all consistent!! WHY oh WHY will it not count the numbers of words correctly??? Can anyone help me? Am I doing something REALLY dumb?? Katy M

Replies are listed 'Best First'.
Re: Sentence Measurer
by Beatnik (Parson) on Apr 11, 2001 at 22:17 UTC
    Lingua::EN::Sentence can split up text into nice English sentences; counting the words in each of those sentences would then get you what you requested...
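A minimal sketch of that suggestion, under stated assumptions: Lingua::EN::Sentence is installed from CPAN and its `get_sentences` function returns a reference to an array of sentences. The helper names `word_count` and `sentences_of` are hypothetical, and the regex fallback is only a crude stand-in used when the module is missing.

```perl
use strict;
use warnings;

# Count whitespace-separated words; split ' ' skips leading whitespace.
sub word_count {
    my ($sentence) = @_;
    my @words = split ' ', $sentence;
    return scalar @words;
}

# Split text into sentences. Uses Lingua::EN::Sentence when it is installed
# (assumption: get_sentences returns an array reference); otherwise falls
# back to a naive split after ., ? or ! followed by whitespace.
sub sentences_of {
    my ($text) = @_;
    my $found = eval {
        require Lingua::EN::Sentence;
        Lingua::EN::Sentence::get_sentences($text);
    };
    return @$found if ref $found eq 'ARRAY';
    return split /(?<=[.?!])\s+/, $text;
}

my $text = "This is one sentence. Here is another, slightly longer one!";
for my $sentence (sentences_of($text)) {
    printf "%d words: %s\n", word_count($sentence), $sentence;
}
```

Note that Perl's built-in `length` counts characters, so the word count still needs a `split` of some kind.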

    ... Quidquid perl dictum sit, altum viditur.
Re: Sentence Measurer
by mirod (Canon) on Apr 11, 2001 at 22:18 UTC

    You can simplify your code quite a bit:

    #!/bin/perl -w
    use strict;

    my %sentence_count;

    foreach my $sentence(<DATA>) {
        $sentence=~ s/^\W+//;                  # remove leading non-words
        my $counter = split(/\W+/,$sentence);  # split on non-words sequence (\W+)
                                               # in scalar context split will return
                                               # the number of elements in the generated list,
                                               # no need to count them ($count= @word in your
                                               # example would work too)
        $sentence_count{($counter)}++;
    }

    while ( my($sentence_count,$word_count) = each(%sentence_count)) {
        print ("There are $word_count sentences of $sentence_count words\n");
    }

    __DATA__
    one
    one
    two, two
    two. two
    two, two.
    three three three
    three three three
    three three three.
    three three, three
Re: Sentence Measurer
by larsen (Parson) on Apr 11, 2001 at 23:02 UTC
Re: Sentence Measurer
by twerq (Deacon) on Apr 11, 2001 at 22:20 UTC
    Something like split " ", $sentence should be sufficient for counting words in a scalar.

    Try using this:
    my %sentence_count;
    my @sentences = (
        "Hello, how are you doing today?",
        "Where is the bathroom, pablo?",
        "My feet have the most beautiful odour!",
        "It's five o'clock"
    );

    foreach (@sentences) {
        $sentence_count{scalar(split " ",$_)}++;
    }

    foreach (keys %sentence_count) {
        print "$sentence_count{$_} sentences have $_ words\n";
    }
Re: Sentence Measurer
by suaveant (Parson) on Apr 11, 2001 at 22:14 UTC
    well, for one you could put a + after your character class, so you don't count a word when you have something like two spaces side by side...

    you have a lot of repetition in your character class... \w is the same as ummm... a-zA-Z0-9_ (pretty sure), but that shouldn't prevent it from working... really, what is wrong with splitting on whitespace, \s+
    That should give you a decent count.
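A small sketch of the point about the missing `+`, using the pattern from the question. The helper `count_fields` and the sample strings are made up for illustration. One more thing worth checking in that pattern: inside the class, `'-?` is read as a range from `'` to `?`, which covers digits and most punctuation, so commas, periods and hyphens are never treated as separators and a stand-alone ` - ` ends up counted as a word.

```perl
use strict;
use warnings;

# Hypothetical helper: number of fields split() produces for a pattern.
sub count_fields {
    my ($pattern, $text) = @_;
    my @fields = split $pattern, $text;
    return scalar @fields;
}

my $original  = qr/[^\w'a-zA-Z0-9_'-?]/;    # pattern from the question
my $with_plus = qr/[^\w'a-zA-Z0-9_'-?]+/;   # same class, one or more separators

# Two spaces in a row: without the + there is an empty field between them.
print count_fields($original,  "two  words"), "\n";    # 3
print count_fields($with_plus, "two  words"), "\n";    # 2
print count_fields(qr/\s+/,    "two  words"), "\n";    # 2

# The hyphen sits inside the '-? range, so it is kept as a field of its own.
print count_fields($with_plus, "cats - dogs"), "\n";   # 3

# Leading whitespace still yields an empty first field with \s+ ...
print count_fields(qr/\s+/, " two words"), "\n";       # 3

# ... while the special pattern ' ' skips it.
my @words = split ' ', " two words";
print scalar(@words), "\n";                            # 2
```

Trailing empty fields are dropped by `split`, which is one reason the miscount looks inconsistent rather than off by a fixed amount.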

    as an aside, you can do $Counter++ to add one to counter, or even $Counter += 1;

                    - Ant

Re: Sentence Measurer
by c-era (Curate) on Apr 11, 2001 at 22:15 UTC
    It works for me, this is what I used:
    @sentences = ("one sentence","two sentences, but not realy","a tesing test.");
    foreach $sentence(@sentences) {
        #print FILE "$sentence\n";
        @words = split(/[^\w'a-zA-Z0-9_'-?]/,$sentence);
        $Counter =0;
        foreach $word(@words){
            $Counter = $Counter+1;
            print ("$word\n");
        }
        $sentence_count{($Counter)} = $sentence_count{($Counter)}+1;
    }
    while (($sentence_count,$word_count) = each(%sentence_count)) {
        print ("There are $word_count sentences of $sentence_count words\n");
    }
    Maybe you aren't getting the right thing in @sentences?
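One way to check what is actually in @sentences, sketched with a hypothetical helper `show_fields`: wrap each field in brackets so that empty strings, which come from doubled spaces or a leading newline, become visible. The sample inputs are invented; real data read from a file may look different.

```perl
use strict;
use warnings;

# Hypothetical debugging helper: split with the pattern from the question
# and bracket every field so empty ones show up as [].
sub show_fields {
    my ($sentence) = @_;
    my @words = split /[^\w'a-zA-Z0-9_'-?]/, $sentence;
    return join ' ', map { "[$_]" } @words;
}

print show_fields("one  sentence"), "\n";            # [one] [] [sentence]
print show_fields("\nsecond line of a file"), "\n";  # [] [second] [line] [of] [a] [file]
```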
