http://www.perlmonks.org?node_id=1037544

flash4syth has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Long time Perl student, first time Perl Monks post.

My program reads in a text file, and I want to count the whitespace-separated words that appear between double quotes. The quote marks are already delimited by spaces. For example:

'Here is " a quoted string " for you'

The quotes often extend beyond one line, so as I read in the file, I split each line and append the result onto an array of words. Here's an example of the resulting array content:

( '"', 'quoted', 'words', '"', )

How can I count the words between the quotes for every instance of open/close double quotes in this array?

Thanks in advance

Replies are listed 'Best First'.
Re: Count Quoted Words
by Cristoforo (Curate) on Jun 07, 2013 at 02:10 UTC
    Here is an example, partly from the docs for Text::ParseWords (part of core since Perl 5). Note that it works on the whole body of text rather than on an array.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    my $text;
    do {local $/; $text = <DATA>};

    my @words = quotewords('\s+', 1,$text);
    my $i = 0;
    foreach (@words) {
        if (/^".+"$/s) {
            printf "<%s> : COUNT %d\n", $_, scalar split;
        }
    }
    __DATA__
    "Yes, yes, how was it now?" he thought, going over his dream. "Now, how was it? To be sure! Alabin was giving a dinner at Darmstadt; no, not Darmstadt, but something American. Yes, but then, Darmstadt was in America. Yes, Alabin was giving a dinner on glass tables, and the tables sang, _Il mio tesoro_--not _Il mio tesoro_ though, but something better, and there were some sort of little decanters on the table, and they were women, too," he remembered.
    The output is:
    C:\Old_Data\perlp>perl t33.pl
    <"Yes, yes, how was it now?"> : COUNT 6
    <"Now, how was it? To be sure! Alabin was giving a dinner at Darmstadt; no, not Darmstadt, but something American. Yes, but then, Darmstadt was in America. Yes, Alabin was giving a dinner on glass tables, and the tables sang, _Il mio tesoro_--not _Il mio tesoro_ though, but something better, and there were some sort of little decanters on the table, and they were women, too,"> : COUNT 66
Re: Count Quoted Words
by smls (Friar) on Jun 07, 2013 at 03:51 UTC
    Given an array @words like the one shown in the question, you could do:
    my $count;
    foreach (@words) {
        my $quote = ($_ eq '"');
        if ($quote ... $quote) {
            if (!$quote) { $count++ }
            elsif ($count) { print "$count quoted words\n"; $count = 0 }
        }
    }
    Although I admit that the ... operator is slightly obscure... :)
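    For readers who find the flip-flop operator hard to follow, the same idea can be sketched with an explicit state flag. This is only an illustration, not smls's code: count_quoted_runs is a hypothetical helper name, and it assumes, as in the question, that every double quote is its own array element.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one word count per
# quoted run. Assumes each double quote is a separate element of the list.
sub count_quoted_runs {
    my @words = @_;
    my @counts;
    my $in_quotes = 0;    # true while we are between an opening and closing quote
    my $count     = 0;    # words seen in the current quoted run

    for my $word (@words) {
        if ($word eq '"') {
            # A quote seen while inside a run closes it, so record the count.
            push @counts, $count if $in_quotes;
            $in_quotes = !$in_quotes;
            $count     = 0;
        }
        elsif ($in_quotes) {
            $count++;
        }
    }
    return @counts;
}

my @words = split ' ', 'Here is " a quoted string " for you " two words " end';
print "$_ quoted words\n" for count_quoted_runs(@words);
```

    Unlike the flip-flop version, this sketch also reports a count of 0 for an empty pair of quotes, and it silently ignores a final quote that is never closed.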
Re: Count Quoted Words
by smls (Friar) on Jun 07, 2013 at 04:14 UTC
    Unless the words array is also needed for something else, my preferred solution would be to skip it entirely and just use a regex on the whole file content (what can I say, I like regexes):
    use File::Slurp;
    my $text = read_file('input.txt');
    while ($text =~ /" (.*?) "/sg) {
        print "Found quoted string with ".split(' ', $1)." words: $1\n";
    }
    The regex might need to be adjusted depending on the exact definition of what should be counted as a quoted string within the input data.
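    As one possible adjustment, here is a sketch for input where the quotes are not padded with spaces (for example "like this"). It is an assumption about what such data might look like, not part of the original answer; quoted_word_counts is a hypothetical name, and the pattern assumes quotes are balanced and never escaped.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): one word count per quoted string.
# Assumes balanced, unescaped double quotes; they need not be space-delimited.
sub quoted_word_counts {
    my ($text) = @_;
    my @counts;
    while ($text =~ /"([^"]*)"/g) {
        my $quoted = $1;                    # copy $1 before running another match
        my $n = () = $quoted =~ /\S+/g;     # count runs of non-whitespace
        push @counts, $n;
    }
    return @counts;
}

my $sample = qq{"Yes, yes, how was it now?" he thought. "Now,\nhow was it?"};
print join(', ', quoted_word_counts($sample)), "\n";
```

    Because [^"]* also matches newlines, no /s modifier is needed for quotes that span several lines.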

      Thanks! I ended up going with this solution as it is easy to read and allows me to get other data about the entire text using regexes.

Re: Count Quoted Words
by jaredor (Priest) on Jun 07, 2013 at 06:33 UTC

    If you just want the count of all white space words within double quotes in a text file, you don't need to keep much information hanging around. Try filtering:

    perl -E 'say 0+map{split}grep{$i++%2}split/"/,do{undef$/;<>};' fil.txt

    (Please pardon the golfing, I have this thing about keeping one-liners on one line ;-)
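    For anyone who would rather not untangle the golf, the one-liner can be spelled out roughly as follows. This is a sketch of the same idea (split on the quote character, keep the odd-numbered chunks, count their words), with total_quoted_words as a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): total number of whitespace-separated
# words found inside double quotes in a string.
sub total_quoted_words {
    my ($text) = @_;

    # Splitting on the quote character alternates outside/inside text,
    # so chunks at odd indexes are the quoted parts.
    my @chunks = split /"/, $text;

    my $total = 0;
    for my $i (0 .. $#chunks) {
        next unless $i % 2;                    # skip text outside quotes
        my @words = split ' ', $chunks[$i];    # split ' ' ignores leading whitespace
        $total += @words;
    }
    return $total;
}

print total_quoted_words('And "then" something "happened" on "the first and second"'), "\n";
```

    Like the one-liner, this treats everything after an unmatched opening quote as quoted text, so it is only reliable when the quotes are balanced.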

Re: Count Quoted Words
by Anonymous Monk on Jun 07, 2013 at 02:31 UTC

    Mwahahahaha

    #!/usr/bin/perl --
    #~
    #~
    #~
    #~
    # perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if " -otr -op +r -ce -nibc -i=4 -pt=0 "-nsak=*"
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use autodie; # error checking for open/close...

    Main( @ARGV );
    exit( 0 );

    sub Main {
        my( @files ) = @_;
        if( not @files ) {
            my $lines = q{ And "then" something "happened" on "the first and second and third lines" and then it was over};
            @files = ( \'"the boat"', \$lines, \"$lines $lines $lines" );
        }
        for my $file ( @files ) {
            print "## { $file }{ wordcount= ", Vote( $file ), " }\n";
        }
    } ## end sub Main

    sub Vote {
        my( $fyle ) = @_;
        open my( $wyld ), '<:raw', $fyle;
        my $in_quotes = 0;
        my $words     = 0;
        while( my $line = readline $wyld ) {
            pos( $line ) = 0;
          WORDCOUNTER: while( length( $line ) > pos( $line ) ) {
                $line =~ m{
                    \G\s*\x22 # quote after optional whitespace
                }gxcs and do {
                    $in_quotes = !$in_quotes; ## flip it
                    next WORDCOUNTER;
                };
                $in_quotes and $line =~ m{
                    [^\x22\s]+ ## not quote or whitespace
                }gxcs and do {
                    $words++;
                    next WORDCOUNTER;
                };
                $line =~ m{
                    \G[^\x22]+ ## not quote
                }gxcs and do {
                    next WORDCOUNTER;
                };
            } ## end of WORDCOUNTER
        } ## end of readline
        return $words;
    } ## end sub Vote
    __END__
    ## { SCALAR(0xbb713c) }{ wordcount= 2 }
    ## { SCALAR(0xad05a4) }{ wordcount= 9 }
    ## { SCALAR(0x3f8fec) }{ wordcount= 27 }