PerlMonks  

Reading in Block

by eversuhoshin (Sexton)
on Feb 24, 2011 at 05:14 UTC (#889910)
eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:


Hello,

I am new to Perl and I have been trying to figure this out like crazy. I have transcripts in which each executive's turn is delimited by lines of dashes (---------). I want to count how many words each executive speaks. For instance:


--------------------------------------------

Joe Moglia, Ameritrade - CEO 3

--------------------------------------------
Thank you, Donna. Good morning, everybody and welcome to our last conference call for Ameritrade as a stand-alone entity and for our very, very first conference call for TD Ameritrade.
--------------------------------------------

Randy MacDonald, Ameritrade - CFO 13

--------------------------------------------
There's one other element that makes it easier which is that Waterhouse outsources their clearing to ADP and we've had -- this is our eighth integration and we've gone through that with ADP a number of times now. So we have a pretty well worn cookbook on how to do that.
------------------------------------------------------------

So I know that each executive's turn is delimited by ------------, but I don't know how to tell Perl to count the number of words for each executive. I was thinking of using flip-flop matching, but I am not sure. I guess the code would go something like this (pseudocode):

    while <text> {
        if executive name == executive
            $count = number line
        if the next line is ------- and number line = count+1
        if !! \-+ !!//..!!/\-+!!
            count number of words
        next \-+ | \-+
    }
Help would be greatly appreciated. Thank you for your time and consideration. Sincerely, Pureum Kim

Re: Reading in Block
by ikegami (Pope) on Feb 24, 2011 at 05:57 UTC

    You could just read the file into a var and use a regex match.
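    The slurp-and-regex approach might look something like this. It is only a sketch: it assumes a separator is a whole line of five or more dashes and that name and speech blocks strictly alternate; the variable names are my own.

```perl
use strict;
use warnings;

# Sample transcript; in practice you would slurp the whole file instead,
# e.g. my $text = do { local $/; <$fh> };
my $text = <<'END';
--------------------------------------------

Joe Moglia, Ameritrade - CEO 3

--------------------------------------------
Thank you, Donna. Good morning, everybody and welcome to our last conference call for Ameritrade as a stand-alone entity and for our very, very first conference call for TD Ameritrade.
--------------------------------------------

Randy MacDonald, Ameritrade - CFO 13

--------------------------------------------
There's one other element that makes it easier which is that Waterhouse outsources their clearing to ADP and we've had -- this is our eighth integration and we've gone through that with ADP a number of times now. So we have a pretty well worn cookbook on how to do that.
------------------------------------------------------------
END

# A separator is a whole line of five or more dashes (assumed threshold).
my $sep = qr/^-{5,}[ \t]*\n/m;
my %words;

# Each match captures a name block ($1) and the speech block after it ($2);
# the speech runs up to the next separator or the end of the text.
while ($text =~ /$sep\s*(.+?)\s*$sep(.*?)(?=$sep|\z)/gs) {
    my ($speaker, $speech) = ($1, $2);
    my @w = split ' ', $speech;    # split on runs of whitespace
    $words{$speaker} += @w;        # array in scalar context = word count
}

print "$_: $words{$_}\n" for sort keys %words;
```

    On this sample it should report 30 words for Joe Moglia and 51 for Randy MacDonald. Matching five or more dashes rather than an exact count also copes with the longer final separator.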

    A custom line terminator would also do the trick.

    local $/ = "\n--------------------------------------------\n";
    <>;    # discard the first record (everything up to the first separator)
    for (;;) {
        my $speaker = <>;
        last if !defined($speaker);
        chomp($speaker);
        my $speech = <>;
        chomp($speech);
        ...
    }
Re: Reading in Block
by GrandFather (Cardinal) on Feb 24, 2011 at 06:02 UTC

    As a quick and nasty script something like this perhaps?

    use strict;
    use warnings;

    my %names;
    my $name;

    local $/ = '--------------------------------------------';

    while (<DATA>) {
        chomp;
        s/\s+/ /g;
        next if ! /\S/;

        if (! defined $name) {
            $name = $_;
            next;
        }

        my @words = split;
        $names{$name} += @words;
        $name = undef;
    }

    print "$_: $names{$_}\n" for sort keys %names;

    __DATA__
    --------------------------------------------
    Joe Moglia, Ameritrade - CEO 3
    --------------------------------------------
    Thank you, Donna. Good morning, everybody and welcome to our last conference call for Ameritrade as a stand-alone entity and for our very, very first conference call for TD Ameritrade.
    --------------------------------------------
    Randy MacDonald, Ameritrade - CFO 13
    --------------------------------------------
    There's one other element that makes it easier which is that Waterhouse outsources their clearing to ADP and we've had -- this is our eighth integration and we've gone through that with ADP a number of times now. So we have a pretty well worn cookbook on how to do that.
    ------------------------------------------------------------

    Prints:

    Joe Moglia, Ameritrade - CEO 3 : 30
    Randy MacDonald, Ameritrade - CFO 13 : 51

    You might want to tidy that up a little before you show it to your boss or teacher though! Altogether too much use of the default variable.

    True laziness is hard work
Re: Reading in Block
by elef (Friar) on Feb 24, 2011 at 11:15 UTC
    Before doing what the fellow monks have suggested, I'd unify the --------'s with something like this, to be on the safe side: s/-{5,}/---------------------------------/g;
    You don't want ---- to be left in your text because there were more dashes in one separator, or segments failing to be separated because there weren't enough.
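    A minimal sketch of that idea, using a made-up input whose separators have 7, 44, 10 and 60 dashes. After unifying them, one literal string is enough to split the text. The 33-dash target and the assumption that name and speech blocks strictly alternate are for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input in which the separator lines have different lengths.
my $text = join '',
    "-------\n",
    "Joe Moglia, Ameritrade - CEO 3\n",
    "--------------------------------------------\n",
    "Thank you, Donna.\n",
    "----------\n",
    "Randy MacDonald, Ameritrade - CFO 13\n",
    "--------------------------------------------\n",
    "So we have a pretty well worn cookbook on how to do that.\n",
    "------------------------------------------------------------\n";

my $sep = '-' x 33;

# Unify every run of five or more dashes to one fixed-length separator.
(my $clean = $text) =~ s/-{5,}/$sep/g;

# A single literal separator now splits the text into alternating
# name and speech blocks; drop the whitespace-only pieces and trim the rest.
my @blocks = grep { /\S/ } split /\Q$sep\E/, $clean;
s/^\s+|\s+$//g for @blocks;

# Take the blocks two at a time: a name, then that person's speech.
my %words;
while (my ($speaker, $speech) = splice @blocks, 0, 2) {
    my @w = split ' ', $speech;
    $words{$speaker} += @w;
}

print "$_: $words{$_}\n" for sort keys %words;
```

    This should print 3 for Joe Moglia and 13 for Randy MacDonald. Note that an empty speech block would be dropped by the grep and throw the name/speech pairing off.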
Re: Reading in Block
by Anonymous Monk on Feb 24, 2011 at 12:01 UTC
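    Using csplit/wc: split the transcript at each separator line with csplit, then count the words in each resulting file with wc -w. The word counts are off by one since the --- line is included in the files.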
Re: Reading in Block
by SimonClinch (Chaplain) on Feb 24, 2011 at 13:49 UTC
    I would tend to use flip-flopping indeed for this kind of thing:
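    use strict;
    use warnings;

    my $speaking = 1;
    my ($speaker, %words);    # parentheses so both are declared

    while (<>) {
        chomp;
        s/^\s+//;
        s/\s+$//;
        if ( /\-\-\-/ ) {
            $speaking = !$speaking;    # each separator line toggles the state
        }
        elsif( $speaking ) {
            my @w = split( /\s+/ );    # list context, then count the elements
            $words{ $speaker } += @w;
        }
        else {
            $_ and $speaker = $_;      # a non-blank line between speeches is a name
        }
    }

    print "$_: $words{$_}\n" for sort keys %words;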

    One world, one people

      if ( /\-\-\-/ ) {
      can be coded as (to reduce back-whackin')...
      if ( /\Q---/ ) {
      or even as ...
      if ( /---/ ) {
      since dash is not a special character in this regex context.

      Forgive me for being picky, but I don't see a flip-flop operator in this example. In Perl the flip-flop is the range operator (.. or ...) evaluated in scalar context, typically as the condition of an if with a test on each side. See Range Operators in perldoc perlop.

      if ( <some cond or re> .. <another test or re> ) { ... }
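      For comparison, here is a sketch that does use the range operator as a flip-flop on this data. It assumes a separator is a line of five or more dashes and uses the three-dot form, because with .. and identical patterns the range would start and stop on the same separator line. It also relies on the separators pairing up around each name, so lines inside a range are names and lines outside are speech. The sample is a shortened copy of the OP's transcript.

```perl
use strict;
use warnings;

# Shortened copy of the sample transcript, read through an in-memory
# filehandle so the example is self-contained.
my $text = <<'END';
--------------------------------------------

Joe Moglia, Ameritrade - CEO 3

--------------------------------------------
Thank you, Donna. Good morning, everybody.
--------------------------------------------

Randy MacDonald, Ameritrade - CFO 13

--------------------------------------------
There's one other element that makes it easier.
So we have a pretty well worn cookbook on how to do that.
------------------------------------------------------------
END
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my ($speaker, %words);

while (<$fh>) {
    chomp;
    # "..." is true from a separator line through the next separator line.
    # Unlike "..", it does not test the right operand on the line that
    # started the range, so identical start/end patterns work.
    if (my $seq = /^-{5,}\s*$/ ... /^-{5,}\s*$/) {
        # $seq is 1 on the opening separator and ends in "E0" on the closing one.
        next if $seq == 1 || $seq =~ /E0$/;
        $speaker = $_ if /\S/;    # inside the range: the executive's name
    }
    elsif (defined $speaker) {
        my @w = split;            # outside the range: a line of speech
        $words{$speaker} += @w;
    }
}
close $fh;

print "$_: $words{$_}\n" for sort keys %words;
```

      With this sample it should print 6 for Joe Moglia and 21 for Randy MacDonald, the latter summed over two speech lines.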

      As Occam said: Entia non sunt multiplicanda praeter necessitatem.

        I never said there was a Perl flip-flop operator. OK, the OP did say "flip-flop matching", which, to be just as picky, implies yet a third thing. But I presented $x = !$x in my code suggestion as the simplest Perl implementation of a flip-flop I can think of, irrespective of what it's used for.

        One world, one people
