Reading in Block

by eversuhoshin (Sexton)
on Feb 24, 2011 at 05:14 UTC (#889910, perlquestion)
eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:


Hello,

I am new to Perl and have been trying hard to figure this out. I have transcripts in which each executive's turn is set off by lines of dashes (---------), and I want to count how many words each executive speaks. For instance:


--------------------------------------------

Joe Moglia, Ameritrade - CEO 3

--------------------------------------------
Thank you, Donna. Good morning, everybody and welcome to our last conference call for Ameritrade as a stand-alone entity and for our very, very first conference call for TD Ameritrade.
--------------------------------------------

Randy MacDonald, Ameritrade - CFO 13

--------------------------------------------
There's one other element that makes it easier which is that Waterhouse outsources their clearing to ADP and we've had -- this is our eighth integration and we've gone through that with ADP a number of times now. So we have a pretty well worn cookbook on how to do that.
------------------------------------------------------------

So I know that each executive's speech is delimited by ------------, but I don't know how to tell Perl to count the number of words for each executive. I was thinking of flip-flop matching, but I am not sure. I guess the code would go something like:

while<text>{
if executive name == executive
$count=number line
if the next line is ------- and number line = count+1
if !! \-+ !!//..!!/\-+!!
count number of words
next \-+ | \-+
Help would be greatly appreciated. Thank you for your time and consideration. Sincerely, Pureum Kim

Re: Reading in Block
by ikegami (Pope) on Feb 24, 2011 at 05:57 UTC

    You could just read the file into a var and use a regex match.
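The slurp-and-match idea might look roughly like the sketch below. The sub name `count_words_by_regex` is made up for illustration, and it assumes every separator is a line of five or more dashes and that blocks alternate name, speech, name, speech. The sample text is abbreviated from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes the whole transcript as
# one string and returns a hash of speaker name => word count.
sub count_words_by_regex {
    my ($text) = @_;
    my %words;

    # Assumption: a separator is a line consisting of five or more dashes.
    my $sep = qr/^[ \t]*-{5,}[ \t]*$/m;

    # separator, name block, separator, then speech up to the next
    # separator (or end of text). The lookahead leaves that separator
    # in place so it can open the next match.
    while ($text =~ /$sep\s*(.+?)\s*$sep\s*(.*?)\s*(?=$sep|\z)/gs) {
        my ($speaker, $speech) = ($1, $2);
        my @w = split ' ', $speech;     # split on runs of whitespace
        $words{$speaker} += @w;         # array in numeric context = word count
    }
    return %words;
}

# Usage with a shortened version of the sample from the question.
my $sample = <<'END';
--------------------------------------------

Joe Moglia, Ameritrade - CEO 3

--------------------------------------------
Thank you, Donna. Good morning, everybody.
--------------------------------------------

Randy MacDonald, Ameritrade - CFO 13

--------------------------------------------
So we have a pretty well worn cookbook on how to do that.
--------------------------------------------
END

my %words = count_words_by_regex($sample);
print "$_: $words{$_}\n" for sort keys %words;
```

For a real file, the string could come from something like `my $text = do { local $/; <$fh> };`, which reads the whole handle at once.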

    A custom line terminator would also do the trick.

    local $/ = "\n--------------------------------------------\n";
    <>;    # discard everything up to and including the first separator
    for (;;) {
        my $speaker = <>;
        last if !defined($speaker);
        chomp($speaker);
        my $speech = <>;
        chomp($speech);
        ...
    }
Re: Reading in Block
by GrandFather (Cardinal) on Feb 24, 2011 at 06:02 UTC

    As a quick and nasty script something like this perhaps?

    use strict;
    use warnings;

    my %names;
    my $name;

    local $/ = '--------------------------------------------';

    while (<DATA>) {
        chomp;
        s/\s+/ /g;
        next if ! /\S/;

        if (! defined $name) {
            $name = $_;
            next;
        }

        my @words = split;
        $names{$name} += @words;
        $name = undef;
    }

    print "$_: $names{$_}\n" for sort keys %names;

    __DATA__
    --------------------------------------------

    Joe Moglia, Ameritrade - CEO 3

    --------------------------------------------
    Thank you, Donna. Good morning, everybody and welcome to our last conference call for Ameritrade as a stand-alone entity and for our very, very first conference call for TD Ameritrade.
    --------------------------------------------

    Randy MacDonald, Ameritrade - CFO 13

    --------------------------------------------
    There's one other element that makes it easier which is that Waterhouse outsources their clearing to ADP and we've had -- this is our eighth integration and we've gone through that with ADP a number of times now. So we have a pretty well worn cookbook on how to do that.
    ------------------------------------------------------------

    Prints:

    Joe Moglia, Ameritrade - CEO 3 : 30
    Randy MacDonald, Ameritrade - CFO 13 : 51

    You might want to tidy that up a little before you show it to your boss or teacher though! Altogether too much use of the default variable.
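One possible tidy-up, as a sketch only: the same record-separator technique with named lexicals instead of `$_`. The sub name `count_words_by_record` and the variable names are illustrative, and the sample is abbreviated from the question. Unlike the script above it trims the name rather than collapsing its whitespace, so the hash keys carry no stray spaces.

```perl
use strict;
use warnings;

# Hypothetical helper (names are illustrative): reads separator-delimited
# records from a filehandle and returns speaker name => word count.
sub count_words_by_record {
    my ($fh, $separator) = @_;
    local $/ = $separator;              # a "line" now ends at each separator

    my (%count_for, $speaker);
    while (defined(my $record = <$fh>)) {
        chomp $record;                  # strips the trailing separator, if any
        $record =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        next if $record eq '';          # nothing between two separators

        if (!defined $speaker) {        # first non-empty record is the name
            $speaker = $record;
            next;
        }

        my @words = split ' ', $record; # the following record is the speech
        $count_for{$speaker} += @words;
        undef $speaker;
    }
    return %count_for;
}

# Usage with an in-memory filehandle and a shortened sample.
my $sep_line = '--------------------------------------------';
my $sample   = join "\n",
    $sep_line, '', 'Joe Moglia, Ameritrade - CEO 3', '', $sep_line,
    'Thank you, Donna. Good morning, everybody.',
    $sep_line, '', 'Randy MacDonald, Ameritrade - CFO 13', '', $sep_line,
    'So we have a pretty well worn cookbook on how to do that.',
    $sep_line, '';

open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
my %count_for = count_words_by_record($in, $sep_line);
close $in;

print "$_: $count_for{$_}\n" for sort keys %count_for;
```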

    True laziness is hard work
Re: Reading in Block
by elef (Friar) on Feb 24, 2011 at 11:15 UTC
    Before doing what the fellow monks have suggested, I'd unify the --------'s with something like this, to be on the safe side: s/-{5,}/---------------------------------/;
    You don't want ---- to be left in your text because there were more dashes in one separator, or segments failing to be separated because there weren't enough.
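A minimal sketch of that normalization step, assuming the whole transcript is already in one string. The sub name `normalize_separators` and the default width of 40 dashes are arbitrary choices for illustration; the `/g` flag is there so that every run in the string is rewritten, not just the first.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every run of five or more dashes to one
# fixed-width separator, so later code can rely on a single width.
sub normalize_separators {
    my ($text, $sep) = @_;
    $sep = '-' x 40 unless defined $sep;   # arbitrary default width (assumption)
    $text =~ s/-{5,}/$sep/g;               # shorter runs such as "--" are left alone
    return $text;
}

# Usage: two separators of different lengths come out identical,
# while the "--" inside the speech is untouched.
my $raw = "----------\nSpeaker\n--------------------\nwe've had -- this is our eighth integration\n";
print normalize_separators($raw);
```

After this step, a fixed `$/` value or a fixed-width pattern should match every separator in the text.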
Re: Reading in Block
by Anonymous Monk on Feb 24, 2011 at 12:01 UTC
    csplit/wc The word counts are off by one since --- is included in the files.
Re: Reading in Block
by SimonClinch (Chaplain) on Feb 24, 2011 at 13:49 UTC
    I would tend to use flip-flopping indeed for this kind of thing:
    my $speaking = 1;
    my ($speaker, %words);
    for (<>) {
        chomp;
        s/^\s+//;
        s/\s+$//;
        if ( /\-\-\-/ ) {
            $speaking = !$speaking;    # every separator line toggles between name and speech
        }
        elsif ( $speaking ) {
            $words{ $speaker } += scalar split( /\s+/ );
        }
        else {
            $_ and $speaker = $_;      # ignore the blank lines around the name
        }
    }

    One world, one people

      if ( /\-\-\-/ ) {
      can be coded as (to reduce back-whackin')...
      if ( /\Q---/ ) {
      or even as ...
      if ( /---/ ) {
      since dash is not a special character in this regex context.

      Forgive me for being picky, but I don't see a flip-flop operator in this example. The flip-flop consists of an if which has two tests joined by a range. See Range Operators in perldoc perlop.

      if ( <some cond or re> .. <another test or re> ) { ... }
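For comparison, here is a rough sketch of the same word count written with the actual range operator. It is an illustration only: the sub name `count_words_flipflop` is made up, it assumes each separator is a line of five or more dashes, and it treats everything between an opening and a closing separator as the speaker's name block. The three-dot form is used because the start and end tests are the same pattern, and `...` does not test the right operand on the line that opened the range.

```perl
use strict;
use warnings;

# Hypothetical helper: counts words per speaker using the range operator.
sub count_words_flipflop {
    my (@lines) = @_;
    my $sep = qr/^\s*-{5,}\s*$/;    # assumption: separator = line of 5+ dashes
    my (%words, $speaker);

    # The operator keeps its state between calls to this sub, so a trailing
    # undef is used as a sentinel that closes any range still open at the
    # end of the input.
    for my $line (@lines, undef) {
        my $is_sep = defined $line && $line =~ $sep;

        # True from a separator line through the next separator line,
        # that is, across the block holding the speaker's name.
        if ( $is_sep ... ($is_sep || !defined $line) ) {
            next if $is_sep || !defined $line || $line !~ /\S/;
            ($speaker = $line) =~ s/^\s+|\s+$//g;
        }
        elsif (defined $line && defined $speaker) {
            my @w = split ' ', $line;   # speech line outside the range
            $words{$speaker} += @w;
        }
    }
    return %words;
}

# Usage with a shortened version of the sample from the question.
my @sample = (
    '--------------------------------------------', '',
    'Joe Moglia, Ameritrade - CEO 3', '',
    '--------------------------------------------',
    'Thank you, Donna. Good morning, everybody.',
    '--------------------------------------------', '',
    'Randy MacDonald, Ameritrade - CFO 13', '',
    '--------------------------------------------',
    'So we have a pretty well worn cookbook on how to do that.',
    '--------------------------------------------',
);

my %words = count_words_flipflop(@sample);
print "$_: $words{$_}\n" for sort keys %words;
```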

      As Occam said: Entia non sunt multiplicanda praeter necessitatem.

        I never said there was a Perl flip-flop operator. OK, the OP did say "flip-flop matching", which implies yet a third thing, to be just as picky. But I presented $x = !$x in my code suggestion as the simplest implementation in Perl of a flip-flop I can think of, irrespective of what it's used for.

        One world, one people
