http://www.perlmonks.org?node_id=1054374

zee3b has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I'm working on a text parser. I read the data from the file, push it into an array, and then loop over it with foreach, but I only need specific successive lines when I'm calculating the average. Here's an example. My data file has this structure:
Jack
Student ID - 12445
Math Score - 45

Jill
Student ID - 234254
Math Score - 90

Jack
Student ID -12445
Math Score2 - 33

Jill
Student ID - 234254
Math Score2 - 10

So basically, as soon as my regex matches the name Jill or Jack, I want it to pick up the score from the third line of that record. Is there a command I can add inside the if block? E.g.:
if ($line =~ /Jill/) { *pick the score from the second line after this one* }
This way I can average the scores for each student for different tests.

output = Jack Average 39

Jill Average 50

Re: Selecting successive lines
by kcott (Archbishop) on Sep 17, 2013 at 04:05 UTC

    G'day zee3b,

    You can read that data in paragraph mode (see perlvar for details): basically, this allows you to read each group of three lines as a single record. When you do this, you can grab the name from line 1 and the score from the end of line 3 in a single operation.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my %score;

    {
        local $/ = "";

        while (<DATA>) {
            /\A(\w+).*?(\d+)\D*\z/ms;
            ++$score{$1}{count};
            $score{$1}{total} += $2;
        }
    }

    for (sort keys %score) {
        print $_, ' Average ', $score{$_}{total} / $score{$_}{count};
    }

    __DATA__
    Jack
    Student ID - 12445
    Math Score - 45

    Jill
    Student ID - 234254
    Math Score - 90

    Jack
    Student ID -12445
    Math Score2 - 33

    Jill
    Student ID - 234254
    Math Score2 - 10

    Output:

    $ pm_file_parse_avg.pl
    Jack Average 39
    Jill Average 50

    For your real application, you'll probably also want either int for whole number averages, or sprintf to format floating point results.
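
As a rough illustration of that last point, here is a minimal sketch of both options. The helper name format_average and its arguments are made up for this example and are not part of the code above. Note that int truncates rather than rounds, so 39.9 would come out as 39.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): format an average
# either as a truncated whole number or with a fixed number of decimals.
sub format_average {
    my ($total, $count, $places) = @_;
    return 'n/a' unless $count;             # avoid dividing by zero
    my $mean = $total / $count;
    return defined $places
        ? sprintf("%.${places}f", $mean)    # rounded to $places decimals
        : int $mean;                        # truncated toward zero
}

print format_average(100, 3), "\n";      # whole number via int
print format_average(100, 3, 2), "\n";   # two decimals via sprintf
```

With a total of 100 over 3 scores, the first call gives 33 and the second gives 33.33.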

    -- Ken

      You can read that data in paragraph mode (see perlvar for details)

      OMG I have been doing this the hard way (with manual parse phase state and similar techniques) for over a decade.

      :: facepalm ::

      Off to read now. And let the blush drain from my face.

      Thanks for the reference!

Re: Selecting successive lines
by frozenwithjoy (Priest) on Sep 17, 2013 at 03:52 UTC
    If I were going to do this and wanted to have it expandable, I'd probably plan to make a hash like this (you can simplify it if you don't care about keeping track of student IDs and never have students with the same names):
    my %records = (
        12445 => {
            name   => 'Jack',
            scores => [ 45, 33 ],
        },
        234254 => {
            name   => 'Jill',
            scores => [ 90, 10 ],
        },
    );

    If the data are consistently in the format shown, the following will make the hash (it can accept scores w/ decimals). It then calculates and reports the mean score.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use List::Util 'sum';

    my %records;

    while ( my $name = <DATA> ) {
        my ($id)    = <DATA> =~ /(\d+)$/;
        my ($score) = <DATA> =~ /(\d+\.?\d*)$/;
        <DATA>;
        chomp( $name, $id, $score );
        $records{$id}{name} = $name;
        push @{ $records{$id}{scores} }, $score;
    }

    for ( sort { $a <=> $b } keys %records ) {
        my $name   = $records{$_}{name};
        my @scores = @{ $records{$_}{scores} };
        my $mean   = sum(@scores) / @scores;
        say "$name ($_) has an average score of $mean";
    }

    __DATA__
    Jack
    Student ID - 12445
    Math Score - 45

    Jill
    Student ID - 234254
    Math Score - 90

    Jack
    Student ID -12445
    Math Score2 - 33

    Jill
    Student ID - 234254
    Math Score2 - 10

    OUTPUT:

    Jack (12445) has an average score of 39
    Jill (234254) has an average score of 50

Re: Selecting successive lines
by davido (Cardinal) on Sep 17, 2013 at 04:35 UTC

    I would read each record complete rather than one line at a time, and to keep it simple, would push scores into an anonymous array per student. Then just iterate over each student and do the math.

    use List::Util qw( sum );

    my %student;

    local $/ = '';

    while( <DATA> ) {
        push @{$student{$1}}, $2 and next
            if m/\A([^\n]+).+?(\d+)\n?\Z/s;
        warn "Invalid record #$.: <<$_>>\n";
    }

    while( my( $name, $scores ) = each %student ) {
        print "$name: Average ", sum( @{$scores} ) / @$scores, "\n";
    }

    __DATA__
    Jack
    Student ID - 12445
    Math Score - 45

    Jill
    Student ID - 234254
    Math Score - 90

    Jack
    Student ID -12445
    Math Score2 - 33

    Jill
    Student ID - 234254
    Math Score2 - 10

    Dave

Re: Selecting successive lines
by ansh batra (Friar) on Sep 17, 2013 at 09:01 UTC

    I have a simple way: keep a flag. When you see the name, set the flag to 1, and while the flag is 1, look for the score line.

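    # @lines is assumed to hold the chomped lines of the file
    my ( $flag, $name, $score ) = (0);
    foreach my $data (@lines) {
        if ( $data =~ /^(Jack)$/i ) {
            $name = $1;    # remember whose record this is
            $flag = 1;
        }
        # \d* lets this match both "Math Score" and "Math Score2"
        if ( $flag == 1 && $data =~ /Math Score\d* - (\d+)/ ) {
            $score = $1;   # the digits after the dash
            $flag  = 0;
        }
    }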

    Since you need to check for multiple names, you can also keep the names in an array and test each (chomped) line against it:
    if ( grep { $_ eq $data } @array )
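
Putting the flag and the name list together might look something like the sketch below. It swaps the grep for a hash lookup, which does the same membership test without rescanning the array on every line. The helper name collect_scores and its inputs are assumptions for illustration only, and the lines are assumed to be chomped already.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of name => [ scores ].
# $names is an array ref of student names to look for; $lines is an
# array ref of already-chomped lines from the data file.
sub collect_scores {
    my ($names, $lines) = @_;
    my %is_name = map { $_ => 1 } @$names;   # membership test by hash lookup
    my (%scores, $current);                  # $current doubles as the flag

    for my $line (@$lines) {
        if ( $is_name{$line} ) {
            $current = $line;                # flag on: we are inside a record
        }
        elsif ( defined $current && $line =~ /^Math Score\d* - (\d+)/ ) {
            push @{ $scores{$current} }, $1;
            undef $current;                  # flag off until the next name
        }
    }
    return \%scores;
}

my @lines = (
    'Jack', 'Student ID - 12445', 'Math Score - 45', '',
    'Jack', 'Student ID -12445',  'Math Score2 - 33',
);
my $scores = collect_scores( [ 'Jack', 'Jill' ], \@lines );
print "Jack: @{ $scores->{Jack} }\n";
```

From there, averaging is just the sum of each list divided by its length.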

Re: Selecting successive lines
by sundialsvc4 (Abbot) on Sep 17, 2013 at 12:50 UTC

    The customary approach to doing this sort of thing is the approach taken by the awk utility:  

    /regular_expression/
      { code to execute if regex is matched }
    ... rinse and repeat ...

    So, in this file, there look to be about five different “kinds” of lines, including the blank line, and you have things to do with two of them. For a student-name line, you capture the name and proceed. For a Score(n) line, you extract the score and do something with it, using the most recently captured student name. Perhaps for a blank line you forget the name. And so on.

    One advantage of this approach is that it is relatively “future-proof.”   You are making fewer assumptions about the data, such as “the third line.”   It is also now much easier for your program to recognize when there is a bug in the program that produced the file, which is another important consideration in production settings.
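
In Perl, that awk-style approach could be sketched roughly as a list of pattern/action pairs tried in order for each line. This is only one way to do it: the helper name average_by_rules and the exact patterns are assumptions based on the sample data in this thread, not anything the real file format guarantees.

```perl
use strict;
use warnings;

# Hypothetical helper: classify each (chomped) line by pattern and
# return a hash ref of name => average score.
sub average_by_rules {
    my (@lines) = @_;
    my (%total, %count, $name);

    # Each rule is [ pattern, action ]; the first matching rule wins.
    my @rules = (
        [ qr/^\s*$/,                 sub { undef $name } ],   # blank line: forget the name
        [ qr/^Student ID\s*-\s*\d+$/, sub { } ],               # ID line: nothing to do
        [ qr/^Math Score\d*\s*-\s*(\d+(?:\.\d+)?)$/, sub {     # score line
              my ($score) = @_;
              if ( defined $name ) { $total{$name} += $score; $count{$name}++ }
              else                 { warn "Score without a name: $score\n" }
          } ],
        [ qr/^([A-Za-z]+)$/,         sub { $name = $_[0] } ],  # name line
    );

    LINE: for my $line (@lines) {
        for my $rule (@rules) {
            my ($re, $action) = @$rule;
            if ( my @captures = $line =~ $re ) {
                $action->(@captures);
                next LINE;
            }
        }
        warn "Unrecognized line: '$line'\n";   # possibly a bug upstream
    }

    my %average;
    $average{$_} = $total{$_} / $count{$_} for keys %total;
    return \%average;
}

my @lines = (
    'Jack', 'Student ID - 12445',  'Math Score - 45',  '',
    'Jill', 'Student ID - 234254', 'Math Score - 90',  '',
    'Jack', 'Student ID -12445',   'Math Score2 - 33', '',
    'Jill', 'Student ID - 234254', 'Math Score2 - 10',
);
my $avg = average_by_rules(@lines);
print "$_ Average $avg->{$_}\n" for sort keys %$avg;
```

Any line that matches none of the rules triggers a warning, which is where the "recognize a bug in the producer" benefit comes from.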