Selecting successive lines

zee3b has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I'm working on a text parser. I read the data from a file, push it into an array, and then loop over it with foreach, but I'm only interested in specific successive lines when I'm calculating the average. Here's an example. My data file has this structure:

Jack
Student ID - 12445
Math Score - 45

Jill
Student ID - 234254
Math Score - 90

Jack
Student ID -12445
Math Score2 - 33

Jill 
Student ID - 234254
Math Score2 - 10

So basically, as soon as my regex matches the name Jill or Jack, I want it to pick the score from the 3rd line. Is there any command I can add inside the if block? E.g.:

if ($line =~ /Jill/) {
    # pick the score from the second line after this one
}

This way I can average the scores for each student for different tests.

output = Jack Average 39

Jill Average 50


Replies are listed 'Best First'.
Re: Selecting successive lines by kcott (Archbishop) on Sep 17, 2013 at 04:05 UTC
G'day zee3b,

You can read that data in paragraph mode (see perlvar for details): basically, this allows you to read each group of three lines as a single record. When you do this, you can grab the name, from line 1, and the score, from the end of line 3, in a single operation.

    #!/usr/bin/env perl -l
    use strict;
    use warnings;

    my %score;

    {
        local $/ = "";
        while (<DATA>) {
            /\A(\w+).*?(\d+)\D*\z/ms;
            ++$score{$1}{count};
            $score{$1}{total} += $2;
        }
    }

    for (sort keys %score) {
        print $_, ' Average ', $score{$_}{total} / $score{$_}{count};
    }

    __DATA__
    Jack
    Student ID - 12445
    Math Score - 45

    Jill
    Student ID - 234254
    Math Score - 90

    Jack
    Student ID -12445
    Math Score2 - 33

    Jill
    Student ID - 234254
    Math Score2 - 10

Output:

    $ pm_file_parse_avg.pl
    Jack Average 39
    Jill Average 50

For your real application, you'll probably also want either int for whole number averages, or sprintf to format floating point results.

-- Ken
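As a small illustration of the int versus sprintf remark above, here is a minimal sketch. The helper name `format_average` and its default of two decimal places are assumptions made for this example, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): format an average with a
# fixed number of decimal places. sprintf rounds; int() truncates.
sub format_average {
    my ( $total, $count, $places ) = @_;
    $places = 2 unless defined $places;    # assumed default
    return sprintf '%.*f', $places, $total / $count;
}

print 'int:     ', int( 100 / 3 ), "\n";             # whole number, truncated
print 'sprintf: ', format_average( 100, 3 ), "\n";   # two decimal places
```

int simply discards the fractional part, so it suits whole-number reports; sprintf is the better fit when the averages need a consistent number of decimals.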
Re^2: Selecting successive lines by marinersk (Priest) on Sep 17, 2013 at 10:02 UTC
"You can read that data in paragraph mode (see perlvar for details)"

OMG, I have been doing this the hard way (with manual parse-phase state and similar techniques) for over a decade. :: facepalm :: Off to read now, and let the blush drain from my face. Thanks for the reference!
Re: Selecting successive lines by frozenwithjoy (Priest) on Sep 17, 2013 at 03:52 UTC
If I were going to do this and wanted to have it expandable, I'd probably plan to make a hash like this (you can simplify it if you don't care about keeping track of student IDs and never have students with the same names):

    my %records = (
        12445 => {
            name   => 'Jack',
            scores => [ 45, 33 ],
        },
        234254 => {
            name   => 'Jill',
            scores => [ 90, 10 ],
        },
    );

If the data are consistently in the format shown, the following will build the hash (it accepts scores with decimals). It then calculates and reports the mean score.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use List::Util 'sum';

    my %records;

    # Each record is four lines: name, ID, score, blank separator.
    while ( my $name = <DATA> ) {
        my ($id)    = <DATA> =~ /(\d+)$/;
        my ($score) = <DATA> =~ /(\d+\.?\d*)$/;
        <DATA>;    # skip the blank line
        chomp( $name, $id, $score );
        $records{$id}{name} = $name;
        push @{ $records{$id}{scores} }, $score;
    }

    for ( sort { $a <=> $b } keys %records ) {
        my $name   = $records{$_}{name};
        my @scores = @{ $records{$_}{scores} };
        my $mean   = sum(@scores) / @scores;
        say "$name ($_) has an average score of $mean";
    }

    __DATA__
    Jack
    Student ID - 12445
    Math Score - 45

    Jill
    Student ID - 234254
    Math Score - 90

    Jack
    Student ID -12445
    Math Score2 - 33

    Jill
    Student ID - 234254
    Math Score2 - 10

OUTPUT:

    Jack (12445) has an average score of 39
    Jill (234254) has an average score of 50
Re: Selecting successive lines by davido (Cardinal) on Sep 17, 2013 at 04:35 UTC
I would read each complete record rather than one line at a time and, to keep it simple, would push scores into an anonymous array per student. Then just iterate over each student and do the math.

    use List::Util qw( sum );

    my %student;

    local $/ = '';

    while( <DATA> ) {
        push @{$student{$1}}, $2 and next
            if m/\A([^\n]+).+?(\d+)\n?\Z/s;
        warn "Invalid record #$.: <<$_>>\n";
    }

    while( my( $name, $scores ) = each %student ) {
        print "$name: Average ", sum( @{$scores} ) / @$scores, "\n";
    }

    __DATA__
    Jack
    Student ID - 12445
    Math Score - 45

    Jill
    Student ID - 234254
    Math Score - 90

    Jack
    Student ID -12445
    Math Score2 - 33

    Jill
    Student ID - 234254
    Math Score2 - 10

Dave
Re: Selecting successive lines by ansh batra (Friar) on Sep 17, 2013 at 09:01 UTC
I have a simple way: keep a flag. When you see the name, set the flag to one, and while the flag is one, look for the score.

    my ( $name, %scores );
    my $flag = 0;
    foreach my $data (@lines) {    # @lines: the array the file was read into
        chomp $data;
        if ( $data =~ /^Jack/ ) {
            $name = 'Jack';        # remember whose record this is
            $flag = 1;
        }
        if ( $flag == 1 && $data =~ /Math Score\d* - (\d+)/ ) {
            push @{ $scores{$name} }, $1;
            $flag = 0;
        }
    }

Since you need to check for multiple names, you may also use `if ( grep( /^$data$/, @array ) )`, where `@array` holds the student names.
Re: Selecting successive lines by sundialsvc4 (Abbot) on Sep 17, 2013 at 12:50 UTC
The customary approach to this sort of thing is the one taken by the `awk` utility:

    /regular_expression/ { code to execute if regex is matched }
    ... rinse and repeat ...

So, in this file, there would be (it looks like) about five different “kinds” of lines, including the blank line, and you have things to do with two of them. For a student_name line, you capture the name and proceed. For a Score(n) line, you extract the score and do something with it, using the most recently captured student name. Perhaps for a blank line you forget the name. And so on.

One advantage of this approach is that it is relatively “future-proof.” You are making fewer assumptions about the data, such as “the third line.” It also becomes much easier for your program to recognize when there is a bug in the program that produced the file, which is another important consideration in production settings.
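The awk-style approach described above could be sketched in Perl roughly as follows. The sub name `average_scores`, the exact patterns, and the warning messages are assumptions based on the sample data in the question, not code from the thread.

```perl
use strict;
use warnings;

# Sketch of awk-style "pattern => action" dispatch: every kind of line has
# its own test, and anything unrecognised is reported instead of ignored.
# The patterns are assumptions derived from the sample data.
sub average_scores {
    my @lines = @_;
    my ( %total, %count, $name );

    for my $line (@lines) {
        $line =~ s/\s+\z//;    # drop trailing whitespace and newline

        if ( $line eq '' ) {
            undef $name;       # blank line: forget the current student
        }
        elsif ( $line =~ /^Student ID\s*-\s*\d+$/ ) {
            next;              # recognised, but nothing to do
        }
        elsif ( $line =~ /^Math Score\d*\s*-\s*(\d+(?:\.\d+)?)$/ ) {
            if ( defined $name ) {
                $total{$name} += $1;
                $count{$name}++;
            }
            else {
                warn "Score without a student name: $line\n";
            }
        }
        elsif ( $line =~ /^(\w+)$/ ) {
            $name = $1;        # student name line
        }
        else {
            warn "Unrecognised line: $line\n";
        }
    }

    # Returns a flat list of name => average pairs.
    return map { $_ => $total{$_} / $count{$_} } keys %total;
}

my @sample = split /\n/, <<'END';
Jack
Student ID - 12445
Math Score - 45

Jill
Student ID - 234254
Math Score - 90

Jack
Student ID -12445
Math Score2 - 33

Jill
Student ID - 234254
Math Score2 - 10
END

my %avg = average_scores(@sample);
print "$_ Average $avg{$_}\n" for sort keys %avg;
```

Because every branch names the kind of line it expects, a malformed file produces a warning rather than a silently wrong average, which is the benefit the reply points to.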
