The file is made up of several distinct sections, each with its own format, so first break it into those sections. As it's also a fairly small file, slurp it and split it on the section separator (a run of 103 dashes):
## Slurp the file and break into sections
my @sections = split '-{103}', do{ local $/; <> };
close *ARGV;
Once you have the sections separated, you can treat each one differently.
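If you want to try the split without the real file to hand, something like the following sketch should do. The sample text is invented for illustration (only the 'PLAYER GAME STATISTICS' title comes from the real file), and it reads from an in-memory filehandle rather than `<>`:

```perl
use strict;
use warnings;

## Made-up stand-in for the real file: three sections
## separated by runs of 103 dashes
my $sep  = '-' x 103;
my $text = join "\n",
    'GAME SUMMARY', $sep, 'PLAYER GAME STATISTICS', $sep, 'GOALIE STATISTICS', '';

## Open the string as a file, slurp it and split on the separator
open my $fh, '<', \$text or die $!;
my @sections = split /-{103}/, do{ local $/; <$fh> };
close $fh;

printf "%d sections\n", scalar @sections;
```

Note that each piece keeps the newlines that surrounded the separator, which is why the later code discards blank lines before using a section.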
Rather than counting all the spaces and manually constructing the unpack formats for them (which is a pain, and they might also change), notice that each section of stats is preceded by a header line, and that the left-hand edge of each column header marks the left-hand limit of the data in that column.
Also notice that although some of the column titles are multiple words, every title is preceded by at least two spaces, whilst the multi-word titles themselves contain only single spaces.
That information allows you to use the header lines to construct the unpack formats programmatically. The following subroutine takes a header line, uses it to discover the column boundaries and then uses those to construct a format:
sub buildFmt {
my $templ = shift;
my @cols;
push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
my $p = 0;
my $fmt = '';
$fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
return $fmt;
}
This can be reused for all the columnised (sub)sections in the file.
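To see what buildFmt() actually produces, here is a small sketch run against a made-up header and data line (the column names and widths are hypothetical, not taken from the real file; sprintf is used only to make the spacing explicit):

```perl
use strict;
use warnings;

## buildFmt() exactly as posted above
sub buildFmt {
    my $templ = shift;
    my @cols;
    push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
    my $p = 0;
    my $fmt = '';
    $fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
    return $fmt;
}

## Hypothetical header: columns start at offsets 0, 4, 21 and 25
my $header = sprintf '%-4s%-17s%-4s%s', 'NO', 'PLAYER NAME', 'G', 'A';
## Hypothetical data line laid out under that header (G column left empty)
my $line   = sprintf '%2s  %-17s%-4s%s', 4, 'Drevitch, Scott', '', 1;

my $fmt = buildFmt( $header );
print "fmt: '$fmt'\n";                      ## fmt: 'A4 A17 A4 '

my @fields = unpack $fmt, $line;
print join( ' | ', @fields ), "\n";

## Assumption about intent: adding 'A*' also picks up the final column
my @all = unpack $fmt . 'A*', $line;
print join( ' | ', @all ), "\n";
```

The single space inside 'PLAYER NAME' does not start a new column because the regex demands two whitespace characters before a column start. As written, the format stops where the last header column begins, so if you need that final column too, appending 'A*' is one way to get it.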
By way of example, this is how to use it to break down the four subsections of the 'PLAYER GAME STATISTICS':
## Section 2
## Break the section into lines
my @section2 = split "\n", $sections[ 1 ];
## discard header lines;
shift @section2 for 1 .. 3;
## Away
## Construct the format from the header
my $fmt = buildFmt( shift @section2 );
## Use it to parse the Away player game stats
my @awayStats;
push @awayStats, [ unpack $fmt, shift @section2 ] while $section2[ 0 ] =~ m[\S];
print "@$_" for @awayStats;
## Discard blank lines
shift @section2 while @section2 and $section2[ 0 ] !~ m[\S];
## Away totals ... same two steps again
$fmt = buildFmt( shift @section2 );
my @awayTotals = unpack $fmt, shift @section2;
print "@awayTotals";
## Discard blank lines
shift @section2 while @section2 and $section2[ 0 ] !~ m[\S];
## home ... and again
$fmt = buildFmt( shift @section2 ); ## They could vary.
my @homeStats;
push @homeStats, [ unpack $fmt, shift @section2 ] while $section2[ 0 ] =~ m[\S];
print "@$_" for @homeStats;
## Discard blank lines
shift @section2 while @section2 and $section2[ 0 ] !~ m[\S];
## Home totals ... and again
$fmt = buildFmt( shift @section2 );
my @homeTotals = unpack $fmt, shift @section2;
print "@homeTotals";
Parsing the other sections that contain columnised data is just a repeat of the above.
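Since the same steps repeat for every header-plus-rows block, they could be wrapped in a helper. The following is only a sketch: parseBlock() is a hypothetical name, the sample lines are invented, and the `@$lines and` guards are an addition so the loops stop when the lines run out:

```perl
use strict;
use warnings;

## buildFmt() exactly as posted above
sub buildFmt {
    my $templ = shift;
    my @cols;
    push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
    my $p = 0;
    my $fmt = '';
    $fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
    return $fmt;
}

## Hypothetical helper: consume one header + rows block from the
## front of @$lines and return a ref to an array of row arrays
sub parseBlock {
    my $lines = shift;
    ## Discard leading blank lines
    shift @$lines while @$lines and $lines->[ 0 ] !~ m[\S];
    return unless @$lines;
    ## The first non-blank line is the header; build the format from it
    my $fmt = buildFmt( shift @$lines );
    ## Everything up to the next blank line is data
    my @rows;
    push @rows, [ unpack $fmt, shift @$lines ]
        while @$lines and $lines->[ 0 ] =~ m[\S];
    return \@rows;
}

## Made-up lines standing in for one section of the file
my $hdr   = sprintf '%-4s%-17s%-4s%s', 'NO', 'PLAYER NAME', 'G', 'A';
my @lines = (
    '',
    $hdr,
    sprintf( '%2s  %-17s%-4s%s', 4, 'Drevitch, Scott', '', 1 ),
    sprintf( '%2s  %-17s%-4s%s', 8, 'Mann, Chris',     '', 1 ),
    '',
    $hdr,
    sprintf( '%2s  %-17s%-4s%s', 2, 'Lupandin, Andrei', '', 3 ),
);

my $away = parseBlock( \@lines );
my $home = parseBlock( \@lines );
print "@$_[ 0, 1 ]\n" for @$away, @$home;
```

Each call eats one block, so the four subsections above would become four calls, and a block whose header differs (such as the totals) simply gets its own format.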
The code all together as far as I've taken it:
The output from the section 2 debugging lines:
c:\test>588245 boxscore_340251.txt
4 | Drevitch, Scott | | 1 | +1 | 4 | | | | | | | | | | | | |
8 | Mann, Chris | | 1 | +1 | 2 | 4 | 2 | | | | | | | | | | |
9 | Anderson, Erik | 1 | | +1 | 4 | | | | | | | | | | | | |
11 | Lefebvre, Marc | | | -1 | | | | | | | | | | | | | |
14 | Scott, Mark | | 1 | 0 | 5 | | | | | | | | | | | | |
17 | Wray, Scott | | | 0 | 10 | 4 | 2 | | | | | | | | | | |
18 | Lazarev, Yevgeny | 1 | | 0 | 1 | | | | | | | | | | | | |
20 | Miller, Derek | | | -2 | 1 | | | | | | | | | | | | |
21 | Fitzpatrick, Chans | | 1 | 0 | | 2 | 1 | | | | | | | | | | |
22 | Cullaton, Brent | | 1 | +1 | 4 | | | | | | | | | | | | |
23 | Kotsopoulos, Tommy | | 1 | -1 | 4 | 5 | | 1 | | | | | | | | | |
24 | Morelli, David | 1 | | 0 | 2 | | | | | | | | | | | | |
27 | Littlejohn, Frank | 1 | 1 | +1 | 3 | 6 | 3 | | | | | | 1 | | | | |
28 | Lyke, R.C. | | | +1 | | | | | | | | | | | | | |
29 | Gajda, Tyson | | | 0 | | | | | | | | | | | | | |
30 | Tebbs, Kris | | | 0 | | | | | | | | | | | | | |
32 | Tidball, Curtis | | | 0 | | | | | | | | | | | | | |
44 | Lupul, Dale | | | -1 | 5 | | | | | | | | | | | | |
TOTALS | 5 | 7 | +1 | 46 | 23 | 9 | 1 | | | | | 1 | | | 1 | 1 |
2 | Lupandin, Andrei | | 3 | +1 | 1 | | | | | | | | | | | | |
4 | Currie, Brent | | | -1 | | 2 | 1 | | | | | | | | | | |
5 | Yoder, Jami | | | -1 | | 2 | 1 | | | | | | | | | | |
7 | Radoslovich, Matt | 1 | | -1 | 2 | | | | | | | | | | | | |
8 | Pilkington, Brett | | 1 | +1 | 4 | | | | | | | | | | | | |
9 | Granbois, Travis | | | 0 | | | | | | | | | | | | | |
12 | Nadeau, Patrick | | | -2 | 1 | | | | | | | | | | | | |
13 | Parsons, Don | 2 | 1 | +2 | 4 | | | | | | | | 1 | | | | |
14 | Starke, Sean | | | 0 | 3 | | | | | | | | | | | | |
17 | Stewart, Blake | | 1 | -2 | 1 | | | | | | | | | | | | |
18 | Chwedoruk, Justin | | | 0 | 3 | | | | | | | | | | | | |
19 | Woollard, Chad | 1 | 1 | +2 | 3 | 2 | 1 | | | | | | | | | | |
20 | Durdin, Sergei | | | -2 | 2 | | | | | | | | | | | | |
24 | Harloff, Nick | | | 0 | 3 | 4 | 2 | | | | | | | | | | |
25 | Stauffacher, Luke | | | 0 | 4 | 7 | 1 | 1 | | | | | | | | | |
27 | Wathier, Mathieu | | | +1 | 3 | | | | | | | | | | | | |
41 | Sikich, Zach | | 1 | 0 | | | | | | | | | | | | | |
83 | Tapp, Jason | | | 0 | | | | | | | | | | | | | |
TOTALS | 4 | 8 | -2 | 34 | 17 | 6 | 1 | | | | | 1 | | | | |