Re: Adding object identifiers corresponding to matched headers and sub-headers.

If I wanted to produce the following result

Header stuff
123456|987|12
Apples|9
Oranges|19
Bananas|4
Footer junk
Header stuff
123456|987|34
Apples|7
Oranges|15
Bananas|11
Footer junk
Header stuff
123456|987|56
Apples|3
Oranges|9
Bananas|8
Footer junk

from the two input files fake1.dat

Header stuff
123456|987|12
Apples|4
Oranges|12
Bananas|3
Footer junk
Header stuff
123456|987|34
Apples|5
Oranges|7
Bananas|8
Footer junk
Header stuff
123456|987|56
Apples|2
Oranges|1
Bananas|3
Footer junk

and fake2.dat

Header stuff
123456|987|12
Apples|5
Oranges|7
Bananas|1
Footer junk
Header stuff
123456|987|34
Apples|2
Oranges|8
Bananas|3
Footer junk
Header stuff
123456|987|56
Apples|1
Oranges|8
Bananas|5
Footer junk

I would probably write a script like this to do it:

#!/usr/bin/perl -w

use strict;

my %data;

{
    #  Go looking for files that match this pattern.

    foreach my $thisFile (glob("fake?.dat")) {

      #  Open the file, and die if that doesn't work.

      open ( INPUT, '<', $thisFile ) or
        die "Unable to open $thisFile: $!";

      my ( $header, $id, @data, $footer );
      while (<INPUT>) {

        #  Read in a line from the file. We're expecting
        #  a header, an ID line, followed by a bunch of
        #  lines of data, terminated by a footer. There
        #  can be several of these records in a file. For 
        #  the sake of simplicity, we assume that the
        #  lines of data are always present and always in
        #  the same order.

        chomp;
        if ( defined ( $header ) ) {

          if ( defined ( $id ) ) {

            if ( /Footer/ ) {

              #  If we just saw a footer, that's the end
              #  of a record and we can process what we 
              #  have now.

              $footer = $_;

              #  The unique ID number is the last number
              #  on the ID line.

              my ( $id3 ) = $id =~ m/\|(\d+)$/;

              #  Store this record's information into a
              #  hash, either re-using the existing hash
              #  element, or creating a new one.

              if ( exists($data{ $id3 }) ) {

                my @updatedData;
                foreach ( @{$data{ $id3 }->{data}} ) {

                  my @dataSoFar = split(/\|/, $_);
                  my @thisData  = split(/\|/,shift @data);

                  $dataSoFar[1] += $thisData[1];
                  push ( @updatedData, join('|', @dataSoFar) );
                }
                $data{ $id3 }->{data} = \@updatedData;
                
              } else {

                $data{ $id3 }->{header} = $header;
                $data{ $id3 }->{id}     = $id;
                push ( @{$data{ $id3 }->{data}}, @data );
                $data{ $id3 }->{footer} = $footer;
              }
      
              #  Clear variables for next loop around the 
              #  input file.

              undef $header;
              undef $id;
              @data = ();
              undef $footer;

            } else {
            
              push ( @data, $_ );
            }

          } else {

            $id = $_;
          }
        } else {

          $header = $_;
        }

      }
      close ( INPUT );
    }

    #  Having added up the various lines of data, we now 
    #  dump out a summary.

    foreach my $thisKey ( sort { $a <=> $b } keys %data ) {

      print "$data{ $thisKey }->{'header'}\n";
      print "$data{ $thisKey }->{'id'}\n";
      foreach ( @{$data{ $thisKey }->{'data'}} ) {
        print "$_\n";
      }
      print "$data{ $thisKey }->{'footer'}\n";
    }
}
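
The script above relies on the data lines always appearing in the same order in every record. If that assumption ever fails, one possible variation is to key the running totals by item name instead of by position. The following is only a rough sketch: the name sum_by_name is made up for illustration, the inline strings stand in for the contents of the input files, and the items come out sorted alphabetically rather than in file order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#  Hypothetical helper (not part of the script above): takes the
#  full text of one or more files and returns a reference to a
#  hash of the form $totals{ $id }{ $item } = summed count.

sub sum_by_name {
    my @texts = @_;
    my %totals;

    foreach my $text (@texts) {
        my $id;    #  last field of the current ID line, if any
        foreach my $line ( split /\n/, $text ) {
            if ( $line =~ /^\d+\|\d+\|(\d+)$/ ) {
                $id = $1;       #  ID line starts a record
            }
            elsif ( $line =~ /Footer/ ) {
                undef $id;      #  footer ends the record
            }
            elsif ( defined $id and $line =~ /^([^|]+)\|(\d+)$/ ) {
                $totals{$id}{$1} += $2;    #  data line: add by name
            }
        }
    }
    return \%totals;
}

#  Two records with the same ID but the data lines in a
#  different order.

my $first  = "Header stuff\n123456|987|12\nApples|4\nOranges|12\nBananas|3\nFooter junk\n";
my $second = "Header stuff\n123456|987|12\nBananas|1\nApples|5\nOranges|7\nFooter junk\n";

my $totals = sum_by_name( $first, $second );
foreach my $item ( sort keys %{ $totals->{12} } ) {
    print "$item|$totals->{12}{$item}\n";
}
```

For the sample record with ID 12 this gives the same sums as the expected result at the top (Apples 9, Oranges 19, Bananas 4), just listed alphabetically. The trade-off is that the header and footer text are not kept here, so you would still need to store those if you want to reproduce the full record layout.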

See if that helps you.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
