Re: Merge two files with similar column entries

in reply to Merge two files with similar column entries

I'll attempt to give some big hints.
I would suggest a hash of arrays (HoA) for the data structure. Each name is a hash key that points to an array of monthly data.
If a name doesn't appear in the %spreadsheet hash yet, create a new "blank" entry with 12 zeroes for the months. I used just 4 months to demo the technique.
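
As a minimal sketch of that idea with the full 12 months (the @records list is made-up sample data, not your file format), the "create a blank row on first sight" step could look like this:

```perl
use strict;
use warnings;

# Hypothetical sample records: [name, month index (0 = January), amount]
my @records = (
    [ 'A. Paul', 0, 300004 ],
    [ 'Jason',   0, 600000 ],
    [ 'A. Paul', 2, 300004 ],
);

my %spreadsheet;
for my $record (@records) {
    my ($name, $month_index, $amount) = @$record;

    # First time this name is seen: start with a row of twelve zeroes.
    $spreadsheet{$name} ||= [ (0) x 12 ];

    # Store the amount in the column for this month.
    $spreadsheet{$name}[$month_index] = $amount;
}

for my $name (sort keys %spreadsheet) {
    print "$name: @{ $spreadsheet{$name} }\n";
}
```

Every name ends up with exactly 12 columns, so months in which a name is missing simply stay at zero.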

Your file format is space-separated. I didn't use the most efficient technique to parse it, but it is a tool for your toolbox and it is straightforward.
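
For comparison, here is a sketch of a different way to peel off the last column, using a single regex instead of the reverse/split trick. The helper name split_name_and_amount is made up for this example, and it assumes the amount is always the last whitespace-separated field on the line:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, amount) for one line,
# or an empty list if the line has fewer than two fields.
sub split_name_and_amount {
    my ($line) = @_;

    # ^\s*      skip leading whitespace
    # (.*\S)    the name: everything up to the last run of spaces
    # \s+(\S+)  the last field
    # \s*$      trailing whitespace, including the newline
    return $line =~ /^\s*(.*\S)\s+(\S+)\s*$/ ? ($1, $2) : ();
}

my ($name, $amount) = split_name_and_amount(" Mayur Pandey             40000\n");
print "$name => $amount\n";
```

Which of the two reads better is a matter of taste; the regex needs no second reverse, but the reverse/split version avoids backtracking entirely.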

I used Perl here-docs to represent the 2 files, so the code differs a bit from what you would write for real files.
Have fun and good luck!

#!/usr/bin/perl -w
use strict;
use Data::Dump qw(pp);

my %Month2Index = (January => 0, February => 1, March => 2, April => 3);
 
my $january =<<END;
 #filename January
 A. Paul                       300004
 Jason                        600000
 Mayur Pandey             40000
 Kelly H                       459000
Ryan M                       349000
END

my $march =<<END;
#filename March
Senthl V R                  600000
Mayur Pandey             40000
Kelly H                       459000
Pratap S                     349000
A. Paul                       300004
END

my %spreadsheet;

# real code would take the month from the file name -
# for this example I embedded the file name in the
# first line of each heredoc variable instead.
# a reference to a Perl scalar can be opened just
# like a file for reading (or even for writing)
#
foreach my $fileref (\$january, \$march)
{
   open my $file, '<', $fileref or die "$!";
   
   my $comment = <$file>; #first line - not needed with real files
   my $month_name = (split ' ',$comment)[-1];
   my $month_index = $Month2Index{$month_name};
   defined $month_index or die "unknown month '$month_name'";
   
   process_monthly_file ($file, $month_index);   
}

sub process_monthly_file
{
   my ($file, $month_index) = @_;
   
   while (<$file>)
   {
      s/^\s*//;  #remove leading spaces
      
      # this is a space separated format, but we
      # want the last column and all of the columns
      # before that should be "squished into one column"
      # one way is to reverse the line, limit the
      # split and then reverse again.
      
      my $reversed = reverse $_;
      my ($data, $name) = split (' ', $reversed,2);
      
      # note: split ' ' skips leading whitespace, and the
      # reversed line starts with the newline, so the split
      # does an implicit "chomp"
      $data = reverse $data;
      $name = reverse $name;
      
      # if this name has not been seen before, create a
      # new hash entry with a blank array, here just 4
      # columns (jan,feb,mar,april)
      
      $spreadsheet{$name} ||= [0,0,0,0];
      # or perhaps use this line instead:
      # $spreadsheet{$name} ||= [qw(NA NA NA NA)];
      
      #now enter the data into the correct column
      #
      $spreadsheet{$name}[$month_index] = $data;
   }
}
print pp \%spreadsheet;

__END__
prints:
{
  "A. Paul"      => [300004, 0, 300004, 0],
  Jason          => [600000, 0, 0, 0],
  "Kelly H"      => [459000, 0, 459000, 0],
  "Mayur Pandey" => [40000, 0, 40000, 0],
  "Pratap S"     => [0, 0, 349000, 0],
  "Ryan M"       => [349000, 0, 0, 0],
  "Senthl V R"   => [0, 0, 600000, 0],
}
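
The Data::Dump output is handy for checking the structure. If you want something closer to a merged table, a rough sketch using printf could look like the following. The column widths are arbitrary choices of mine, and the %spreadsheet rows here are just a few copied from the dump above so that the snippet runs on its own:

```perl
use strict;
use warnings;

my @months = qw(January February March April);

# A few rows copied from the dump above (sample data only).
my %spreadsheet = (
    'A. Paul'  => [300004, 0, 300004, 0],
    'Jason'    => [600000, 0, 0,      0],
    'Pratap S' => [0,      0, 349000, 0],
);

# One left-aligned name column, then one right-aligned column per month.
my $columns       = scalar @months;
my $header_format = '%-15s' . ('%10s' x $columns) . "\n";
my $row_format    = '%-15s' . ('%10d' x $columns) . "\n";

printf($header_format, 'Name', @months);
for my $name (sort keys %spreadsheet) {
    printf($row_format, $name, @{ $spreadsheet{$name} });
}
```

If you go with the "NA" placeholders instead of zeroes, use %10s in the row format as well, since %d expects a number.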