http://www.perlmonks.org?node_id=992384

TheCanadian has asked for the wisdom of the Perl Monks concerning the following question:

Hi: I am new to PERL and basically am self taught through web articles and examples. My question is how do I merge two comma separated CSV files based on two key columns being equal. I have tried 4 methods I borrowed from the web. These examples are below but first I will give samples of the two CSV files. My preferred method is #1 for it's simplicity.

Sample File #1 (final2.csv):

username, date_modified 2010-nhw-es-ea-285, 2012-10-10 aab.external.audit.-.bvr.verif.externes, 2012-02-22 aashee.aziz, 2012-01-27 abbas.rangwala, 2012-01-27 abbie.gardner, 2012-09-10 ab.border-frontiere, 2012-08-11 abdool.yasseen, 2012-01-31 abdullah.aboukarr, 2012-08-10 abdullahi.sheikh, 2012-02-28 (There is 544 lines)

Sample File #2 (HCDir2.csv):

fullname, branch aaron.northway, CHIEF FINANCIAL OFFICER BRAN +CH aashee.aziz, HEALTH PRODUCTS AND FOOD BRA +NCH abbey.klugerman, HEALTHY ENVIRONMENTS AND CON +SUMER SAFETY BRANCH abby.hoffman, STRATEGIC POLICY BRANCH abderrahim.boussanni, CHIEF FINANCIAL OFFICER BRAN +CH abdiaziz.nur, HEALTHY ENVIRONMENTS AND CON +SUMER SAFETY BRANCH abdool.yassin, HEALTH PRODUCTS AND FOOD BRA +NCH abdoulie.manjang, N/A abdullah.hassen, HEALTH PRODUCTS AND FOOD BRA +NCH abdullah.patel, REGIONS AND PROGRAMS BRANCH (There is 12489 lines)

Intended output file (For sample data):

username, branch, date_modified aashee.aziz, HEALTH PRODUCTS AND FOOD BRANCH, 2012-01-27 abdool.yasseen, HEALTH PRODUCTS AND FOOD BRANCH, 2012-01-31
In reality all records in final.csv should match with a name in HCDir.csv but not all HCDir.csv data will be found in final.csv

Method #1 #!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE +RL on Windows XP. use strict; use warnings; use DBI; my $datafolder = "F:/Perl_program"; my $dbh = DBI->connect( "dbi:CSV:f_dir=$datafolder;f_ext=.csv;csv_eol=\n;" ); $dbh->{'csv_tables'}->{"HCDir2.csv"} = {'col_names' => ["fullname" +, "branch"]}; $dbh->{'csv_tables'}->{"final2.csv"} = {'col_names' => ["username" +, "date_modified"]}; $dbh->do(" CREATE TABLE completed.csv AS SELECT final.username,HCDir.branch,final.date_modified FROM HCDir INNER JOIN final ON final.username = HCDir.fullname ");
Method #2 #!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE +RL on Windows XP. use strict; use warnings; use DBI; open FINAL, "<final2.csv" or die(); open HCDIR, "<HCDir2.csv" or die(); open OUTFILE, ">completed.csv" or die(); our %hash1; our %hash2; while (<FINAL>) { our ($username, $date_modified) = split; $hash1{$username} = $date_modified; } while (<HCDIR>) { our ($fullname, $branch) = split; $hash2{$fullname} = $branch; } if (our $username = our $fullname){ print OUTFILE "$username $hash2{$fullname} $hash1{$username}\n"; } close FINAL; close HCDIR; close OUTFILE;
Method #3 #!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE +RL on Windows XP. use strict; use warnings; open (HCDIR, "<HCDir2.csv") or die "$!"; open (FINAL, "<final2.csv") or die "$!"; open(NBMD, ">nbmd.csv") or die "$!"; our $line1; our $line2; our @lines1 = (<FINAL>); our @lines2 = (<HCDIR>); foreach $line2(@lines2){ our @linearr2 = split (',', $line2); our $fullname = $linearr2[0]; our $branch = $linearr2[1]; } foreach $line1(@line1){ our @linearr1 = split (',', $line1); our $username = $linearr1[0]; our $date_modified = $linearr1[1]; } while (<FINAL>){ if ($username eq $fullname){ our $nbmd = join (',', "$username,$branch$,date_modified") +; chomp $nbmd; print NBMD ("$nbmd\n"); } } close (FINAL); close (HCDIR); close (NBMD);
Method #4 #!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE +RL on Windows XP. use strict; use warnings; our $file1 = "F:/Perl_program/final2.csv"; our $file2 = "F:/Perl_program/HCDir2.csv"; our $fileOut = "F:/Perl_program/completed.csv"; our %file1Map = (); our %file2Map = (); open (FILE1, "<final.csv") or die("Could not open $file1 file."); foreach our $line1 () { (our $username, our $date_modified) = split(',',$line1,2); our $File1Map{"$username"} = "$date_modified"; } close(FILE1); open (INPUT2, "<modhcdir2") or die("Could not open $file2 file."); foreach $line2() { (our $fullname,our $branch) = split(',',$line2); our $File2Map{"$fullname"}="$branch"; } close(FILE2); open (OUTFILE, ">completed.csv") or die("Could not open $fileout file. +"); foreach $fileout(){ (our $username2,our $dumour)=split('\n',$fileout); our $existingValue = $File1Map{"$username"}; our $newValue = $File2Map{"$fullname"}; if (! defined $existingValue){ $existingValue = ","; } if (! defined $newValue){ $newValue = "\n"; } print OUTFILE "" . $username2 . "," . $existingValue . "," . $newValue +; } close (OUTFILE);

I intend to create a new CSV file (completed.csv) containing two columns from final.csv and one from HCDir.csv (final.username,HCDir.branch,final.date_modified). The match or key is column "username" from final.csv and "fullname" from HCDir.csv.

The issues from the methods displayed above is as follows. Method #1: No results are produced just an empty file. Method #2: Produces error "Use of uninitialized value in concatenation (.) or string". Method #3: Produces "Variable "@line1" is not imported" and Global symbol "@line1" requires explicit package name errors. Method #4: Produces "syntax error at ..." and "Global symbol "$fileout" requires explicit ..." errors.

Would someone be so kind and give me a little assistance to solving this simple issue for something quite important to me. Thank you in advance. JGW 20120907-1720 :-}