Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re^4: Merging Two CSV files Based on Two Columns

by TheCanadian (Initiate)
on Sep 18, 2012 at 19:38 UTC ( #994342=note: print w/ replies, xml ) Need Help??


in reply to Re^3: Merging Two CSV files Based on Two Columns
in thread Merging Two CSV files Based on Two Columns

Hi: Tux

Thank you for your support. First I want to reiterate what I want from this script. I have two CSV files one that is 544 rows long and has two columns, "username" and "date_modified". The other CSV file has two columns, "fullname" and "branch" but is about 12489 rows long. All the "usernames" from the first CSV file are almost identical to the "fullname" from the second CSV file. So I want a new CSV file that is about 544 rows long but has the following columns; "username", "branch", "date_modified".

Sample File #1 (final2.csv)

username date_modified 2010-nhw-es-ea-285 2012-10-10 aab.external.audit.-.bvr.verif.externes 2012-02-22 aashee.aziz 2012-01-27 abbas.rangwala 2012-01-27 abbie.gardner 2012-09-10 ab.border-frontiere 2012-08-11 abdool.yasseen 2012-01-31 abdullah.aboukarr 2012-08-10 abdullahi.sheikh 2012-02-28

Sample File #2 (HCDir2.csv)

fullname branch aaron.northway CHIEF FINANCIAL OFFICER BRANC +H aashee.aziz HEALTH PRODUCTS AND FOOD BRAN +CH abbey.klugerman HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abby.hoffman STRATEGIC POLICY BRANCH abderrahim.boussanni CHIEF FINANCIAL OFFICER BRANC +H abdiaziz.nur HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abdool.yassin HEALTH PRODUCTS AND FOOD BRAN +CH abdoulie.manjang N/A abdullah.hassen HEALTH PRODUCTS AND FOOD BRAN +CH abdullah.patel REGIONS AND PROGRAMS BRANCH

From the two sample data I should get a CSV file with only 2 rows and 3 columns.

username branch date_modified aashee.aziz HEALTH PRODUCTS AND FOOD BRANCH 2012-01-27 abdool.yasseen HEALTH PRODUCTS AND FOOD BRANCH 2012-01-31

To tell you the truth I am having a lot of difficulties with the variables $_ and ARGV, I can't seem to read through the script and know what values are kept in these variables. However, I chose your script Tux, using Text::CSV_XS. Yet getting the script to work is another thing. Below is the script I put together from your sample script. It unfortunately is riddled with errors. Would you be so kind and patient as to give me a helping hand with this script.

#!F:\strawberry-perl-5.16.1.1\perl\bin\perl -w use strict; use warnings; #use diagnostics; use Text::CSV_XS my %File1Map; my %File2Map; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row1 = $csv->getline_hr ($fh1)){ my $File1Map{$row1->{username}} = $row1->{date_modified}; } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row2 = $csv->getline_hr ($fh2)){ my $File2Map{$row2->{fullname}} = $row2->{branch}; } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; my $oldvalue my $newvalue while ($fh1){ $oldvalue = $File1Map{"$username"} = "$date_modified"; $newvalue = $File2Map{"$fullname"} = "$branch"; $csv->print ($fh3, [ $username, $newvalue, $oldvalue ]); }

Thank you in advance.

TheCanadian 20120918-1533 :-}


Comment on Re^4: Merging Two CSV files Based on Two Columns
Select or Download Code
Re^5: Merging Two CSV files Based on Two Columns
by Tux (Monsignor) on Sep 19, 2012 at 07:58 UTC

    That last loop obviously is wrong. You cannot loop over a file handle. Certainly not over one that has been read to the end. Compare to below example and work out what you want exactly yourself

    open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row = $csv->getline_hr ($fh1)) { $File1Map{$row->{username}} = $row->{date_modified}; # NO "my" the +re! } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row = $csv->getline_hr ($fh2)){ $File2Map{$row->{fullname}} = $row->{branch}; # NO "my" there! } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; $csv->print ($fh3, [qw( user date_modified branch )]); foreach my $user (sort keys %File1Map) { $csv->print ($fh3, [ $user, $File1Map{$user}, $File2Map{$user} || +"?" ]); } close $fh3;

    Enjoy, Have FUN! H.Merijn

      Hi: Tux

      I finally, with your help got the script to work. I discovered that the common column had to have the same column header. Your line below with the || really helped because it turned out there were 203 usernames from final.csv that were not found in the HCDir.csv; this trapped the error instead of displaying a blank CSV file. Lastly, I needed to turn off "use strict" because it produces the following error.

      Global symbol "%File1Map" requires explicit package name at completed. +pl line 15 . Global symbol "%File1Map" requires explicit package name at completed. +pl line 28 . Global symbol "%File1Map" requires explicit package name at completed. +pl line 29 . Execution of completed.pl aborted due to compilation errors.

      Other than that the script worked great. Below is a copy of my modified script.

      #!F:\strawberry-perl-5.16.1.1\perl\bin\perl -w #use strict; use warnings; use Text::CSV_XS my %File1Map; my %File2Map; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row = $csv->getline_hr ($fh1)){ $File1Map{$row->{username}} = $row->{date_modified}; } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row = $csv->getline_hr ($fh2)){ $File2Map{$row->{username}} = $row->{branch}; } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; $csv->print ($fh3, [qw( username date_modified branch)]); foreach my $username (sort keys %File1Map) { $csv->print ($fh3, [ $username, $File1Map{$username}, $File2Map{$u +sername} || +"Not Found In HCDir!" ]); } close $fh1; close $fh2; close $fh3;

      I want to thank you again, I am really new to PERL and I learn primarily through examples. Without your help I would still be lost. I like PERL and would like to continue using PERL. I also purchased the Fourth Edition of "Programming Perl" by Tom Christiansen, Brian d Foy, and Larry Wall; dated 2012. I gather this is the Bible but it did not help with my specific situation.

      Yours truly,

      TheCanadian 20120919-1207 :-}

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://994342]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (5)
As of 2014-07-25 23:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (175 votes), past polls