Beefy Boxes and Bandwidth Generously Provided by pair Networks Ovid
No such thing as a small change
 
PerlMonks  

Re^3: Merging Two CSV files Based on Two Columns

by Tux (Monsignor)
on Sep 15, 2012 at 08:09 UTC ( #993853=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Merging Two CSV files Based on Two Columns
in thread Merging Two CSV files Based on Two Columns

Text::CSV_XS is written (mostly) in XS code: that is the C API to perl and can thus be optimized at a very low level. Text::CSV is a wrapper over Text::CSV_XS and Text::CSV_PP, the Pure-Perl implementation that does exactly the same, but up to a 100 time slower. The documentation should be identical. The wrapper will automatically choose Text::CSV_XS when it is installed and available.

If you are now using Text::CSV and you install Text::CSV_XS the code doesn't have to change at all but will immediately gain speed.

Where are the "blanks"? I think that by now the documentation is quite elaborate, and comes with clear examples.


Enjoy, Have FUN! H.Merijn


Comment on Re^3: Merging Two CSV files Based on Two Columns
Re^4: Merging Two CSV files Based on Two Columns
by TheCanadian (Initiate) on Sep 18, 2012 at 19:38 UTC

    Hi: Tux

    Thank you for your support. First I want to reiterate what I want from this script. I have two CSV files one that is 544 rows long and has two columns, "username" and "date_modified". The other CSV file has two columns, "fullname" and "branch" but is about 12489 rows long. All the "usernames" from the first CSV file are almost identical to the "fullname" from the second CSV file. So I want a new CSV file that is about 544 rows long but has the following columns; "username", "branch", "date_modified".

    Sample File #1 (final2.csv)

    username date_modified 2010-nhw-es-ea-285 2012-10-10 aab.external.audit.-.bvr.verif.externes 2012-02-22 aashee.aziz 2012-01-27 abbas.rangwala 2012-01-27 abbie.gardner 2012-09-10 ab.border-frontiere 2012-08-11 abdool.yasseen 2012-01-31 abdullah.aboukarr 2012-08-10 abdullahi.sheikh 2012-02-28

    Sample File #2 (HCDir2.csv)

    fullname branch aaron.northway CHIEF FINANCIAL OFFICER BRANC +H aashee.aziz HEALTH PRODUCTS AND FOOD BRAN +CH abbey.klugerman HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abby.hoffman STRATEGIC POLICY BRANCH abderrahim.boussanni CHIEF FINANCIAL OFFICER BRANC +H abdiaziz.nur HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abdool.yassin HEALTH PRODUCTS AND FOOD BRAN +CH abdoulie.manjang N/A abdullah.hassen HEALTH PRODUCTS AND FOOD BRAN +CH abdullah.patel REGIONS AND PROGRAMS BRANCH

    From the two sample data I should get a CSV file with only 2 rows and 3 columns.

    username branch date_modified aashee.aziz HEALTH PRODUCTS AND FOOD BRANCH 2012-01-27 abdool.yasseen HEALTH PRODUCTS AND FOOD BRANCH 2012-01-31

    To tell you the truth I am having a lot of difficulties with the variables $_ and ARGV, I can't seem to read through the script and know what values are kept in these variables. However, I chose your script Tux, using Text::CSV_XS. Yet getting the script to work is another thing. Below is the script I put together from your sample script. It unfortunately is riddled with errors. Would you be so kind and patient as to give me a helping hand with this script.

    #!F:\strawberry-perl-5.16.1.1\perl\bin\perl -w use strict; use warnings; #use diagnostics; use Text::CSV_XS my %File1Map; my %File2Map; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row1 = $csv->getline_hr ($fh1)){ my $File1Map{$row1->{username}} = $row1->{date_modified}; } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row2 = $csv->getline_hr ($fh2)){ my $File2Map{$row2->{fullname}} = $row2->{branch}; } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; my $oldvalue my $newvalue while ($fh1){ $oldvalue = $File1Map{"$username"} = "$date_modified"; $newvalue = $File2Map{"$fullname"} = "$branch"; $csv->print ($fh3, [ $username, $newvalue, $oldvalue ]); }

    Thank you in advance.

    TheCanadian 20120918-1533 :-}

      That last loop obviously is wrong. You cannot loop over a file handle. Certainly not over one that has been read to the end. Compare to below example and work out what you want exactly yourself

      open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row = $csv->getline_hr ($fh1)) { $File1Map{$row->{username}} = $row->{date_modified}; # NO "my" the +re! } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row = $csv->getline_hr ($fh2)){ $File2Map{$row->{fullname}} = $row->{branch}; # NO "my" there! } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; $csv->print ($fh3, [qw( user date_modified branch )]); foreach my $user (sort keys %File1Map) { $csv->print ($fh3, [ $user, $File1Map{$user}, $File2Map{$user} || +"?" ]); } close $fh3;

      Enjoy, Have FUN! H.Merijn

        Hi: Tux

        I finally, with your help got the script to work. I discovered that the common column had to have the same column header. Your line below with the || really helped because it turned out there were 203 usernames from final.csv that were not found in the HCDir.csv; this trapped the error instead of displaying a blank CSV file. Lastly, I needed to turn off "use strict" because it produces the following error.

        Global symbol "%File1Map" requires explicit package name at completed. +pl line 15 . Global symbol "%File1Map" requires explicit package name at completed. +pl line 28 . Global symbol "%File1Map" requires explicit package name at completed. +pl line 29 . Execution of completed.pl aborted due to compilation errors.

        Other than that the script worked great. Below is a copy of my modified script.

        #!F:\strawberry-perl-5.16.1.1\perl\bin\perl -w #use strict; use warnings; use Text::CSV_XS my %File1Map; my %File2Map; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row = $csv->getline_hr ($fh1)){ $File1Map{$row->{username}} = $row->{date_modified}; } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row = $csv->getline_hr ($fh2)){ $File2Map{$row->{username}} = $row->{branch}; } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; $csv->print ($fh3, [qw( username date_modified branch)]); foreach my $username (sort keys %File1Map) { $csv->print ($fh3, [ $username, $File1Map{$username}, $File2Map{$u +sername} || +"Not Found In HCDir!" ]); } close $fh1; close $fh2; close $fh3;

        I want to thank you again, I am really new to PERL and I learn primarily through examples. Without your help I would still be lost. I like PERL and would like to continue using PERL. I also purchased the Fourth Edition of "Programming Perl" by Tom Christiansen, Brian d Foy, and Larry Wall; dated 2012. I gather this is the Bible but it did not help with my specific situation.

        Yours truly,

        TheCanadian 20120919-1207 :-}

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://993853]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (9)
As of 2014-04-17 00:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (437 votes), past polls