Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: Merging Two CSV files Based on Two Columns

by TheCanadian (Initiate)
on Sep 14, 2012 at 14:32 UTC ( #993733=note: print w/ replies, xml ) Need Help??


in reply to Re: Merging Two CSV files Based on Two Columns
in thread Merging Two CSV files Based on Two Columns

Hi: Tux

Your post got a lot of my attention as it is short with the promise that Text::CSV_XS is the fast one. I tried implementing your code that you presented, however, there is a part that is missing regarding the output. I tried filling in the blanks but my code still did not work correctly.

Can Text::CSV be substituted by Text_CSV_XS?

TheCanadian


Comment on Re^2: Merging Two CSV files Based on Two Columns
Re^3: Merging Two CSV files Based on Two Columns
by Tux (Monsignor) on Sep 15, 2012 at 08:09 UTC

    Text::CSV_XS is written (mostly) in XS code: that is the C API to perl and can thus be optimized at a very low level. Text::CSV is a wrapper over Text::CSV_XS and Text::CSV_PP, the Pure-Perl implementation that does exactly the same, but up to a 100 time slower. The documentation should be identical. The wrapper will automatically choose Text::CSV_XS when it is installed and available.

    If you are now using Text::CSV and you install Text::CSV_XS the code doesn't have to change at all but will immediately gain speed.

    Where are the "blanks"? I think that by now the documentation is quite elaborate, and comes with clear examples.


    Enjoy, Have FUN! H.Merijn

      Hi: Tux

      Thank you for your support. First I want to reiterate what I want from this script. I have two CSV files one that is 544 rows long and has two columns, "username" and "date_modified". The other CSV file has two columns, "fullname" and "branch" but is about 12489 rows long. All the "usernames" from the first CSV file are almost identical to the "fullname" from the second CSV file. So I want a new CSV file that is about 544 rows long but has the following columns; "username", "branch", "date_modified".

      Sample File #1 (final2.csv)

      username date_modified 2010-nhw-es-ea-285 2012-10-10 aab.external.audit.-.bvr.verif.externes 2012-02-22 aashee.aziz 2012-01-27 abbas.rangwala 2012-01-27 abbie.gardner 2012-09-10 ab.border-frontiere 2012-08-11 abdool.yasseen 2012-01-31 abdullah.aboukarr 2012-08-10 abdullahi.sheikh 2012-02-28

      Sample File #2 (HCDir2.csv)

      fullname branch aaron.northway CHIEF FINANCIAL OFFICER BRANC +H aashee.aziz HEALTH PRODUCTS AND FOOD BRAN +CH abbey.klugerman HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abby.hoffman STRATEGIC POLICY BRANCH abderrahim.boussanni CHIEF FINANCIAL OFFICER BRANC +H abdiaziz.nur HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abdool.yassin HEALTH PRODUCTS AND FOOD BRAN +CH abdoulie.manjang N/A abdullah.hassen HEALTH PRODUCTS AND FOOD BRAN +CH abdullah.patel REGIONS AND PROGRAMS BRANCH

      From the two sample data I should get a CSV file with only 2 rows and 3 columns.

      username branch date_modified aashee.aziz HEALTH PRODUCTS AND FOOD BRANCH 2012-01-27 abdool.yasseen HEALTH PRODUCTS AND FOOD BRANCH 2012-01-31

      To tell you the truth I am having a lot of difficulties with the variables $_ and ARGV, I can't seem to read through the script and know what values are kept in these variables. However, I chose your script Tux, using Text::CSV_XS. Yet getting the script to work is another thing. Below is the script I put together from your sample script. It unfortunately is riddled with errors. Would you be so kind and patient as to give me a helping hand with this script.

      #!F:\strawberry-perl-5.16.1.1\perl\bin\perl -w use strict; use warnings; #use diagnostics; use Text::CSV_XS my %File1Map; my %File2Map; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row1 = $csv->getline_hr ($fh1)){ my $File1Map{$row1->{username}} = $row1->{date_modified}; } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row2 = $csv->getline_hr ($fh2)){ my $File2Map{$row2->{fullname}} = $row2->{branch}; } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; my $oldvalue my $newvalue while ($fh1){ $oldvalue = $File1Map{"$username"} = "$date_modified"; $newvalue = $File2Map{"$fullname"} = "$branch"; $csv->print ($fh3, [ $username, $newvalue, $oldvalue ]); }

      Thank you in advance.

      TheCanadian 20120918-1533 :-}

        That last loop obviously is wrong. You cannot loop over a file handle. Certainly not over one that has been read to the end. Compare to below example and work out what you want exactly yourself

        open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row = $csv->getline_hr ($fh1)) { $File1Map{$row->{username}} = $row->{date_modified}; # NO "my" the +re! } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row = $csv->getline_hr ($fh2)){ $File2Map{$row->{fullname}} = $row->{branch}; # NO "my" there! } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; $csv->print ($fh3, [qw( user date_modified branch )]); foreach my $user (sort keys %File1Map) { $csv->print ($fh3, [ $user, $File1Map{$user}, $File2Map{$user} || +"?" ]); } close $fh3;

        Enjoy, Have FUN! H.Merijn

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://993733]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (7)
As of 2014-08-22 11:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (155 votes), past polls