Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: Merging Two CSV files Based on Two Columns

by Tux (Monsignor)
on Sep 08, 2012 at 06:59 UTC ( #992441=note: print w/ replies, xml ) Need Help??


in reply to Merging Two CSV files Based on Two Columns

Please consider adding some code tags in your post, so your data looks more like data.

Note that your first example uses DBD::CSV, which uses Text::CSV_XS underneath to parse the CSV data. Example 2 uses <FINAL> plus invalid split. Examples 3 uses the same reading mechanism as example 2, but with a split that is still unsafe and example 4 uses two endless loops.

If you want to do the "work" yourself and not rely on DBD::CSV, switch to using one of the two proved CSV parsers: Text::CSV (as ig already suggested) or Text::CSV_XS (the fast one).

In what you describe, getline_hr might be very helpful, maybe even getline_hr_all.

use text::CSV_XS; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh, "<", "final2.csv" or die "final2.csv: $!"; $csv->column_names ($csv->getline ($fh)); # Read the header line while (my $r = $csv->getline_hr ($fh)) { $File1Map{$r->{username}} = $r->{date_modified}; } : : $csv->eol ("\n"); # Now use it for output # inside the for/while loop $csv->print ($fh, [ $usename, $oldvalue, $newvalue ]);

Enjoy, Have FUN! H.Merijn


Comment on Re: Merging Two CSV files Based on Two Columns
Select or Download Code
Re^2: Merging Two CSV files Based on Two Columns
by TheCanadian (Initiate) on Sep 14, 2012 at 14:32 UTC

    Hi: Tux

    Your post got a lot of my attention as it is short with the promise that Text::CSV_XS is the fast one. I tried implementing your code that you presented, however, there is a part that is missing regarding the output. I tried filling in the blanks but my code still did not work correctly.

    Can Text::CSV be substituted by Text_CSV_XS?

    TheCanadian

      Text::CSV_XS is written (mostly) in XS code: that is the C API to perl and can thus be optimized at a very low level. Text::CSV is a wrapper over Text::CSV_XS and Text::CSV_PP, the Pure-Perl implementation that does exactly the same, but up to a 100 time slower. The documentation should be identical. The wrapper will automatically choose Text::CSV_XS when it is installed and available.

      If you are now using Text::CSV and you install Text::CSV_XS the code doesn't have to change at all but will immediately gain speed.

      Where are the "blanks"? I think that by now the documentation is quite elaborate, and comes with clear examples.


      Enjoy, Have FUN! H.Merijn

        Hi: Tux

        Thank you for your support. First I want to reiterate what I want from this script. I have two CSV files one that is 544 rows long and has two columns, "username" and "date_modified". The other CSV file has two columns, "fullname" and "branch" but is about 12489 rows long. All the "usernames" from the first CSV file are almost identical to the "fullname" from the second CSV file. So I want a new CSV file that is about 544 rows long but has the following columns; "username", "branch", "date_modified".

        Sample File #1 (final2.csv)

        username date_modified 2010-nhw-es-ea-285 2012-10-10 aab.external.audit.-.bvr.verif.externes 2012-02-22 aashee.aziz 2012-01-27 abbas.rangwala 2012-01-27 abbie.gardner 2012-09-10 ab.border-frontiere 2012-08-11 abdool.yasseen 2012-01-31 abdullah.aboukarr 2012-08-10 abdullahi.sheikh 2012-02-28

        Sample File #2 (HCDir2.csv)

        fullname branch aaron.northway CHIEF FINANCIAL OFFICER BRANC +H aashee.aziz HEALTH PRODUCTS AND FOOD BRAN +CH abbey.klugerman HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abby.hoffman STRATEGIC POLICY BRANCH abderrahim.boussanni CHIEF FINANCIAL OFFICER BRANC +H abdiaziz.nur HEALTHY ENVIRONMENTS AND CONS +UMER SAFETY BRANCH abdool.yassin HEALTH PRODUCTS AND FOOD BRAN +CH abdoulie.manjang N/A abdullah.hassen HEALTH PRODUCTS AND FOOD BRAN +CH abdullah.patel REGIONS AND PROGRAMS BRANCH

        From the two sample data I should get a CSV file with only 2 rows and 3 columns.

        username branch date_modified aashee.aziz HEALTH PRODUCTS AND FOOD BRANCH 2012-01-27 abdool.yasseen HEALTH PRODUCTS AND FOOD BRANCH 2012-01-31

        To tell you the truth I am having a lot of difficulties with the variables $_ and ARGV, I can't seem to read through the script and know what values are kept in these variables. However, I chose your script Tux, using Text::CSV_XS. Yet getting the script to work is another thing. Below is the script I put together from your sample script. It unfortunately is riddled with errors. Would you be so kind and patient as to give me a helping hand with this script.

        #!F:\strawberry-perl-5.16.1.1\perl\bin\perl -w use strict; use warnings; #use diagnostics; use Text::CSV_XS my %File1Map; my %File2Map; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whit +espace => 1 }); open my $fh1, "<", "final.csv" or die "final.csv: $!"; $csv->column_names ($csv->getline ($fh1)); # Read the header line while (my $row1 = $csv->getline_hr ($fh1)){ my $File1Map{$row1->{username}} = $row1->{date_modified}; } open my $fh2, "<", "HCDir.csv" or die "HCDir.csv: $!"; $csv->column_names ($csv->getline ($fh2)); # Read the header line while (my $row2 = $csv->getline_hr ($fh2)){ my $File2Map{$row2->{fullname}} = $row2->{branch}; } $csv->eol ("\n"); # Now use it for output open my $fh3, ">", "completed.csv" or die "completed.csv: $!"; my $oldvalue my $newvalue while ($fh1){ $oldvalue = $File1Map{"$username"} = "$date_modified"; $newvalue = $File2Map{"$fullname"} = "$branch"; $csv->print ($fh3, [ $username, $newvalue, $oldvalue ]); }

        Thank you in advance.

        TheCanadian 20120918-1533 :-}

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://992441]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (16)
As of 2014-12-22 17:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (124 votes), past polls