Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

Re: Merging Two CSV files Based on Two Columns

by ig (Vicar)
on Sep 08, 2012 at 03:03 UTC ( #992419=note: print w/replies, xml ) Need Help??

in reply to Merging Two CSV files Based on Two Columns

I might use the Text::CSV module to parse and compose CSV, perhapse something like the following:

use strict; use warnings; use Data::Dumper; use IO::String; use Text::CSV; my $file1 = IO::String->new(<<EOF) or die "$!"; username, date_modified 2010-nhw-es-ea-285, 2012-10-10 aab.external.audit.-.bvr.verif.externes, 2012-02-22 aashee.aziz, 2012-01-27 abbas.rangwala, 2012-01-27 abbie.gardner, 2012-09-10 ab.border-frontiere, 2012-08-11 abdool.yasseen, 2012-01-31 abdullah.aboukarr, 2012-08-10 abdullahi.sheikh, 2012-02-28 EOF my $file2 = IO::String->new(<<EOF) or die "$!"; fullname, branch aaron.northway, CHIEF FINANCIAL OFFICER BRANCH aashee.aziz, HEALTH PRODUCTS AND FOOD BRANCH abbey.klugerman, HEALTHY ENVIRONMENTS AND CONSUMER SAFETY BRANCH abby.hoffman, STRATEGIC POLICY BRANCH abderrahim.boussanni, CHIEF FINANCIAL OFFICER BRANCH abdiaziz.nur, HEALTHY ENVIRONMENTS AND CONSUMER SAFETY BRANCH abdool.yassin, HEALTH PRODUCTS AND FOOD BRANCH abdoulie.manjang, N/A abdullah.hassen, HEALTH PRODUCTS AND FOOD BRANCH abdullah.patel, REGIONS AND PROGRAMS BRANCH asdf "asdf" asdf,, EOF my $csv1 = Text::CSV->new({ binary => 1, allow_whitespace => 1, sep_char => ',', }); my %date_modified_by_username; $csv1->column_names( $csv1->getline($file1) ); # get header line while(my $href = $csv1->getline_hr($file1)) { $date_modified_by_username{$href->{username}} = $href->{date_modif +ied}; } my $csv2 = Text::CSV->new({ binary => 1, allow_whitespace => 1, sep_char => ',', eol => "\n", }); $csv2->print(\*STDOUT, [ qw{username branch date_modified} ]); $csv2->column_names( $csv2->getline($file2) ); # get header line while(my $href = $csv2->getline_hr($file2)) { my $date_modified = $date_modified_by_username{$href->{fullname}} || 'unknown'; $csv2->print(\*STDOUT, [ $href->{fullname}, $href->{branch}, $date_modified ]); }

With your sample data this produces:

username,branch,date_modified aaron.northway,"CHIEF FINANCIAL OFFICER BRANCH",unknown aashee.aziz,"HEALTH PRODUCTS AND FOOD BRANCH",2012-01-27 abbey.klugerman,"HEALTHY ENVIRONMENTS AND CONSUMER SAFETY BRANCH",unkn +own abby.hoffman,"STRATEGIC POLICY BRANCH",unknown abderrahim.boussanni,"CHIEF FINANCIAL OFFICER BRANCH",unknown abdiaziz.nur,"HEALTHY ENVIRONMENTS AND CONSUMER SAFETY BRANCH",unknown abdool.yassin,"HEALTH PRODUCTS AND FOOD BRANCH",unknown abdoulie.manjang,N/A,unknown abdullah.hassen,"HEALTH PRODUCTS AND FOOD BRANCH",unknown abdullah.patel,"REGIONS AND PROGRAMS BRANCH",unknown

Replies are listed 'Best First'.
Re^2: Merging Two CSV files Based on Two Columns
by TheCanadian (Initiate) on Sep 14, 2012 at 14:45 UTC

    Hi: ig

    Thank you for your post. I am testing various ways of getting the job done. Your method does interest me however, due to my lack of knowledge, I have been having difficulty reading though the code with full understanding. I feel I need this to enable me to effectively replace variables and files for the script to work on.


      You don't need to worry about IO::String. That's just a way to get a file handle that reads from a string. You can use a file handle from open to read a real file.

      I you have any particular questions about the code, feel free to ask.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://992419]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (3)
As of 2018-06-25 03:08 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (126 votes). Check out past polls.