Hi: I am new to PERL and basically am self taught through web articles and examples. My question is how do I merge two comma separated CSV files based
on two key columns being equal. I have tried 4 methods I borrowed from the web. These examples are below but first I will give samples of the two CSV
files. My preferred method is #1 for it's simplicity.
Sample File #1 (final2.csv):
username, date_modified
2010-nhw-es-ea-285, 2012-10-10
aab.external.audit.-.bvr.verif.externes, 2012-02-22
aashee.aziz, 2012-01-27
abbas.rangwala, 2012-01-27
abbie.gardner, 2012-09-10
ab.border-frontiere, 2012-08-11
abdool.yasseen, 2012-01-31
abdullah.aboukarr, 2012-08-10
abdullahi.sheikh, 2012-02-28
(There is 544 lines)
Sample File #2 (HCDir2.csv):
fullname, branch
aaron.northway, CHIEF FINANCIAL OFFICER BRAN
+CH
aashee.aziz, HEALTH PRODUCTS AND FOOD BRA
+NCH
abbey.klugerman, HEALTHY ENVIRONMENTS AND CON
+SUMER SAFETY BRANCH
abby.hoffman, STRATEGIC POLICY BRANCH
abderrahim.boussanni, CHIEF FINANCIAL OFFICER BRAN
+CH
abdiaziz.nur, HEALTHY ENVIRONMENTS AND CON
+SUMER SAFETY BRANCH
abdool.yassin, HEALTH PRODUCTS AND FOOD BRA
+NCH
abdoulie.manjang, N/A
abdullah.hassen, HEALTH PRODUCTS AND FOOD BRA
+NCH
abdullah.patel, REGIONS AND PROGRAMS BRANCH
(There is 12489 lines)
Intended output file (For sample data):
username, branch, date_modified
aashee.aziz, HEALTH PRODUCTS AND FOOD BRANCH, 2012-01-27
abdool.yasseen, HEALTH PRODUCTS AND FOOD BRANCH, 2012-01-31
In reality all records in final.csv should match with a name in HCDir.csv but not all HCDir.csv data will be found in final.csv
Method #1
#!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE
+RL on Windows XP.
use strict;
use warnings;
use DBI;
my $datafolder = "F:/Perl_program";
my $dbh = DBI->connect(
"dbi:CSV:f_dir=$datafolder;f_ext=.csv;csv_eol=\n;"
);
$dbh->{'csv_tables'}->{"HCDir2.csv"} = {'col_names' => ["fullname"
+, "branch"]};
$dbh->{'csv_tables'}->{"final2.csv"} = {'col_names' => ["username"
+, "date_modified"]};
$dbh->do("
CREATE TABLE completed.csv
AS SELECT final.username,HCDir.branch,final.date_modified
FROM HCDir
INNER JOIN final
ON final.username = HCDir.fullname
");
Method #2
#!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE
+RL on Windows XP.
use strict;
use warnings;
use DBI;
open FINAL, "<final2.csv" or die();
open HCDIR, "<HCDir2.csv" or die();
open OUTFILE, ">completed.csv" or die();
our %hash1;
our %hash2;
while (<FINAL>) {
our ($username, $date_modified) = split;
$hash1{$username} = $date_modified;
}
while (<HCDIR>) {
our ($fullname, $branch) = split;
$hash2{$fullname} = $branch;
}
if (our $username = our $fullname){
print OUTFILE "$username $hash2{$fullname} $hash1{$username}\n";
}
close FINAL;
close HCDIR;
close OUTFILE;
Method #3
#!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE
+RL on Windows XP.
use strict;
use warnings;
open (HCDIR, "<HCDir2.csv") or die "$!";
open (FINAL, "<final2.csv") or die "$!";
open(NBMD, ">nbmd.csv") or die "$!";
our $line1;
our $line2;
our @lines1 = (<FINAL>);
our @lines2 = (<HCDIR>);
foreach $line2(@lines2){
our @linearr2 = split (',', $line2);
our $fullname = $linearr2[0];
our $branch = $linearr2[1];
}
foreach $line1(@line1){
our @linearr1 = split (',', $line1);
our $username = $linearr1[0];
our $date_modified = $linearr1[1];
}
while (<FINAL>){
if ($username eq $fullname){
our $nbmd = join (',', "$username,$branch$,date_modified")
+;
chomp $nbmd;
print NBMD ("$nbmd\n");
}
}
close (FINAL);
close (HCDIR);
close (NBMD);
Method #4
#!F:\Perl_program\perl\bin\perl -w #I am using Portable Strawberry PE
+RL on Windows XP.
use strict;
use warnings;
our $file1 = "F:/Perl_program/final2.csv";
our $file2 = "F:/Perl_program/HCDir2.csv";
our $fileOut = "F:/Perl_program/completed.csv";
our %file1Map = ();
our %file2Map = ();
open (FILE1, "<final.csv") or die("Could not open $file1 file.");
foreach our $line1 () {
(our $username, our $date_modified) = split(',',$line1,2);
our $File1Map{"$username"} = "$date_modified";
}
close(FILE1);
open (INPUT2, "<modhcdir2") or die("Could not open $file2 file.");
foreach $line2()
{
(our $fullname,our $branch) = split(',',$line2);
our $File2Map{"$fullname"}="$branch";
}
close(FILE2);
open (OUTFILE, ">completed.csv") or die("Could not open $fileout file.
+");
foreach $fileout(){
(our $username2,our $dumour)=split('\n',$fileout);
our $existingValue = $File1Map{"$username"};
our $newValue = $File2Map{"$fullname"};
if (! defined $existingValue){
$existingValue = ",";
}
if (! defined $newValue){
$newValue = "\n";
}
print OUTFILE "" . $username2 . "," . $existingValue . "," . $newValue
+;
}
close (OUTFILE);
I intend to create a new CSV file (completed.csv) containing two columns from final.csv and one from HCDir.csv (final.username,HCDir.branch,final.date_modified). The
match or key is column "username" from final.csv and "fullname" from HCDir.csv.
The issues from the methods displayed above is as follows.
Method #1: No results are produced just an empty file.
Method #2: Produces error "Use of uninitialized value in concatenation (.) or string".
Method #3: Produces "Variable "@line1" is not imported" and Global symbol "@line1" requires explicit package name errors.
Method #4: Produces "syntax error at ..." and "Global symbol "$fileout" requires explicit ..." errors.
Would someone be so kind and give me a little assistance to solving this simple issue for something quite important to me.
Thank you in advance.
JGW 20120907-1720 :-}
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.