http://www.perlmonks.org?node_id=979395

slayedbylucifer has asked for the wisdom of the Perl Monks concerning the following question:

I have 2 .csv files as below:

# cat vm.csv vm_name,vm_cluster vm1,fd1 vm2,fd2 vm3,fd3 vm4,fd4

and

# cat vfiler_fd.csv vm_name,vFiler_IP,vFiler_cluster vm3,1.1.1.3,fd2 vm4,1.1.1.4,fd1 vm1,1.1.1.1,fd4 vm2,1.1.1.2,fd3

Goal:

Compare the 1st field of (vm_name) the two files and if they match then write it to a 3rd CSV file with the column headings as mentioned below (Each row of this 3rd CSV should conain all information about a given vm from both the CSVs files. Basically, I am trying to merge two files lokking at the 1st field.)

vm_name,vm_cluster,vFiler_IP,vFiler_cluster vm1,fd1,1.1.1.1,fd4 vm2,fd2,1.1.1.2,fd3 . . ....etc.

Here is my code which not correct But you will get the idea what I am trying to do:

use strict; use Text::CSV; use Class::CSV; my $report_csv = Class::CSV->new ( filename => "final.csv", fields => [qw/VM_Name VM_Cluster +vFiler_IP vFiler_Cluster MisMatch/], ); my $vm_mapping = Text::CSV->new ({ binary => 1}); my $vfiler_mapping = Text::CSV->new ({ binary => 1 }); open my $vm, "<:encoding(utf8)", "vm.csv" ; open my $vfiler, "<:encoding(utf8)", "vfiler_fd.csv"; while (my $vm_row = $vm_mapping->getline ($vm)) { while (my $vfiler_row = $vfiler_mapping->getline ($vfiler) ) { if ( $vm_row->[0] eq $vfiler_row->[0] ) $report_csv->add_line( { VM_Name => $vm_ro +w->[0], VM_Cluster => $vm_ro +w->[1], vFiler_IP => $vfile +r_row->[1], vFiler_Cluster=> $vfile +r_row->[2], MisMatch => "huhuh +huhuhu", } ); next; } next; }

The logic I used:

retrieve the 1st element of the vm.csv and comapre it with the 1st elements of all the rows of the vFiler.csv. if there is match, then write a new line whihc contains repsective elememnts to a new file called new.csv.

Issues:

this is not working.

I think nested while loop is a horrible practice. anyway, it is not working either.

I am not able to find a way to write "Column_names" with Text::CSV hence using Class:CSV and still it is not working.

I am pretty sure there is a better apporach to do this. Could you help. thanks.

Replies are listed 'Best First'.
Re: Compare 2 CSV files and create a new CSV file out of comparision
by moritz (Cardinal) on Jul 02, 2012 at 09:03 UTC

    The proper way to do it is to read the first file, and use a hash to store the contents. Use the vm_name column as key and the whole row as a value.

    Then read the second file line by line, and look up the vmname in the hash you created earlier, and print the joined columns to the third file.

      Hi. could you get me a sample code structure. Sorry, I am asking for too much, but I didn't quite understand "Use the vm_name column as key and the whole row as a value." part. Thanks again for your time.
Re: Compare 2 CSV files and create a new CSV file out of comparision
by zentara (Archbishop) on Jul 02, 2012 at 09:07 UTC
Re: Compare 2 CSV files and create a new CSV file out of comparision
by CountZero (Bishop) on Jul 02, 2012 at 19:51 UTC
    Your two files are actually two tables in a relational database system.

    So use the power of DBI and SQL.

    use Modern::Perl; use Data::Dump qw/dump/; use DBI; my $dbh = DBI->connect( "dbi:CSV:", "", "", { f_schema => 'test', f_dir => './test', f_ext => '.csv', } ) or die "Could not open database: $DBI::errstr"; $dbh->{RaiseError} = 1; my $query = 'SELECT file1.vm_name, vm_cluster, vFiler_IP, vFiler_cluster FROM file +1 join file2 WHERE file1.vm_name = file2.vm_name'; my $result = $dbh->selectall_arrayref($query); print dump($result);
    Output:
    [ ["vm1", "fd1", "1.1.1.1", "fd4"], ["vm2", "fd2", "1.1.1.2", "fd3"], ["vm3", "fd3", "1.1.1.3", "fd2"], ["vm4", "fd4", "1.1.1.4", "fd1"], ]

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      CounterZero, This is **Exactly** what i was looking for. When I wrote my code for the 1st time and had nested "while" loops, I knew that this not the right way to do and there must be an easier way to do so. Thanks for the DBI solution.

      Thank you.

      This is an example that does not fully work: this is my improved Code ( Hope it will help somebody in the future) : use Data::Dump qw/dump/; use DBI; my $dbh = DBI->connect( "dbi:CSV:", "", "", { #f_schema => 'test', f_dir => '.', f_ext => '.csv', } ) or die "Could not open database: $DBI::errstr"; $dbh->{RaiseError} = 1; $dbh->{csv_tables}->{file1} = { file => 'vm.csv', }; $dbh->{csv_tables}->{file2} = { file => 'vfiler_ds.csv', }; my $query = "SELECT file1.vm_name, vm_cluster, vFiler_IP, vFiler_cluster FROM file1 join file2 WHERE file1.vm_name = file2.vm_name"; my $result = $dbh->selectall_arrayref($query); print dump($result);
        Now hopefully more readable This is an example that does not fully work: this is my improved Code ( Hope it will help somebody in the future) :
        use Data::Dump qw/dump/;
        use DBI;
        
        my $dbh = DBI->connect(
            "dbi:CSV:",
            "", "",
            {
        	    #f_schema => 'test',
                f_dir    => '.',
                f_ext    => '.csv',
            }
        ) or die "Could not open database: $DBI::errstr";
        $dbh->{RaiseError} = 1;
        
        
        $dbh->{csv_tables}->{file1} = {
        	file  => 'vm.csv',
        };
        
        $dbh->{csv_tables}->{file2} = {
        	file  => 'vfiler_ds.csv',
        };
        
        my $query = "SELECT file1.vm_name, vm_cluster, vFiler_IP, vFiler_cluster FROM file1 join file2 WHERE file1.vm_name = file2.vm_name";
        my $result = $dbh->selectall_arrayref($query);
        
        print dump($result);