http://www.perlmonks.org?node_id=930673

bluray has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I was trying to match the first column in three files (each file has 6 columns) and write the matching rows to an output file. The first five columns are the same in all three files, but the last column differs, so I also tried to concatenate the last columns from the input files in the output file. I got stuck on the error "expected fields to be an array ref at line 74 <$FILE1> line 2". When I tried to make it an array ref, the first column of the output file just repeated the following: (ARRAY(0x141ecf8) ARRAY(0x141edd0) ARRAY(0x141ee48)). There was nothing in the other columns.

#!/usr/bin/perl -w
use strict;
use warnings;
use Text::CSV_XS;

open (my $FILE1, '<', "inputfile1.csv") or die "cannot open file1 $!\n";
open (my $FILE2, '<', "inputfile2.csv") or die "cannot open file2 $!\n";
open (my $FILE3, '<', "inputfile3.csv") or die "cannot open file3 $!\n";
open (my $FILE4, '>', "Outputfile.csv") or die "cannot open file4 $!\n";

my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ , sep_char => "\t", always_quote =>1});

my @columnheadings=split(/\t/,<$FILE1>);
<$FILE2>;
<$FILE3>;
push (@columnheadings,('InFCDAC','InFCTriple'));
my $headings=join("\t",@columnheadings);
print $FILE4 "$headings\n";

my %file2;
while (my $row = $csv->getline($FILE2)) {
    # chomp $row;
    my @fields = @$row;
    my $id = $fields[0];
    $file2{$id}=["",@fields];
    #print "$id\n";
}

my %file3;
while (my $row = $csv->getline($FILE3)) {
    # chomp $row;
    my @fields = @$row;
    my $id = $fields[0];
    $file3{$id}=["",@fields];
    #print "$id\n";
}

while (my $row=$csv->getline($FILE1)) {
    my @fields=@$row;
    my $id=$fields[0];
    if (exists $file2{$id}) {
        if(exists $file3{$id}) {
            my $fields_ref=\@fields;
            my @fields2=$file2{$id}[5];
            my @fields3=$file3{$id}[5];
            my @printline=$fields_ref."\t".\@fields2."\t".\@fields3;
            $csv->print ($FILE4,\@printline);
        }
        else {
        }
    }
    else {
    }
}

Re: Expected fields to be an array ref
by armstd (Friar) on Oct 11, 2011 at 04:10 UTC

    Not knowing Text::CSV_XS other than browsing the perldoc real quick, I can mainly only comment on your code. The main thing I see is mixing lists and ARRAYrefs, and lots of expensive copying back and forth between the two. Here's the first problem:

    # Create ARRAYref to list of 7 elements, with first element "",
    # remaining 6 elements copied from @fields.
    $file2{$id}=["",@fields];
    $file3{$id}=["",@fields];
    ...
    # trying to access 6th element of $file2{$id} as a list, but it's
    # an ARRAYref to a list with 7 elements...
    my @fields2=$file2{$id}[5];
    my @fields3=$file3{$id}[5];

    You are constructing $file2{$id} as an ARRAYref with [] (correctly). But you're not dereferencing it on access. Why add the "" at the beginning of the list? You're probably looking for something more like:

    # just hang onto the ARRAYref returned from $csv->getline(),
    # unless it doesn't want us to.
    $file2{$id}=$row;
    $file3{$id}=$row;
    ...
    # You really did want the 6th column, right?
    my $fields2=$file2{$id}->[5];
    my $fields3=$file3{$id}->[5];
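
    To see the difference concretely, here is a minimal sketch that needs no CSV module. The row contents are made up for illustration; only the shape (six columns in an ARRAYref) is taken from the question.

```perl
use strict;
use warnings;

# Hypothetical six-column row, standing in for what $csv->getline() returns.
my $row = [ 'id1', 'a', 'b', 'c', 'd', 'last' ];

# Original approach: a leading "" shifts every column up by one index,
# and assigning a single element to an array gives a one-element array.
my $padded   = [ "", @$row ];
my @as_array = $padded->[5];
print "padded:   $as_array[0]\n";    # prints "padded:   d"

# Suggested approach: keep the ARRAYref and read the element into a scalar.
my %file2 = ( $row->[0] => $row );
my $last  = $file2{id1}->[5];
print "unpadded: $last\n";           # prints "unpadded: last"
```

    With the padding in place, index 5 is the fifth column rather than the sixth, which is probably not what was intended.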

    Next, this is likely not what you expect:

    # Create list with one element as a string concat looking like
    # "ARRAY(0x141ecf8)\tARRAY(0x141edd0)\tARRAY(0x141ee48)"
    my @printline=$fields_ref."\t".\@fields2."\t".\@fields3;
    $csv->print ($FILE4,\@printline);

    You've got a string consisting of three ARRAYrefs joined by tabs, so that does match the output you're seeing. $csv->print is probably looking for something simpler: a single reference to a list with all the data in it. Something more like:

    # List with 8 elements
    my @printline= ( @fields, $file2{$id}->[5], $file3{$id}->[5] );
    $csv->print ($FILE4,\@printline);
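
    The stringification is easy to reproduce on its own. The sketch below uses made-up values for the row and the two extra columns; it only illustrates what the "." operator does to a reference compared with building a flat list.

```perl
use strict;
use warnings;

my @fields = ( 'id1', 'a', 'b', 'c', 'd', 'x' );    # hypothetical row from file 1
my ( $col2, $col3 ) = ( 'y', 'z' );                 # hypothetical last columns from files 2 and 3

# Concatenating a reference with "." turns it into its address string.
my $as_string = \@fields . "\t" . $col2;
print "$as_string\n";    # something like "ARRAY(0x...)" followed by a tab and "y"

# A flat list keeps one element per output column.
my @printline = ( @fields, $col2, $col3 );
print scalar(@printline), " columns: ", join( "\t", @printline ), "\n";
```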

    Last, you need more error checking imo.
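
    As one possible shape for that checking, the sketch below tests the constructor, the open, and the reason the read loop ended. It reads from an in-memory string with invented contents so that it is self-contained; a real script would open the input files instead.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical tab-separated input; a real script would open a file instead.
my $data = "id\tvalue\nid1\t10\nid2\t20\n";
open( my $in, '<', \$data ) or die "cannot open input: $!\n";

my $csv = Text::CSV_XS->new( { binary => 1, sep_char => "\t" } )
    or die "cannot create parser: " . Text::CSV_XS->error_diag() . "\n";

my @rows;
while ( my $row = $csv->getline($in) ) {
    push @rows, $row;
}

# getline() returns undef both at end of file and on a parse error,
# so check which of the two ended the loop.
$csv->eof or die "parse error: " . $csv->error_diag() . "\n";
close($in) or die "cannot close input: $!\n";

print scalar(@rows), " rows read\n";
```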

    Ultimately, I would recommend taking it a step further and just use references throughout, rather than copying lists to refs and back. It's just not necessary, and it causes your syntax to have to switch back and forth as well. References will keep your syntax consistent. There are lots of other ways to optimize this pretty significantly, but I'll leave that to you.
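
    A rough sketch of what "references throughout" could look like is below. It assumes the rows have already been parsed into ARRAYrefs (the data here is invented) and leaves the CSV reading and writing out, so it is an illustration of the lookup and merge step only, not a drop-in replacement for the script.

```perl
use strict;
use warnings;

# Hypothetical parsed rows, as ARRAYrefs like those getline() returns.
my @file1 = ( [ 'id1', 'a', 'b', 'c', 'd', 'x1' ], [ 'id2', 'a', 'b', 'c', 'd', 'x2' ] );
my @file2 = ( [ 'id1', 'a', 'b', 'c', 'd', 'y1' ] );
my @file3 = ( [ 'id1', 'a', 'b', 'c', 'd', 'z1' ], [ 'id2', 'a', 'b', 'c', 'd', 'z2' ] );

# Index files 2 and 3 by first column, storing the refs themselves (no copying).
my %file2 = map { $_->[0] => $_ } @file2;
my %file3 = map { $_->[0] => $_ } @file3;

my @merged;
for my $row (@file1) {
    my $id = $row->[0];
    next unless exists $file2{$id} && exists $file3{$id};

    # One new ARRAYref per output row: the six columns plus the two last columns.
    push @merged, [ @$row, $file2{$id}->[-1], $file3{$id}->[-1] ];
}

print join( "\t", @$_ ), "\n" for @merged;
```

    Each element of @merged is already an ARRAYref, so it could be handed to $csv->print as is.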

    --Dave