http://www.perlmonks.org?node_id=956167

wanttoprogram has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I need to open 200 CSV files and grab the 3rd column of each file. I would like to write all 200 columns to one file for analysis.

datafile1:
1 has 0.334
2 has 0.986
3 has 0.787
....
948

datafile2:
1 has 0.894
2 has 0.586
3 has 0.187
....
948

There are 200 such files. My output should look like:

output:
1 has 0.334 0.894 .......... 200 columns
2 has 0.986 0.586 ...........200 columns
3 has 0.787 0.187 ............200 columns
....
948

Can I open the 200 files in a loop rather than opening them one by one? Thank you. Any help is appreciated.

Original content restored above by GrandFather

I have this error:

syntax error at ./traj_pm line 26, near "$fds["
BEGIN not safe after errors--compilation

Can you please tell me how to get rid of this error?

Replies are listed 'Best First'.
Re: opening many files
by Marshall (Canon) on Feb 26, 2012 at 02:33 UTC
    I would make an ArrayOfArray to keep the results. I would make a loop to open each file, read it line by line, use column 1 as the index to the ArrayOfArray and push col3 onto the appropriate array specified by the array index and then loop to the next file. Once you are done, the ArrayOfArray can be printed to a new file.

    What code have you written so far? What are the problems? A loop is the appropriate answer for repetitive operations like this. The data will fit into memory at once and only one file at a time needs to be open.

    Update: I think something like this would work:

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my $datafile1=<<END;
    1 has 0.334
    2 has 0.986
    3 has 0.787
    END

    my $datafile2=<<END;
    1 has 0.894
    2 has 0.586
    3 has 0.187
    END

    my @data;
    foreach my $fileRef (\$datafile1, \$datafile2)
    {
       open FILE, '<', $fileRef or die "$!";
       while (<FILE>)
       {
          my ($row, $col3) = (split)[0,2];
          push @{$data[--$row]}, $col3;
       }
    }

    my $row_num=1;
    foreach my $row (@data)
    {
       print $row_num++, " has ", "@$row\n";
    }
    __END__
    1 has 0.334 0.894
    2 has 0.986 0.586
    3 has 0.787 0.187

    Of course, instead of opening references to in-memory strings, you would use some form of glob() or readdir() to get the real file names. Also, the data as presented is whitespace-separated rather than CSV, so the split would have to be adapted to the actual file format.
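A minimal sketch of the same array-of-arrays idea reading real files, under a few assumptions that are not in the post above: the files match `datafile*`, the fields are separated by plain commas with no quoted commas inside them, and the row is taken from the line's position in the file rather than from column 1. The sub name `collect_third_column` is made up for illustration. Files with quoted fields would need a real CSV parser such as Text::CSV.

```perl
use strict;
use warnings;

# Collect the third comma-separated field of every line, file by file.
# Returns a reference to an array of rows; each row holds one value per file.
sub collect_third_column {
    my @files = @_;
    my @rows;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        my $i = 0;    # row index within the current file
        while ( my $line = <$fh> ) {
            chomp $line;
            next unless length $line;    # skip empty lines
            my $col3 = ( split /,/, $line )[2];
            push @{ $rows[ $i++ ] }, $col3;
        }
        close $fh;
    }
    return \@rows;
}

# glob returns names in string order, not numeric order.
my $rows = collect_third_column( glob 'datafile*' );
my $n    = 1;
print join( ' ', $n++, 'has', @$_ ), "\n" for @$rows;
```

Only one file is open at a time, and 948 rows by 200 columns fits in memory easily.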
Re: opening many files
by jwkrahn (Abbot) on Feb 26, 2012 at 03:51 UTC

    UNTESTED, but this may work:

    @ARGV = glob 'datafile*';

    open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";

    while ( <> ) {
        if ( $. == 1 ) {
            print $OUTPUT $ARGV =~ /(\d+)$/, " has";
        }
        print $OUTPUT " ", ( split )[ 2 ];
        if ( eof ) {
            print $OUTPUT " $. columns\n";
            close ARGV;
        }
    }

      Thank you for the reply. It is working fine, except that I want 948 rows and 200 columns from the 200 files; the program is giving me 200 rows and 948 columns. Thank you.

        @ARGV = glob 'datafile*';

        open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";

        my %data;

        while ( <> ) {
            next unless /^(.+) (\S+)$/;
            push @{ $data{ $1 } }, $2;
        }

        # Note: a plain sort compares the keys as strings, so "10 has" sorts before "2 has".
        for my $key ( sort keys %data ) {
            # Write to the output file opened above rather than to STDOUT.
            print $OUTPUT join ' ', $key, @{ $data{ $key } }, scalar @{ $data{ $key } }, "columns\n";
        }

      The program you have given works fantastically, but I have to deal with one more thing. I have files named datafile1, datafile2, datafile3, datafile4 ... datafile200 in a directory, but they are listed in a different order: datafile1, datafile10, datafile11, datafile12 and so on. The above program reads and writes them in that order. I would like to pick up the data in numeric order, from datafile1, 2, 3 up to datafile200. I guess I need to sort. Any suggestions and additions are appreciated. Thank you.

        That's because glob returns the results of datafile* interpolated in the order a shell would return them, so datafile11 comes before datafile2, as you've seen. A quick and dirty solution, if you know you're only dealing with up to 3 digits:

        @ARGV = glob 'datafile? datafile?? datafile???';

        A more general solution that'll sort on any number of digits:

        @ARGV = sort {
            my $an = substr $a, 8;
            my $bn = substr $b, 8;
            $an <=> $bn
        } glob 'datafile*';

        Aaron B.
        Available for small or large Perl jobs; see my home node.
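A variation on the sort above that does not depend on the prefix being exactly 8 characters long: it pulls the trailing digits out with a regex and sorts on them. This is only a sketch. The sub name `sort_by_trailing_number` is made up for illustration, and it assumes every name of interest ends in a number (names without one are sorted first).

```perl
use strict;
use warnings;

# Sort names such as datafile1, datafile10, datafile2 by their trailing number.
# Names without a trailing number are given -1 and therefore come first.
sub sort_by_trailing_number {
    return
        map  { $_->[1] }                                     # unwrap the name
        sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }  # number, then name
        map  { [ ( /(\d+)$/ ? $1 : -1 ), $_ ] }              # [ number, name ]
        @_;
}

my @sorted = sort_by_trailing_number( glob 'datafile*' );
print "$_\n" for @sorted;
```

Extracting the number once per name (rather than inside the comparison block) means the regex runs once per file instead of once per comparison, which hardly matters for 200 files but keeps the comparison itself simple.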

Re: opening many files
by aaron_baugher (Curate) on Feb 26, 2012 at 08:06 UTC

    Sure, you can open them in a loop. You could put your file descriptors in an array, and then loop through them as you print each line. As long as your system will let a process have that many open files, something like this should work (untested). It's not at all flexible, but since you seem to know that all your files have the same number of lines and the same format, it doesn't need to be. Also, if you use a CSV module, you may want to make an array of object references rather than simple file descriptors, but the looping concept would be the same.

    my @fds;
    for (1..200){
      open my $fds[$_], '<', "datafile$_" or die $!;
    }
    for my $ln (1..948){
      print "$ln has ";
      for my $fdn (1..200){
        my $line = <$fds[$fdn]>;
        my $field3 = get_third_field_using_whatever_csv_method($line);
        print $field3;
        print ' ' unless $fdn == 200;
      }
      print "\n";
    }

    Aaron B.
    My Woefully Neglected Blog, where I occasionally mention Perl.

      open my $fds[$_], '<', "datafile$_" or die $!;

      You can't use my on an array element.



      my $line = <$fds[$fdn]>;

      From I/O Operators:

      If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means "<$x>" is always a readline() from an indirect handle, but "<$hash{key}>" is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even "<$x >" (note the extra space) is treated as "glob("$x ")", not "readline($x)".
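A small sketch of the rule quoted above, using an in-memory file so it runs without any data files. The variable names are invented for the example. A handle stored in a hash or array element can be read either by copying it into a simple scalar first or by calling readline() explicitly.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# Store the handle in a hash element, as one might in an array of handles.
my %handle_for = ( demo => $fh );

# Option 1: copy the element into a simple scalar, then use <...>.
my $copy  = $handle_for{demo};
my $first = <$copy>;

# Option 2: call readline() directly; it accepts any expression.
my $second = readline $handle_for{demo};

print "$first$second";
```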

        Well, I did say it was untested, but that's a poor excuse. ++ for the correction; my 'my' should be removed on that line.

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.
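Putting both corrections together, the loop might look something like the sketch below. It makes assumptions beyond the thread: the fields are whitespace-separated as in the posted sample, all files have the same number of lines, and the sub names `open_all` and `merged_rows` are made up for illustration. It also skips any of datafile1 .. datafile200 that do not exist instead of dying.

```perl
use strict;
use warnings;

# Open every named file and return the handles in the same order.
sub open_all {
    my @names = @_;
    my @fhs;
    for my $i ( 0 .. $#names ) {
        # No 'my' here: open autovivifies a handle into the array element.
        open $fhs[$i], '<', $names[$i] or die "Cannot open '$names[$i]': $!";
    }
    return @fhs;
}

# Read one line from each handle per row and join the third fields.
sub merged_rows {
    my @fhs = @_;
    my @rows;
    return @rows unless @fhs;
    my $ln = 0;
    # readline() is used because <$fhs[0]> would be parsed as a glob.
    while ( defined( my $first = readline( $fhs[0] ) ) ) {
        $ln++;
        my @lines = ( $first, map { scalar readline($_) } @fhs[ 1 .. $#fhs ] );
        # A file that ran out early contributes an empty string.
        my @third = map { defined $_ ? ( split ' ', $_ )[2] : '' } @lines;
        push @rows, "$ln has @third";
    }
    return @rows;
}

my @names = grep { -e $_ } map { "datafile$_" } 1 .. 200;
if (@names) {
    print "$_\n" for merged_rows( open_all(@names) );
}
```

Because the names are generated from 1 .. 200 rather than taken from glob, the columns come out in numeric file order without a separate sort.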

Re: opening many files
by Anonymous Monk on Feb 26, 2012 at 03:06 UTC
Re: opening many files
by umasuresh (Hermit) on Feb 26, 2012 at 02:27 UTC
    Hi wanttoprogram,
    Here is a good place to start learning:
    Perl tutorials
    Hint: use a Hash!
      Why on earth do you suggest using a hash? The files have an order, the OP wants to keep the order. It all screams array.