http://www.perlmonks.org?node_id=1102279


in reply to Is there any efficient way i can take out a specific column from hundreds of files and put it in one file?

It needs a few lines of Perl code only:
    use Modern::Perl qw/2014/;
    use File::Find::Iterator;

    my $find = File::Find::Iterator->create(
        dir    => ['d:/Perl/scripts'],
        filter => \&find
    );
    open my $FH_OUT, '>', './results.CSV'
        or die "Could not open results file - $!";
    while ( my $file = $find->next ) {
        open my $FH_IN, '<', $file or die "Could not open $file - $!";
        say $FH_OUT join ', ', ( split /,/ )[ 0, 2 ] while (<$FH_IN>);
    }

    sub find { /GENES\d+\.csv/; }
I tested it with 1000 files of 1000 lines of 10 fields each: Extracting the first and third column and saving them in the results file took 47 seconds on my ASUS tablet with a 1.33 GHz Intel ATOM Z3740 (4 core) processor. I call that very efficient.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Replies are listed 'Best First'.
Re^2: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file?
by frozenwithjoy (Priest) on Sep 28, 2014 at 19:40 UTC
    This results in a file with 2 columns and 1,000,000 rows, right? I'm not entirely sure, but I think that OP wants the final file to have 1000 + 1 columns and 1000 rows. Maybe...
      Sorry for not writing a detailed description of what I am working on. I have thousands of tab-delimited files saved in one folder. Each file is a 2000-by-10 table. What I want to do now is create a new file containing only the data I need. Each file has the following format:
      Gene  exp1  exp2  exp3  exp4  ...
      1     1050  2020  100   100
      2     100   100   100   100
      3     224   11    11    11
      4     11    15    555   444
      5     22    51    55    555
      6     55    55    55    555
      ...
      From the first file I read, I want to extract two columns, for example 'Gene' and 'exp4', and put them in a new file. From each of the other files, I want to extract only the 'exp4' column and append it to the right of the two columns extracted from the first file. So the final format would look like this:
      Gene  file1 exp4  file2 exp4  file3 exp4  file4 exp4  ...
      1     100         200         155         144
      2     22          55          222         444
      3
      4
      5
      6
      .
      .
      So the result will be a table of 2000 rows by thousands of columns (one column per file). I am a beginner in programming, and especially in Perl. Please help me.
        In that case, the second code snippet in my previous answer should do the trick if you are on Linux or OS X.
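
For anyone who is not on Linux or OS X, the layout described above (the 'Gene' column from the first file, followed by one 'exp4' column per file) can also be sketched in plain Perl. This is only a minimal sketch under assumptions the thread does not confirm: the files are tab-delimited, every file lists the same genes in the same order, and 'exp4' is the fifth field (zero-based index 4). The sub name merge_column and the script name merge_exp4.pl are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Build the merged table in memory: one array ref per output row.
# $column is the zero-based index of the field to collect from every file.
# ASSUMPTION: all files have the same rows in the same order.
sub merge_column {
    my ( $column, @files ) = @_;
    my @rows;
    for my $i ( 0 .. $#files ) {
        my $file = $files[$i];
        open my $in, '<', $file or die "Could not open $file - $!";
        my $n = 0;    # current row number within this file
        while ( my $line = <$in> ) {
            $line =~ s/\r?\n\z//;                  # strip Unix or DOS line ending
            my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
            my $value  = $fields[$column] // '';

            # Header row: prefix the column name with the file name,
            # giving labels such as "file1 exp4".
            $value = basename($file) . " $value" if $n == 0;

            # Only the first file contributes the key column (Gene).
            push @{ $rows[$n] }, $fields[0] // '' if $i == 0;
            push @{ $rows[$n] }, $value;
            $n++;
        }
        close $in;
    }
    return @rows;
}

# Run only when file names (or wildcard patterns) are given on the command line.
if (@ARGV) {
    # Expand wildcards ourselves, because the Windows shell does not.
    my @files = map { glob($_) } @ARGV;
    print join( "\t", @$_ ), "\n" for merge_column( 4, @files );
}
```

It could be called as `perl merge_exp4.pl GENES*.csv > results.tsv`. The whole table is held in memory, which should be unproblematic for 2000 rows by a few thousand files. Note that glob returns names in plain string order, so GENES10.csv comes before GENES2.csv unless the numbers are zero-padded.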