PerlMonks  

Re: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file?

by Anonymous Monk
on Sep 28, 2014 at 22:30 UTC (#1102293)


in reply to Is there any efficient way i can take out a specific column from hundreds of files and put it in one file?

Create 26 sample files in tmp/, each with a Gene column, 10 expression columns, and 2000 data rows:

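    use feature 'say';

    # Write tmp/A.txt .. tmp/Z.txt: one header line, then 2000 tab-separated rows.
    mkdir 'tmp' unless -d 'tmp';
    for my $letter ('A'..'Z') {
        my $file = "tmp/$letter.txt";
        open my $fh, '>', $file or die "No open > $file: $!";
        say $fh join "\t", 'Gene', map "exp$_", 1..10;
        for my $i (1..2000) {
            say $fh join "\t", $i, map "$letter-$i-exp$_", 1..10;
        }
    }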

First lines of A.txt:

    Gene exp1 exp2 exp3 exp4 exp5 exp6 exp7 exp8 exp9 exp10
    1 A-1-exp1 A-1-exp2 A-1-exp3 A-1-exp4 A-1-exp5 A-1-exp6 A-1-exp7 A-1-exp8 A-1-exp9 A-1-exp10
    2 A-2-exp1 A-2-exp2 A-2-exp3 A-2-exp4 A-2-exp5 A-2-exp6 A-2-exp7 A-2-exp8 A-2-exp9 A-2-exp10
    ...

Collect the exp4 column from every file, appending each file's value to the line for its gene:

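    use feature 'say';

    @ARGV = <tmp/*.txt>;    # the empty <> below reads these files in order
    my %row;
    while (<>) {
        my ($gene, $exp4) = (split /\t/)[0,4];    # column 0 is Gene, column 4 is exp4
        $row{$gene} .= "\t$exp4";
    }
    delete $row{Gene};      # drop the row built from the header lines
    say "$_$row{$_}" for sort {$a <=> $b} keys %row;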

First lines of output:

    1 A-1-exp4 B-1-exp4 C-1-exp4 D-1-exp4 E-1-exp4 F-1-exp4 G-1-exp4 H-1-exp4 I-1-exp4 J-1-exp4 K-1-exp4 L-1-exp4 M-1-exp4 N-1-exp4 O-1-exp4 P-1-exp4 Q-1-exp4 R-1-exp4 S-1-exp4 T-1-exp4 U-1-exp4 V-1-exp4 W-1-exp4 X-1-exp4 Y-1-exp4 Z-1-exp4
    2 A-2-exp4 B-2-exp4 C-2-exp4 D-2-exp4 E-2-exp4 F-2-exp4 G-2-exp4 H-2-exp4 I-2-exp4 J-2-exp4 K-2-exp4 L-2-exp4 M-2-exp4 N-2-exp4 O-2-exp4 P-2-exp4 Q-2-exp4 R-2-exp4 S-2-exp4 T-2-exp4 U-2-exp4 V-2-exp4 W-2-exp4 X-2-exp4 Y-2-exp4 Z-2-exp4
    ...

Replies are listed 'Best First'.
Re^2: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file?
by coolda (Novice) on Sep 29, 2014 at 01:10 UTC
    This is great. May I ask what @ARGV and <> do in the second piece of code? I googled it and learned that the empty diamond reads the files named in @ARGV. So if you set @ARGV = <*.txt>, it reads every .txt file in that directory, in order? If I want to skip the first line of every file, what should I do? I tried many things but none of them worked. I usually use <$fh>; to read the first line, and I also tried next if $. < 2, but neither worked. Is there any way to skip the header (the first line) when using while (<>) {}?

      The Anonymous Monk deleted the header row in the code provided above.

      delete $row{Gene};

      That seems like the easiest way to do it. To skip the first line of every file instead, as you are asking here, you can use eof to detect the end of each file. Also refer to "Variables related to filehandles" in perlvar.
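A minimal sketch of why `next if $. < 2` only skips the header of the first file: under `<>`, `$.` keeps counting across files unless ARGV is explicitly closed at the end of each one. The helper name `line_numbers` and the two throwaway files are assumptions made up for this illustration; the `close ARGV if eof` idiom itself is the one documented under eof in perlfunc.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read the files in @$files through <> and record the
# value of $. seen for each line. When $reset is true, ARGV is closed at the
# end of every file, which resets $. before the next file starts.
sub line_numbers {
    my ($files, $reset) = @_;
    local @ARGV = @$files;
    my @seen;
    while (my $line = <>) {
        push @seen, $.;
    }
    continue {
        # eof without parentheses tests the end of the current file only
        close ARGV if $reset && eof;
    }
    return @seen;
}

# Two throwaway three-line files, just for the demonstration.
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $name (qw(A B)) {
    my $file = "$dir/$name.txt";
    open my $fh, '>', $file or die "No open > $file: $!";
    print $fh "header\n", "$name-1\n", "$name-2\n";
    close $fh or die "No close $file: $!";
    push @files, $file;
}

my @with_reset    = line_numbers(\@files, 1);
my @without_reset = line_numbers(\@files, 0);

print "without reset: @without_reset\n";   # 1 2 3 4 5 6
print "with reset:    @with_reset\n";      # 1 2 3 1 2 3
```

With the reset in place, `$. == 1` is true for the first line of every file, so a test on `$.` can be used to skip each header.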

        Oh yes, I noticed that. However, my actual files have one more line above the row deleted by
        delete $row{Gene};
        So I need to skip the first row and then delete the Gene row. Thanks for the input, I'll look into the links.
        I used next if /^/ ; it works well! Thanks.
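For the layout described in this reply (one extra line above the Gene header in every file), one possible approach is sketched below. It is only an illustration under stated assumptions: the helper name `merge_column`, the "title line" text, and the two small sample files are invented here, and the real files may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: gather column $col (0-based) from every file, keyed by
# the value in column 0, after skipping the first line of each file.
sub merge_column {
    my ($files, $col) = @_;
    local @ARGV = @$files;
    my %row;
    while (my $line = <>) {
        next if $. == 1;                 # the extra line above the header
        chomp $line;
        my ($gene, $value) = (split /\t/, $line)[0, $col];
        $row{$gene} .= "\t$value";
    }
    continue {
        close ARGV if eof;               # reset $. so each file starts at line 1
    }
    delete $row{Gene};                   # drop the header row, as in the code above
    return \%row;
}

# Two made-up files: a title line, a header line, then two data rows.
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $letter (qw(A B)) {
    my $file = "$dir/$letter.txt";
    open my $fh, '>', $file or die "No open > $file: $!";
    print $fh "title line for $letter\n";
    print $fh join("\t", 'Gene', map "exp$_", 1..4), "\n";
    for my $i (1..2) {
        print $fh join("\t", $i, map "$letter-$i-exp$_", 1..4), "\n";
    }
    close $fh or die "No close $file: $!";
    push @files, $file;
}

my $row = merge_column(\@files, 4);
print "$_$row->{$_}\n" for sort { $a <=> $b } keys %$row;
```

The `continue` block still runs when `next` is called, so the `$.` reset happens even for files that contain only the skipped line.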
