Re: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file?

Create 26 sample files, each with a Gene column plus 10 exp columns and 2000 data rows:

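use strict;
use warnings;
use feature 'say';

mkdir 'tmp' unless -d 'tmp';    # the sample files go in ./tmp

for my $letter ('A'..'Z') {
    my $file = "tmp/$letter.txt";
    open my $fh, '>', $file or die "No open > $file: $!";

    # header row: Gene, exp1 .. exp10
    say $fh join "\t", 'Gene', map "exp$_", 1..10;

    for my $i (1..2000) {
        say $fh join "\t", $i, map "$letter-$i-exp$_", 1..10;
    }
    close $fh or die "No close $file: $!";
}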

First lines of A.txt:

Gene    exp1    exp2    exp3    exp4    exp5    exp6    exp7    exp8    exp9    exp10
1    A-1-exp1    A-1-exp2    A-1-exp3    A-1-exp4    A-1-exp5    A-1-exp6    A-1-exp7    A-1-exp8    A-1-exp9    A-1-exp10
2    A-2-exp1    A-2-exp2    A-2-exp3    A-2-exp4    A-2-exp5    A-2-exp6    A-2-exp7    A-2-exp8    A-2-exp9    A-2-exp10
...

Collect the exp4 column from every file, appending each file's value to the end of that gene's output line:

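use strict;
use warnings;
use feature 'say';

@ARGV = <tmp/*.txt>;    # <> below reads these files in order

my %row;

while (<>) {
    chomp;
    my ($gene, $exp4) = (split /\t/)[0,4];    # column 0 is Gene, column 4 is exp4
    $row{$gene} .= "\t$exp4";
}

delete $row{Gene};    # drop the header row collected from each file

say "$_$row{$_}" for sort {$a <=> $b} keys %row;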

First lines of output:

1    A-1-exp4    B-1-exp4    C-1-exp4    D-1-exp4    E-1-exp4    F-1-exp4    G-1-exp4    H-1-exp4    I-1-exp4    J-1-exp4    K-1-exp4    L-1-exp4    M-1-exp4    N-1-exp4    O-1-exp4    P-1-exp4    Q-1-exp4    R-1-exp4    S-1-exp4    T-1-exp4    U-1-exp4    V-1-exp4    W-1-exp4    X-1-exp4    Y-1-exp4    Z-1-exp4
2    A-2-exp4    B-2-exp4    C-2-exp4    D-2-exp4    E-2-exp4    F-2-exp4    G-2-exp4    H-2-exp4    I-2-exp4    J-2-exp4    K-2-exp4    L-2-exp4    M-2-exp4    N-2-exp4    O-2-exp4    P-2-exp4    Q-2-exp4    R-2-exp4    S-2-exp4    T-2-exp4    U-2-exp4    V-2-exp4    W-2-exp4    X-2-exp4    Y-2-exp4    Z-2-exp4
...
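
For anyone unsure what the @ARGV assignment and the empty <> are doing in the script above: <tmp/*.txt> is a file glob, and <> reads line by line through every file named in @ARGV. A rough equivalent with explicit opens might look like the sketch below. The sub name gather_exp4 and its $pattern argument are made up for this illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: should give the same %row as the @ARGV / <> loop,
# but with the glob and the per-file open written out explicitly.
sub gather_exp4 {
    my ($pattern) = @_;
    my %row;

    # <tmp/*.txt> is shorthand for glob 'tmp/*.txt'; sort makes the order explicit
    for my $file (sort glob $pattern) {
        open my $fh, '<', $file or die "No open < $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($gene, $exp4) = (split /\t/, $line)[0, 4];
            $row{$gene} .= "\t$exp4";
        }
        close $fh;
    }

    delete $row{Gene};    # drop the header row, as in the original
    return \%row;
}
```

Calling gather_exp4('tmp/*.txt') and printing the keys in numeric order should reproduce the output shown above. The <> version is shorter; this one only spells out the steps.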


Re^2: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file? by coolda (Novice) on Sep 29, 2014 at 01:10 UTC
This is great. May I ask what @ARGV and <> do in the second code block? I googled it and learned that the empty diamond reads the files named in @ARGV. So if you just set @ARGV = <*.txt>, does it read every .txt file in that directory, in order? And if I want to skip the first line of every file, what should I do? I tried many things, but none of them worked. I usually use <$fh>; to read past the first line, and I also tried next if $. < 2, but neither worked. Is there any way to skip the header (the first line) when using while (<>) {}?
Re^3: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file? by Lotus1 (Vicar) on Sep 29, 2014 at 03:47 UTC
The Anonymous Monk deleted the header row in the code provided above with `delete $row{Gene};`. That seems like the easiest way to do it. To do what you are asking here, you can use eof. Also refer to "Variables related to filehandles" in perlvar.
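
A minimal sketch of the eof approach, for illustration only. With <>, $. keeps counting across files unless ARGV is explicitly closed, which is likely why next if $. < 2 only skipped the first line of the first file. Closing ARGV at the end of each file resets $. to start again at 1. The sub name collect_column and the parameters $col and $skip are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect column $col from every file in @files,
# skipping the first $skip lines of each file.
# Note: with an empty @files, <> would fall back to reading STDIN.
sub collect_column {
    my ($col, $skip, @files) = @_;
    my %row;

    local @ARGV = @files;    # <> reads the files listed in @ARGV, in order

    while (my $line = <>) {
        chomp $line;
        if ($. > $skip) {    # $. is the line number within the current file
            my ($gene, $value) = (split /\t/, $line)[0, $col];
            $row{$gene} .= "\t$value";
        }
    }
    continue {
        close ARGV if eof;   # explicit close resets $. for the next file
    }

    return \%row;
}
```

With one header row per file, collect_column(4, 1, <tmp/*.txt>) would skip it while reading, so no delete is needed afterwards. A $skip of 2 would also pass over one extra line above the header.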
Re^4: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file? by coolda (Novice) on Sep 29, 2014 at 16:50 UTC
Oh yes, I noticed that. However, my actual files have one more line above the row deleted by `delete $row{Gene};`, so I need to skip the first row and then delete the Gene row. Thanks for the input, I'll look into the links.
Re^4: Is there any efficient way i can take out a specific column from hundreds of files and put it in one file? by coolda (Novice) on Sep 29, 2014 at 19:30 UTC
I used next if /^/ ; it works well! Thanks.

